Closed GeorgDangl closed 1 year ago
I'm able to reproduce the perf issue. Yes, Antlr4cs is faster.
(Note, Antlr4cs keeps a write lock on a file somewhere, because the template driver code fails to open the file the second time here. The Antlr4.Runtime.Standard code does not write lock the file with identical code. I got around the problem by placing the input in unique files, e.g., ./bin/Debug/net6.0/Test.exe input1.txt input2.txt input3.txt input4.txt input5.txt input6.txt input7.txt input8.txt input9.txt input10.txt input11.txt input12.txt input13.txt input14.txt input15.txt input16.txt input17.txt input18.txt input19.txt input20.txt input21.txt input22.txt input23.txt input24.txt input25.txt input26.txt input27.txt input28.txt input29.txt input30.txt input31.txt input32.txt input33.txt input34.txt input35.txt input36.txt input37.txt input38.txt input39.txt input40.txt input41.txt input42.txt input43.txt input44.txt input45.txt input46.txt input47.txt input48.txt input49.txt input50.txt input51.txt input52.txt input53.txt input54.txt input55.txt input56.txt input57.txt input58.txt input59.txt input60.txt input61.txt input62.txt input63.txt input64.txt input65.txt input66.txt input67.txt input68.txt input69.txt input70.txt input71.txt input72.txt input73.txt input74.txt input75.txt input76.txt input77.txt input78.txt input79.txt input80.txt input81.txt input82.txt input83.txt input84.txt input85.txt input86.txt input87.txt input88.txt input89.txt input90.txt input91.txt input92.txt input93.txt input94.txt input95.txt input96.txt input97.txt input98.txt input99.txt input100.txt
)
I do see is that this grammar contains a lot of ambiguity, and does a tremendous amount of fall-back with max-k the size of the file. That's a grammar problem, but maybe there's something that Antlr4cs does better on fall backs?
$ trperf input.txt | column -t
Time to parse: 00:00:00.0743970
Decision Rule Invocations Time Total-k Max-k Fallback Ambiguities Errors
0 text_ 4 0.095922 182 76 2 2 106
1 text_ 0 0 0 0 0 0 0
2 textAddition 0 0 0 0 0 0 0
3 textAddition 4 0.001372 8 2 2 2 7
4 plainText 0 0 0 0 0 0 0
5 plainText 579 0.437253 119438 483 573 501 584
6 textContent 0 0 0 0 0 0 0
7 textContent 0 0 0 0 0 0 0
Could you reraise the issue in https://github.com/antlr/antlr4/issues? It's not an Antlr4BUildTasks problem. That's sort of like blaming MSBuild or dotnet.exe or make for the issue, whose purpose is just to build an executable. It will have a wider audience. @parrt should see this.
_(More info: The Java target does not exhibit this problem. I wonder if caching of prediction context is working in Java, but not in CSharp so well.)
Oh, thank you very much for the quick reply! I'll post the issue over there😀
About the grammar having ambiguity, do you happen to have some suggestion there, or is there some documentation I can read about that?
Opened a new issue in the antlr organization.
Hey @kaby76,
I've just recently discovered your work around ANTLR and C#, and I'm impressed! Thank you very much, first of all.
So I was pretty excited to be able to move to the newer targets, and had good results in some project. However, I had a very big performance regression in one grammar, and I can't understand why.
The grammar is pretty simple. Essentially, some texts can have square brackets denoting some placeholders, which we try to parse so that we can build a data model around it.
The grammar looks like this:
Now, we're building a
Lexer
and aParser
from it and try it out it with the following test:I was hoping for a bit of a performance improvement, but it actually went from around 2.4 seconds to 29.2 seconds, so roughly an order of magnitude slower. Profiling a few runs doesn't give me a lot of information. The memory allocation is pretty small throughout the test, and there's not a lot of GCs happening either. I see this after a few thousand iterations:
So, I'm a bit at a loss here how to proceed. I've checked the grammar also with
tranalyze
, but if I understand it correctly, the output is just statistics or telling me a rule is fine (NotEmpty
):For reference, the strings we usually encounter in the wild are at most on the order of a kilobyte size or so, and usually just contain one or very few such tags.