Closed Codeneos closed 6 months ago
Great, thanks @Codeneos! I'm a bit ambivalent to bring back CJS builds, but maybe it's better that way to ease usage with plain Node.js.
The implementation of these classes was on my mental todo list anyway (as is the entire optimization of the runtime). Just today I implemented a HashMap
with MurmurHash as hasher for my ST4 port. I guess that could later be merge with this code here.
For me the problem with ESM only is that it doesn't work for everything yet. For example VSCode extensions need to be written in CJS format as ESM is not yet supported by VSCode. There are some workarounds converting ESM back to CJS but it isn't ideal. So if you would be okay with supporting CJS for the time being it would help me a lot!
I noticed optimizations are very specific, the HashMap and Set changes I did work great for my grammar and source code but when I run the benchmark it has no effect at all.
I wonder why Git blocks merging, saying that all commits need to be signed. They are all signed....
For me the problem with ESM only is that it doesn't work for everything yet. For example VSCode extensions need to be written in CJS format as ESM is not yet supported by VSCode.
When you bundle your extension (as you should) then you can generate a CJS bundle from any type of module (I do that with my extension already). But running with test frameworks needs a separate transformer, so a CJS bundle here simplifies things there.
For me the problem with ESM only is that it doesn't work for everything yet. For example VSCode extensions need to be written in CJS format as ESM is not yet supported by VSCode.
When you bundle your extension (as you should) then you can generate a CJS bundle from any type of module (I do that with my extension already). But running with test frameworks needs a separate transformer, so a CJS bundle here simplifies things there.
I am using webpack and esbuild but was getting some ERR about the ESM module not found when I ran quick test earlier today to see if I could move to ESM. Could be I missed something though somewhere.
@Codeneos Unfortunately I had to revert your changes in HashSet and HashMap, because they broke the Java runtime tests. To my shame I have to admit that I did not run these tests for a while. Otherwise I would have seen that problem earlier. Only because I'm currently converting the Java runtime tests to TypeScript I saw this problem. If you want you can open a new PR with updated classes that also work well in the Java runtime tests.
Thanks for letting me know! I'll recreate the PR and fix the issue. Do you already have some pointers as to what is causing the Java tests to fail?
Oh and, for my knowledge how do I run the Java runtime tests? Are these part of the repo or do need something specific for that?
Running the Java runtime tests is a pain. You need to build ANTLR4 locally and then symlink the JavaScript runtime folder in the ANTLR4 repo folder to the repo folder of antlr4ng.
The good news is however, that I managed to complete the TS port of the ANTLR4 runtime tests today. Clone this repo locally, run npm i
in its root and then you should be able to run the tests using npm run test
. Use npm link
to replace the antlr4ng node package with a link that points to your local copy of antlr4ng and you should be able to run the tests with the current TS runtime.
It's all a bit rough yet and if you want to run a specific test you have to manually change the includedGroups
and includedTests
in RuntimeTests.runtimeTests()
, but first run the tests as is and setup what I explained above. You can then replace HashSet and HashMap again with your version and run the tests again. You should then see which fail and use their groups and names to limit test runs to them.
Great that the tests already ported everything to TS!
I'll clone the repo and run the tests from there to pin point the issue.
For your information I just reached a big milestone: All tests now use Jest and run in parallel, reducing total execution time to under 20s on my box. You can even debug individual tests, which should help you to better figure out why a test fails.
Both the
antlr4ng
andantlr4
implement a customHashMap
andSet
that is compatible with their Java counterparts. Feature wise the implementations are correct but with certain grammars and specific source files that have deeply nested closures theSet
class is stressed unproportionally.This PR optimizes the methods of the
Set
andHashMap
which yields a significantly performance improvement in parsing time. For me parsing of 2318 sources improved from ~165.365s to ~28.300s. This doesn't improve performance for all grammars.This was tested using APEX, which is a c-style programming language for Salesforce CRM. The grammar file used was derived from Java which is what APEX is loosely based on.
Before
After
Analysis
The 2 anonymous functions on line
600
(1) and602
(2) are from thelength
function of theHashSet
. Changing thelength
function to a more simpler function that doesn't usemap
orreduce
significantly improves performance. As thelength
function alone counts for 15% of the execution time.I suspect this is due to how
Object.keys
andmap
work in that both methods allocate new arrays which is a relatively costly operation.Solution
As both the
Set
andHashMap
in the antlr4 runtime only support add, contains and get operations making them collection types from which no elements can be removed thus optimizing them is relatively simple.By stored the actual data of both collections in an array we can speed up accessing both the keys and the values of the collections. Next the length property can be determined by checking the length of the values array avoiding
Object.keys
andmap
.The hash buckets instead of storing the actual value will store a reference to the index in the array. Arrays in JS are spares by design so you can add an element at index 100 without needing to set the previous 99 indices. Also you will not get an out of bounds error when accessing an index beyond the length of an array, you will simply get
undefined
. But as the Set and HashMap do not support removal of entries this doesn't matter that much :).Other changes
CharStream
, with antlr4ng I cannot implement theCharStream
anymore due to it's#private
property.