mike-lischke / antlr4ng

Next Generation TypeScript runtime for ANTLR4
Other
66 stars 11 forks source link

Improve HashSet performance for specific Grammars #29

Closed Codeneos closed 4 months ago

Codeneos commented 5 months ago

Note: this is a work in progress PR

Description

The goal of this PR is to improve performance of the standard HashSet and HashMap collections in antlr4ng. Some grammars more often hit the HashSet length method to validate if the cache is empty or not. Currently the calculation of the HashSet size is done at the moment when the property is accessed using a Array.reduce method. For bigger HashSets this can be an expensive calculation which can be avoided by tracking the size when ever new items are added into the HashSet making accessing the size n(1) operation.

To validate the correctness of the changes in this PR new unit tests are added that test if the basic operations of the HashSet produce the expected results and will also help validate future changes do not break the hashset.

Additionally ICompareable and Hash and Equals function definitions are now templatable allowing the HashSet to specify Value type and the HashMap the Key type as comparable objects. This will provide static type checks to classes that override the default equals or hash methods of the HashMap and HashSet collection types.

Tasks

mike-lischke commented 5 months ago

btw. you can mark a PR as draft, if it is not ready for review

mike-lischke commented 5 months ago

FYI, I'm considering immutable.js for HashSet and HashMap implementations in antlr4ng. However that would make most sense when we keep objects unchangeable, which requires some extra work. Additionally I want to change the current custom hash code handling back to how it is done in Java (and required by immutable.js).

Also, in my first profiling attempts I saw that a large part of the time spent is in Node.js itself because it cannot optimize access, due to too many changes in the TS objects. So making them immutable could help speeding up the runtime.

mike-lischke commented 4 months ago

As it turned out immutable.js is not the right tool, mostly because of 2 reasons:

Instead I took over the Array2DHash(Map/Set) implementations from antlr4ts to antlr4ng. They work fine and provided a slight speed boost.

Codeneos commented 4 months ago

Sounds good 👍 I also honestly didn't have much time to continue with rewriting the hashmap.

I've also been trying to get the runtime tests to run on my windows machine but that didn't go well (yet). I'll make a PR to make some small changes that prevented it from running on windows.

mike-lischke commented 4 months ago

Side note: all the changes currently go into a separate branch (called "complete") which contains the full TS runtime and some optimizations already. This branch needs an updated ANTLR4 jar which I cannot publish yet as the branch introduces breaking API changes. If you need help with this extra branch, let me know. We can work everything out in a discussion here.