What engine did you test this on? (Browser/Node version?)
@bd82 latest Node LTS (6.11.2)
Would be interesting to check on latest Node 8.4 with a more modern V8. https://v8project.blogspot.co.il/2017/01/speeding-up-v8-regular-expressions.html
I've tried to reproduce your results by modifying an existing JS parser benchmark I authored, but without success. Using a JSON syntax and a 1,000-line sample, I see Moo being faster than ANTLR.
Faster here means 4x on Firefox and 2x on Chrome.
Hacked Runnable Lexer Benchmark
Commit that created the hack.
I agree that the original statement is not very accurate and, more importantly, is impossible to prove under all possible conditions (different JS engines, different grammars).
Personally I would use a more general statement such as:
Moo is very/incredibly/super-duper fast, often beating competitors by multiples or even orders of magnitude.
And link to an online benchmark that supports the claim. But it is up to @tjvr to decide where the line between marketing and precise accuracy falls for this project.
@bd82 Thanks for looking into this!
I’ll look into adding antlr to the benchmarks Moo uses, at some point. :-)
I’ll look into adding antlr to the benchmarks Moo uses, at some point. :-)
It could be a little annoying to support, because a Java application is used to generate the lexer and that jar is not available on npm (afaik).
@bd82 you must've used a bad antlr grammar. Also, the problem occurs with large input (2 MB in my case), so I assume memory thrashing (GC pressure) is the main cause?
Would be interesting to check on latest Node 8.4 with a more modern V8.
Have to use LTS
@notsonotso
you must've used a bad antlr grammar. Also, the problem occurs with large input (2 MB in my case), so I assume memory thrashing (GC pressure) is the main cause?
Please provide a reproducible test case (generated antlr lexer, moo lexer, and input file), or this discussion won't go anywhere useful. GitHub Gist is good for this.
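For illustration only, a reproducible case could look roughly like the sketch below. Nothing in it comes from the thread's actual code: the sticky-RegExp `lexAll` is a hypothetical stand-in for the lexer under test (the real moo and ANTLR-generated lexers would be swapped in), and `makeCsv` and `bench` are made-up helper names. The 2 MB size and the roughly 550-character lines mirror the figures mentioned in this discussion.

```javascript
'use strict';

// Hypothetical stand-in lexer: one sticky RegExp with a capture group per
// token type. Replace lexAll with calls into the real lexers to compare them.
const TOKEN = /("(?:[^"]|"")*")|([^",\r\n]+)|(,)|(\r?\n)/y;
const TYPES = ['quoted', 'field', 'comma', 'eol'];

function lexAll(input) {
  const counts = { quoted: 0, field: 0, comma: 0, eol: 0 };
  let pos = 0;
  while (pos < input.length) {
    TOKEN.lastIndex = pos; // sticky flag: match must start exactly here
    const m = TOKEN.exec(input);
    if (m === null) throw new Error('Unexpected character at offset ' + pos);
    for (let i = 0; i < TYPES.length; i++) {
      if (m[i + 1] !== undefined) { counts[TYPES[i]]++; break; }
    }
    pos = TOKEN.lastIndex;
  }
  return counts;
}

// Build deterministic CSV text: identical lines of roughly `lineLength`
// characters, repeated until the text is at least `totalChars` long.
function makeCsv(totalChars, lineLength) {
  const fields = [];
  let len = 0;
  for (let i = 0; len < lineLength; i++) {
    const f = i % 5 === 0 ? '"quoted, value ' + i + '"' : 'value' + i;
    fields.push(f);
    len += f.length + 1;
  }
  const line = fields.join(',') + '\n';
  return line.repeat(Math.ceil(totalChars / line.length));
}

// Time `runs` passes of fn over the same input, after one warm-up pass.
function bench(name, fn, input, runs) {
  fn(input);
  const start = process.hrtime();
  for (let i = 0; i < runs; i++) fn(input);
  const diff = process.hrtime(start);
  const msPerRun = (diff[0] * 1e3 + diff[1] / 1e6) / runs;
  console.log(name + ': ' + msPerRun.toFixed(1) + ' ms per run on ' + process.version);
  return msPerRun;
}

const input = makeCsv(2 * 1024 * 1024, 550);
bench('sticky-regexp stand-in', lexAll, input, 5);
```

Generating the input in code keeps the case self-contained, and printing `process.version` makes results from different Node versions easy to tell apart when they are pasted into a Gist.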
@notsonotso
you must've used a bad antlr grammar. Also, the problem occurs with large input (2 MB in my case), so I assume memory thrashing (GC pressure) is the main cause?
You can inspect the grammar I've used. It originally comes from ANTLR's example grammar repository. https://github.com/SAP/chevrotain/blob/gh-pages/performance/jsonParsers/antlr/JSON_ANTLR.g4
You can also reproduce my results locally: check out the commit I linked above and open the performance/index.html page in a local browser.
It could be something to do with very large files. It could also be that some token patterns in a CSV are better suited to being lexed by an ANTLR-generated state machine than by a RegExp, or that the inverse is true for JSON tokens.
As @nathan said, once you have a reproducible example we can look into this more deeply.
Have to use LTS
Try NVM. It works on Mac/Linux and lets you rapidly switch Node.js versions and keep multiple Node versions installed at the same time.
It could be something with very large files, could be that some token patterns in a CSV are more suitable to be lexed using an antlr generated state machine
This sounds like a reasonable explanation.
Why don't you try with a 2MB file on your test code?
Try NVM. It works on Mac/Linux and lets you rapidly switch Node.js versions and keep multiple Node versions installed at the same time.
Why? I'll never be able to use anything other than LTS in production, so I don't see the point.
Why? I'll never be able to use anything other than LTS in production, so I don't see the point.
I fear that is a bit too simplified:
Next month version 8 will become LTS too, and there will be two LTS versions active at the same time until April 2018.
If performance considerations are very important for your project you can easily switch node.js versions to inspect future performance behavior to choose the best long term solution.
If your project is some sort of enterprise/corporate project that is sold/installed on machines outside your control, and officially only LTS is supported, then your CI process should probably be adjusted to run tests on both active LTS versions (until April 2018, when Node 6 is no longer LTS).
Why don't you try with a 2MB file on your test code?
I will try that.
@bd82 I tried Node 8.4.0 with the same result :( Each line of my CSV input is about 500-600 characters... I wonder if some regexp is blowing up (while looking for EOL?)
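One way to test the "regexp blowing up on long lines" guess without the real code is to hold the total input size constant and vary only the line length. The sketch below does that with a hypothetical stand-in tokenizer (`countTokens`); the helper names and the chosen line lengths are assumptions for illustration, not anything from the actual project. If the cost per character stays roughly flat as lines get longer, line length is probably not the culprit; if it grows, some pattern is likely rescanning the line.

```javascript
'use strict';

// Hypothetical stand-in for the lexer under test: counts fields, commas
// and newlines using a single sticky RegExp.
function countTokens(text) {
  const re = /[^,\n]+|,|\n/y;
  let n = 0;
  while (re.lastIndex < text.length && re.exec(text) !== null) n++;
  return n;
}

// Build about `totalChars` of CSV where every line is about `lineLength`
// characters long (a multiple of 10: nine-letter fields plus separators).
function makeInput(totalChars, lineLength) {
  const fieldsPerLine = Math.max(1, Math.round(lineLength / 10));
  const line = 'abcdefghi,'.repeat(fieldsPerLine).slice(0, -1) + '\n';
  return line.repeat(Math.ceil(totalChars / line.length));
}

// Nanoseconds per input character for one timed pass, after a warm-up pass.
function nsPerChar(fn, text) {
  fn(text);
  const start = process.hrtime();
  fn(text);
  const diff = process.hrtime(start);
  return (diff[0] * 1e9 + diff[1]) / text.length;
}

// Same total size (1 MiB), increasingly long lines.
[50, 500, 5000, 50000].forEach(function (lineLength) {
  const text = makeInput(1 << 20, lineLength);
  const cost = nsPerChar(countTokens, text);
  console.log(lineLength + ' chars/line: ' + cost.toFixed(1) + ' ns/char');
});
```

Single timed passes are noisy, so the numbers are only a rough signal; a large jump between the 500 and 50000 rows would be the thing to look for.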
Unless you show your code, we can only speculate. As Nathan said, this is going nowhere.
Closing, since the discussion didn't lead anywhere.
Really? In my experience so far, moo is significantly slower than ANTLR's lexer, probably related to the fact that regexp isn't necessarily fast to begin with.
ANTLR 4.7's generated JS lexer is about 40% faster and uses less memory than moo. The input grammar is CSV and about 2MB of data. I'll post the code and input file when I get around to it, just a heads up for now.
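A memory claim like the one above can be given a rough number with `process.memoryUsage()`. The following is a minimal sketch under stated assumptions: `measureHeap` and `collectTokens` are hypothetical names, and `collectTokens` is a stand-in that allocates one object per token, not the moo or ANTLR lexer. The heap delta is only approximate, because V8 may collect garbage in the middle of the run.

```javascript
'use strict';

// Hypothetical helper: run fn and report how much the V8 heap grew.
// global.gc only exists when Node is started with --expose-gc.
function measureHeap(fn) {
  if (typeof global.gc === 'function') global.gc();
  const before = process.memoryUsage().heapUsed;
  const result = fn();
  const after = process.memoryUsage().heapUsed;
  return { result: result, heapDeltaBytes: after - before };
}

// Stand-in lexer that keeps every token as an object, which is the kind of
// allocation pattern that could put pressure on the garbage collector.
function collectTokens(text) {
  const re = /([^,\n]+)|(,)|(\n)/y;
  const tokens = [];
  let m;
  while (re.lastIndex < text.length && (m = re.exec(text)) !== null) {
    const type = m[1] !== undefined ? 'field' : m[2] !== undefined ? 'comma' : 'eol';
    tokens.push({ type: type, value: m[0], offset: m.index });
  }
  return tokens;
}

const csv = 'alpha,beta,gamma\n'.repeat(100000); // about 1.7 MB of input
const report = measureHeap(function () { return collectTokens(csv); });
console.log(report.result.length + ' tokens, heap grew by about ' +
  (report.heapDeltaBytes / 1048576).toFixed(1) + ' MiB');
```

Running the same script with `node --trace-gc` additionally logs each collection, which would show whether GC activity actually dominates on a 2 MB input.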