Closed fb55 closed 2 years ago
I guess an argument can be made that HTML parsers that produce DOMs should be compared, and that's fair. Still, "faster than htmlparser2" is no longer strictly true for all usage patterns 😄
Just went through the node-html-parser source code a bit and was surprised to see that entities aren't decoded while parsing. Super smart approach to decode them as needed instead of all at once, especially when you have more of a batteries-included package.
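To illustrate the decode-as-needed idea, here is a minimal sketch. It is not node-html-parser's actual code: `LazyTextNode`, `decodeEntities`, and `NAMED_ENTITIES` are hypothetical names, and only five named entities are handled. The parser would store the raw text as-is, and the decoding cost is paid only when someone first asks for the decoded string.

```typescript
// Hypothetical sketch of lazy entity decoding; all names here are illustrative
// and do not come from node-html-parser or htmlparser2.

// A deliberately tiny entity table (assumption: real decoders cover far more).
const NAMED_ENTITIES: { [name: string]: string } = {
  amp: "&",
  lt: "<",
  gt: ">",
  quot: '"',
  apos: "'",
};

// Replace the known named entities; anything unrecognized is left untouched.
function decodeEntities(raw: string): string {
  return raw.replace(/&([a-z]+);/g, (match: string, name: string): string => {
    // Own-property check so names like "constructor" are not looked up
    // on the object prototype.
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, name)
      ? NAMED_ENTITIES[name]
      : match;
  });
}

class LazyTextNode {
  // Cached result; null means "not decoded yet".
  private decoded: string | null = null;
  // Counts how often the decoder actually ran (for demonstration only).
  public decodeCount = 0;

  // The parser only stores the raw slice; no decoding happens here.
  constructor(public readonly rawText: string) {}

  // Decode on first access, then reuse the cached string.
  decodedText(): string {
    if (this.decoded === null) {
      this.decoded = decodeEntities(this.rawText);
      this.decodeCount++;
    }
    return this.decoded;
  }
}
```

With this shape, a document whose text nodes are never read pays nothing for entity decoding, which is one reason a parse-only benchmark can favor a lazy design.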
At the same time — yes, this is showing the limitations of any basic benchmark.
Nice work! Good to meet you. I think this is more of @taoqf's call, so I'll leave it to him. My two cents is it would probably be best to drop the stats and simply link to the latest stats.
I actually have a plan (if I ever find the time) to create a custom compiler for a new lexer using zero-cost abstractions. The idea is to get the best perf out of the JIT. But the key word there is "if"!
If I do, I'll roll it into this library and we can keep the race going 😅
Nice work!
Hi everyone!
Very cool project, keep up the good work! You really upped the game by producing a fast HTML parser, and motivated me to work on htmlparser2 again.
I just saw the benchmark in the README, and saw that htmlparser2 was directly mentioned (happy to see!). I had a big push to speed up htmlparser2 several months ago, and things look a bit different now; results can be seen at https://github.com/AndreasMadsen/htmlparser-benchmark/blob/master/stats.txt
Not sure if you want to include the benchmark in the first place; linking to the current results might be the better approach.
Let me know what you think!