taoqf / node-html-parser

A very fast HTML parser, generating a simplified DOM, with basic element query support.
MIT License

Update benchmark #173

Closed by fb55 2 years ago

fb55 commented 3 years ago

Hi everyone!

Very cool project, keep up the good work! You really upped the game with producing a fast HTML parser, and motivated me to work on htmlparser2 again.

I just saw the benchmark in the README, and saw that htmlparser2 was directly mentioned (happy to see!). I had a big push to speed up htmlparser2 several months ago, and things look a bit different now; results can be seen at https://github.com/AndreasMadsen/htmlparser-benchmark/blob/master/stats.txt

Not sure if you want to include the benchmark in the first place; linking to the current results might be the better approach.

Let me know what you think!

fb55 commented 3 years ago

I guess an argument can be made that only HTML parsers that produce DOMs should be compared against each other; that's fair. Still, "faster than htmlparser2" is no longer strictly true for all usage patterns 😄

fb55 commented 3 years ago

Just went through the node-html-parser source code a bit and was surprised to see that entities aren't decoded while parsing. Super smart approach to decode them as needed instead of all at once, especially when you have more of a batteries-included package.

At the same time, yes, this shows the limitations of any basic benchmark.
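To illustrate the decode-as-needed idea mentioned above, here is a minimal sketch in TypeScript. It is not node-html-parser's actual implementation: `LazyTextNode`, `decodeEntities`, `NAMED_ENTITIES`, and `decodeCount` are hypothetical names invented for this example, and the entity table is deliberately tiny. The point is only that the parser stores the raw text and pays the decoding cost on first access, which a parse-only benchmark never triggers.

```typescript
// Hypothetical sketch of lazy entity decoding.
// NOT node-html-parser's real code; all names here are made up.

// A deliberately tiny entity table; a real decoder covers far more.
const NAMED_ENTITIES: Record<string, string> = {
  amp: "&",
  lt: "<",
  gt: ">",
  quot: '"',
  apos: "'",
};

// Decode named and numeric character references in a single pass.
function decodeEntities(raw: string): string {
  return raw.replace(
    /&(#x[0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g,
    (match: string, body: string): string => {
      if (body[0] === "#") {
        const isHex = body[1] === "x";
        const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
        // Leave out-of-range references untouched.
        return code <= 0x10ffff ? String.fromCodePoint(code) : match;
      }
      // Unknown names are left as-is.
      return NAMED_ENTITIES[body] ?? match;
    }
  );
}

class LazyTextNode {
  readonly rawText: string;
  // Counts how often decoding actually ran (for illustration only).
  decodeCount = 0;
  private decoded: string | undefined;

  constructor(rawText: string) {
    // Parsing only stores the raw slice; no decoding happens here.
    this.rawText = rawText;
  }

  // Decoding runs on first access and the result is cached.
  get text(): string {
    if (this.decoded === undefined) {
      this.decoded = decodeEntities(this.rawText);
      this.decodeCount++;
    }
    return this.decoded;
  }
}

const node = new LazyTextNode("Tom &amp; Jerry &lt;3");
console.log(node.rawText); // Tom &amp; Jerry &lt;3
console.log(node.text); // Tom & Jerry <3
```

Under these assumptions, a benchmark that only builds the tree measures none of the decoding work, while a benchmark that reads every text node would. That is one way two parsers can rank differently depending on the usage pattern.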

nonara commented 3 years ago

Nice work! Good to meet you. I think this is more of @taoqf's call, so I'll leave it to him. My two cents is it would probably be best to drop the stats and simply link to the latest stats.

I actually have a plan (if I ever find the time) to create a custom compiler for a new lexer using zero-cost abstractions. The idea is to get the best perf out of the JIT. But the key word there is if!

If I do, I'll roll it into this library and we can keep the race going 😅

taoqf commented 2 years ago

Nice work!