Closed ehsanghorbani190 closed 7 months ago
The goal of https://github.com/taoqf/node-html-parser is to be the "fastest parser at any cost" (including resource consumption and correctness, by the way). The goal of this project is to be a pure-Lua parser implementation with no dependencies.
There are other projects that are not bound by those restrictions and work much faster (but come with other trade-offs). For example, there is a library called Gumbo. It is worth mentioning that Gumbo doesn't focus on speed either (it focuses on correctness, IIRC), but since it is a library compiled from C rather than pure Lua, it is no doubt faster than this project.
So, TL;DR: different use cases have different requirements, and there are different tools to meet those requirements.
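If it helps when deciding which tool fits your requirements, here is a rough sketch of how one might time any parser on a batch of pages. It is not taken from either library: `timeParser`, `countTags`, and the generated pages are hypothetical names and data made up for illustration, and `countTags` is only a dependency-free stand-in for a real parse call that you would swap in yourself.

```javascript
// Hypothetical timing harness; all names here are illustrative assumptions.
const { performance } = require('node:perf_hooks');

// Runs parseFn over every page `runs` times and reports best and median wall time.
function timeParser(parseFn, pages, runs = 5) {
  const samples = [];
  for (let r = 0; r < runs; r++) {
    const start = performance.now();
    for (const html of pages) {
      parseFn(html);
    }
    samples.push(performance.now() - start);
  }
  samples.sort((a, b) => a - b);
  return {
    runs,
    pages: pages.length,
    bestMs: samples[0],
    medianMs: samples[Math.floor(samples.length / 2)],
  };
}

// Stand-in "parser" so the sketch runs without dependencies: counts opening tags.
// Replace it with a call into the real parser you want to measure.
function countTags(html) {
  const matches = html.match(/<[a-zA-Z][^>]*>/g);
  return matches ? matches.length : 0;
}

// Synthetic input: 1000 tiny pages (an assumption; real crawled pages will differ).
const pages = Array.from(
  { length: 1000 },
  (_, i) => `<html><body><p id="p${i}">page ${i}</p></body></html>`
);

const result = timeParser(countTags, pages);
console.log(result);
```

Repeating the run and looking at the median rather than a single measurement reduces noise from JIT warm-up and garbage collection, which matters when comparing Node, Lua, and LuaJIT numbers.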
Hi! We want to write a script which will crawl at least 1000 HTML pages, run some selects on their nodes and check some conditions. I wanted to write this script in Lua, but it seems this package for HTML parsing is much, much slower than node-html-parser for JS!
JS example:
results using node v21.6.1:
Lua example:
results using Lua v5.1.5:
It is about 16 times slower! Am I doing something wrong? Is my example OK?
UPDATE: I tested it with LuaJIT v2.1.1706185428, and the results are better than with Lua but still not better than with JS: