mixmark-io / turndown

🛏 An HTML to Markdown converter written in JavaScript
https://mixmark-io.github.io/turndown
MIT License
8.52k stars 864 forks source link

Add happy-dom and benchmark #457

Open zirkelc opened 4 months ago

zirkelc commented 4 months ago

Still work in progress.

EDIT:

This PR adds happy-dom as a potential replacement for domino due to the issues described in #378 A new benchmark test was added in test/benchmark.js. The process.env.PARSER variable toggles between happy-dom and domino.

Current results:

┌─────────┬─────────────┬─────────┬────────────────────┬───────────┬─────────┐
│ (index) │ Task Name   │ ops/sec │ Average Time (ns)  │ Margin    │ Samples │
├─────────┼─────────────┼─────────┼────────────────────┼───────────┼─────────┤
│ 0       │ 'domino'    │ '192'   │ 5206131.350000004  │ '±12.86%' │ 20      │
│ 1       │ 'happy-dom' │ '83'    │ 11956762.500000013 │ '±10.99%' │ 10      │
│ 2       │ 'jsdom'     │ '57'    │ 17384329.000000015 │ '±3.25%'  │ 10      │
└─────────┴─────────────┴─────────┴────────────────────┴───────────┴─────────┘

Other html parsers could be considered which are faster than happy-dom for this taskare listed here:

martincizek commented 4 months ago

So I guess the best option for now is to switch to a fork of domino, correct?

zirkelc commented 4 months ago

If speed is the main concern, then yes the domino port passes all tests and is roughly the same speed as the original.

I didn't check the source code of the port, was the issue described in #378 fixed?

If everything is alright, I can cleanup the parser and make the PR ready.

brianfeister commented 4 months ago

@martincizek what are your thoughts here? I'm trying to evaluate whether there will be a fix for this (I'm working in AWS Lambda) or if I need a different library. That's not meant to be hostile in any way, it's just "hey, have to pivot to keep moving forward"

brianfeister commented 4 months ago

You could also consider patch-package to address the underlying problem in domino, not sure how long it would be before another issue comes up and you end up indirectly owning that dependency against your preference

zirkelc commented 4 months ago

I checked both the original repo and https://github.com/fgnass/domino and the fork https://github.com/angular/domino for the with() statement that causes the issue described in #378

It this this file in the original repo: https://github.com/fgnass/domino/blob/12a5f67136a0ac10e3fa1649b8787ba3b309e9a7/lib/sloppy.js#L7-L25

That file doesn't exist in the fork and there is no other usage of this statement. So switching to this should work.