mixmark-io / turndown

🛏 An HTML to Markdown converter written in JavaScript
https://mixmark-io.github.io/turndown
MIT License
8.52k stars 864 forks source link

Fixing a missing defensive check for an absent window global variable that was breaking service worker environments #443

Closed rory-instil closed 10 months ago

rory-instil commented 1 year ago

Reusing the root variable defined in the file above - and used in a similar way on line 12 - we can enable service worker environments to use this library. Otherwise these environments will encounter a window is not defined error.

This will close #397.

morganhvidt commented 12 months ago

This fixes my issue 🥳 I'm looking forward to the merge!

nyacg commented 10 months ago

I think this fixed my issue too, would be great to get it merged since it's approved

makutamoto commented 8 months ago

It seems this PR has not been released yet. Could you release this?

TranquilMarmot commented 6 months ago

Any chance this could get released soon? 😁

poltak commented 5 months ago

This was merged in a while back now but there still hasn't been a release. Any idea when this might happen?

Lots of web extensions are having to move to manifest V3 in the next few months and one of the big changes there on Chrome is that your extension's background script will now run as a Service Worker. Which lacks a window global and thus turndown stops working. This PR would fix turndown for MV3 web exts

rory-instil commented 5 months ago

I've been using this pnpm patch / patch-package in lieu of a release: https://gist.github.com/rory-instil/2091eeeaf087d6caaff7c91a7e16f384

martincizek commented 5 months ago

Released v7.1.3. Please try it out and let me know if the issue is fixed.

poltak commented 5 months ago

Just verified it fixes it for me in Chrome MV3 background service worker context. Thanks @martincizek !

poltak commented 5 months ago

I spoke too soon without giving it a good enough test. After properly testing it, it errors out trying to access document global, which is not available in service workers, here: https://github.com/mixmark-io/turndown/blob/master/src/html-parser.js#L42

I see you're using domino to afford HTML parsing in non-browser environments. I'm guessing this would also work in service workers, though I'm not sure how you could easily differentiate standard browser envs from service workers in a similar way to what you're currently doing to differentiate node from browser.

It would also probably also significantly increase the SW script size. An ideal solution would be to afford the user passing in their own HTML parsing implementation, as that may allow users to avoid the script size increase if they're already using some other non-browser HTML parsing solution, like cheerio. Of course, that's more work.

martincizek commented 5 months ago

It would also probably also significantly increase the SW script size.

There are different builds for browser and non-browser. I think you can influence this with how you declare the dependencies. And the online browser-build does not contain domino indeed.

martincizek commented 5 months ago

I spoke too soon without giving it a good enough test. After properly testing it, it errors out trying to access document global, which is not available in service workers, here: https://github.com/mixmark-io/turndown/blob/master/src/html-parser.js#L42

This code should not be accessed if you provide turndown() a DOM node yourself.

poltak commented 5 months ago

Ah that's quite obvious in hindsight. Seems to work fine with a Document object produced via linkedom's worker mode. Thanks for the support @martincizek !