mixmark-io / turndown

🛏 An HTML to Markdown converter written in JavaScript
https://mixmark-io.github.io/turndown
MIT License
8.94k stars, 881 forks

Allow TurndownService to accept custom HTML parsers #456

Closed. poltak closed this 8 months ago

poltak commented 8 months ago

This solves a problem for me: I need to run turndown in a service worker. That environment supports neither of the two approaches that turndown's HTMLParser implementation relies on, the DOM API-based one and the domino-based one. I talked a bit more about this in https://github.com/mixmark-io/turndown/pull/443#issuecomment-2003173563

With this change I can pass an HTMLParser-compatible object to the TurndownService constructor. That object can be implemented with a package like linkedom, which provides a DOM-like API and works fine in a service worker context.

It should not change anything for existing users in either standard (DOM-accessible) browser or Node contexts, as the existing HTMLParser implementation should still be used by default.

martincizek commented 8 months ago

Thank you for your contribution.

> domino-based solutions that turndown's HTMLParser implementation only supports

You can pass TurndownService#turndown() a DOM object (an element, document, or fragment node). This is the current way to use a custom HTML parser: convert the string to a DOM node yourself, then pass that node to turndown instead of the string.

Unfortunately, introducing a customizable HTMLParser doesn't solve the packaging issue, which is why I'd prefer to follow the idea described in #290. However, you can achieve the same result by wrapping the turndown() method and using your own DOM parser as described above.
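
The wrapping approach suggested above could look roughly like the sketch below. It is only an illustration, not code from turndown or from this PR: `wrapTurndown` and `parseToNode` are hypothetical names, and the sketch assumes only that `service.turndown()` accepts a DOM node, as stated in the comment above.

```javascript
// Hypothetical helper (not part of turndown): returns a function that
// behaves like service.turndown(), except that string input is first
// converted to a DOM node by a caller-supplied parser.
function wrapTurndown(service, parseToNode) {
  return function turndown(input) {
    // Strings go through the custom parser; anything else (assumed to be
    // an element, document, or fragment node) is passed through unchanged.
    const node = typeof input === 'string' ? parseToNode(input) : input;
    return service.turndown(node);
  };
}
```

Here `service` would be a `TurndownService` instance and `parseToNode` any function that turns an HTML string into a DOM-like node. In a service worker it might be built on a package such as linkedom, as the PR description suggests; the exact call depends on that package's API. Because turndown then only ever receives nodes, its built-in HTMLParser is not used for string parsing.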