mixmark-io / turndown

🛏 An HTML to Markdown converter written in JavaScript
https://mixmark-io.github.io/turndown
MIT License
8.82k stars, 880 forks

Add support for 'preserve' rules. #479

Open. za3k opened 2 months ago

za3k commented 2 months ago

'preserve' is a little like 'keep', but it converts the element's descendants to Markdown while keeping the element itself as HTML.

An element which looks like this:

<div id="preserve-me">
    <b>some bolded text</b>
</div>

And with this rule added:

turndownService.preserve('div')

Will convert into this markdown:

<div id="preserve-me" markdown="1">
    **some bolded text**
</div>

Note that the result is not standard Markdown. It relies on the markdown="1" attribute, an extension supported by processors such as PHP Markdown Extra, so the target renderer has to support it.
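
To make the intended output concrete, here is a minimal sketch of a replacement function that would produce it. It is only an illustration under assumptions: `preserveReplacement` and `fakeDiv` are hypothetical names that do not appear in turndown or in this PR, the node is a plain stand-in object rather than a real DOM element, and the converted content is not indented.

```javascript
// Hypothetical helper (not part of turndown or this PR): keeps the element's
// own tag and attributes, adds markdown="1", and wraps the already-converted
// Markdown content of its descendants.
function preserveReplacement(content, node) {
  const tag = node.nodeName.toLowerCase();
  // Array.from accepts a DOM NamedNodeMap as well as a plain array.
  const attrs = Array.from(node.attributes || [])
    .filter((attr) => attr.name !== 'markdown') // avoid a duplicate attribute
    .map((attr) => {
      const value = String(attr.value).replace(/&/g, '&amp;').replace(/"/g, '&quot;');
      return ` ${attr.name}="${value}"`;
    })
    .join('');
  return `\n\n<${tag}${attrs} markdown="1">\n${content.trim()}\n</${tag}>\n\n`;
}

// Stand-in for a DOM element so the sketch runs without a browser or turndown.
const fakeDiv = {
  nodeName: 'DIV',
  attributes: [{ name: 'id', value: 'preserve-me' }],
};

const preserved = preserveReplacement('**some bolded text**', fakeDiv);
console.log(preserved);
```

With the existing API, a function with this `(content, node)` signature could presumably be registered through `turndownService.addRule('preserveDiv', { filter: 'div', replacement: preserveReplacement })`. The rule name `preserveDiv` is made up for the example.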

za3k commented 2 months ago

I wrote this to get lighter output than that provided by https://github.com/mixmark-io/turndown/pull/448, which converts entire elements to HTML.

martincizek commented 2 months ago

Hey @za3k, thank you for your contribution. I agree that this is needed. In our project that uses turndown, we implemented the same thing as shown below. The isBlock branch is consistent with the default keepReplacement (as in your code), and I think the shallow cloneNode() approach is more elegant and probably more efficient.

What do you think? And can you please check that the implementation below also works for you?

/**
 * An alternative implementation of the default `keep` replacement.
 * Preserves current element, but uses the GFM-rendered subtree content.
 */

/**
 * Shallow keep replacement. Ignores the attributes for now.
 * The default outerHTML approach is chosen for block elements.
 */
function shallowKeepReplacement(content, node) {
  if (node.isBlock) {
    return `\n\n${node.outerHTML}\n\n`;
  }
  const clone = node.cloneNode(false);
  // Use a replacer function so `$` sequences in content (e.g. `$$`) are inserted literally.
  return clone.outerHTML.replace('><', () => `>${content}<`);
}

module.exports = shallowKeepReplacement;
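
For anyone who wants to try this outside a browser, the sketch below runs a variant of the function above against stand-in nodes. It is an assumption-laden illustration: `makeFakeNode` and `shallowKeep` are hypothetical names, the fake node only mimics `isBlock`, `outerHTML` and `cloneNode(false)`, and the variant passes a replacer function so that `$` sequences in the content are kept literally.

```javascript
// Hypothetical stand-in for a turndown-extended DOM node.
function makeFakeNode(tag, attrs, isBlock, innerHTML) {
  const open = `<${tag}${attrs}>`;
  return {
    isBlock,
    outerHTML: `${open}${innerHTML}</${tag}>`,
    // A shallow clone keeps the tag and attributes but drops the children.
    cloneNode: () => ({ outerHTML: `${open}</${tag}>` }),
  };
}

// Variant of the shallow keep replacement shown above.
function shallowKeep(content, node) {
  if (node.isBlock) {
    // Block elements are emitted as untouched HTML, like the default keep.
    return `\n\n${node.outerHTML}\n\n`;
  }
  const clone = node.cloneNode(false);
  // The replacer function prevents `$$` or `$&` in content from being
  // interpreted as special replacement patterns.
  return clone.outerHTML.replace('><', () => `>${content}<`);
}

const inlineResult = shallowKeep(
  '**bold** $$x$$',
  makeFakeNode('span', ' class="note"', false, '<b>bold</b> $$x$$')
);
const blockResult = shallowKeep(
  '**bold**',
  makeFakeNode('div', '', true, '<b>bold</b>')
);

console.log(inlineResult);
console.log(JSON.stringify(blockResult));
```

Note that only the inline case gets Markdown-converted children; the block case still returns the original HTML. In a real setup the function could presumably be passed as turndown's `keepReplacement` option together with `turndownService.keep(...)`.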

za3k commented 2 months ago

That replacement code is cleaner, I agree. Maybe innerHTML would work better than outerHTML, even?

Unfortunately, I'm no longer in a position to verify this easily. I just completed my own Markdown conversion, so I'm no longer working with turndown. If you're using it, I assume it works.

Feel free to fork my PR and add a test demonstrating that it works. I can merge your change into my own PR, if you're okay with keeping the separate fourth rule type.