mixmark-io / turndown

🛏 An HTML to Markdown converter written in JavaScript
https://mixmark-io.github.io/turndown
MIT License

Shallow keep replacement feature #374

Open martincizek opened 3 years ago


The current keep option keeps the whole subtree, while a "shallow" keep might better suit some use cases.

Shallow keep needs to be carefully designed in order to be meaningful and universal at the same time. For example, a "shallow keep" of a table that cannot be converted to MD should shallow-copy not only the table element itself but also its nested table-related tags. It should not, however, automatically shallow-copy the tags of another table nested inside it, should there be one.
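One way to picture the table case is as an ownership check: a tag is shallow-kept together with a table only if that table is its nearest table ancestor. The following is a minimal sketch of that idea, not part of turndown. The names `TABLE_TAGS`, `nearestTable`, and `isKeptWithTable` are hypothetical, and the nodes only need the `nodeName` and `parentNode` properties that DOM elements provide.

```javascript
// Hypothetical sketch, not turndown API: decide whether a table-related
// element should be shallow-kept together with a given table.
const TABLE_TAGS = new Set([
  'TABLE', 'CAPTION', 'COLGROUP', 'COL',
  'THEAD', 'TBODY', 'TFOOT', 'TR', 'TH', 'TD',
]);

// Walk up from `node` to the closest enclosing TABLE element (or null).
function nearestTable(node) {
  for (let current = node; current; current = current.parentNode) {
    if (current.nodeName === 'TABLE') return current;
  }
  return null;
}

// True when `node` is a table-related tag owned by `table` itself,
// false when it is some other tag or belongs to a table nested inside `table`.
function isKeptWithTable(node, table) {
  return TABLE_TAGS.has(node.nodeName) && nearestTable(node) === table;
}
```

With this check, a `td` of the outer table is kept with it, while a nested `table` and its cells are not. Those would go through whatever rule applies to them on their own.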

A concept of "GFM contexts" should probably be introduced prior to this.

For those looking for a simple solution for just inline tags, an alternative keepReplacement can be defined as shown below. And it's always an option to provide a custom rule (see e.g. #363 for an example).

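/**
 * An alternative implementation of the default `keep` replacement.
 * Preserves the current element, but uses the GFM-rendered subtree content.
 *
 * Shallow keep applies to inline elements only; for block elements, the
 * default outerHTML approach is applied. Tag attributes are not specially
 * handled: the `><` substitution assumes no attribute value contains `><`.
 */
function shallowKeepReplacement(content, node) {
  if (node.isBlock) {
    return `\n\n${node.outerHTML}\n\n`;
  }
  // Shallow clone: the element itself, without its children.
  const clone = node.cloneNode(false);
  // A replacer function inserts `content` literally, so `$` sequences
  // such as `$&` in the converted text are not treated as patterns.
  return clone.outerHTML.replace('><', () => `>${content}<`);
}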

Usage:

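// Node.js import; in a browser build, TurndownService is available as a global.
const TurndownService = require('turndown');

const options = {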
  // ...
  keepReplacement: shallowKeepReplacement,
};
const turndownService = new TurndownService(options);
// Use `keep()` as usual:
turndownService.keep('kbd');

// New behavior for inline tags:
const html = '<kbd>foo <strong>bar</strong></kbd>';
console.log(turndownService.turndown(html)); // <kbd>foo **bar**</kbd>
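A detail to watch when splicing converted content into a tag with `String.prototype.replace`: a replacement given as a string interprets `$` patterns such as `$&`, so content containing them can be altered. The sketch below shows the difference using plain strings and no DOM. `spliceContent` is a hypothetical helper introduced only for illustration.

```javascript
// Hypothetical helper, not turndown API: insert `content` between the
// opening and closing tag of an empty element's serialized HTML.
function spliceContent(shallowOuterHTML, content) {
  // A replacer function's return value is used verbatim, so `$` sequences
  // in `content` are not interpreted as replacement patterns.
  return shallowOuterHTML.replace('><', () => `>${content}<`);
}

const shell = '<kbd></kbd>'; // serialized form of a childless <kbd> clone
const content = 'cost: $& each';

// String replacement: `$&` expands to the matched text `><`.
console.log(shell.replace('><', `>${content}<`)); // <kbd>cost: >< each</kbd>

// Function replacement: the content is inserted unchanged.
console.log(spliceContent(shell, content)); // <kbd>cost: $& each</kbd>
```

Void elements such as `<br>` serialize without a `><` pair, so the replacement is a no-op for them and the tag is returned as is.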