mixmark-io / turndown

🛏 An HTML to Markdown converter written in JavaScript
https://mixmark-io.github.io/turndown
MIT License

Feature: Option for list item indentation #484

Open zirkelc opened 1 month ago

zirkelc commented 1 month ago

The current implementation uses 3 spaces for <li> indentation. In order to change it, we have to add a custom rule. Would it be possible to add an option to configure this setting, for example as bulletListIndentation?

https://github.com/mixmark-io/turndown/blob/cc73387fb707e5fb5e1083e94078d08f38f3abc8/src/commonmark-rules.js#L61-L72

If this is accepted, I'd be willing to submit a PR.

martincizek commented 1 month ago

Hi @zirkelc, this has actually been requested a few times. Since it neither adds unjustified complexity nor introduces use-case-specific behaviour, I think we can add it.

We have actually implemented it in our project too, as a side effect of handling ordered lists with multi-digit item numbers differently.

The constant 3 would simply be replaced with a config setting. Maybe listIndentSize (similar to the "tabsize" option in text editors) or listIndentSpaces, to emphasize that the setting is numeric and not a string containing the actual spaces.

function listIndent(content, indentLength) {
  // Native String.prototype.repeat, so no external helper is needed
  const indent = ' '.repeat(indentLength);
  return content
    .replace(/^\n+/, '') // remove leading newlines
    .replace(/\n+$/, '\n') // replace trailing newlines with just a single one
    .replace(/\n/gm, `\n${indent}`); // indent
}

// `rules` is presumably the rule registry of the commenter's own project;
// with stock turndown a rule is registered via turndownService.addRule('listItem', {...})
rules.set('listItem', {
  filter: 'li',
  replacement(content, node, options) {
    const parent = node.parentNode;
    let marker = options.bulletListMarker;
    if (parent.nodeName === 'OL') {
      const start = parent.getAttribute('start');
      const index = Array.prototype.indexOf.call(parent.children, node);
      marker = `${start ? Number(start) + index : index + 1}.`;
    }
    // Pad short markers so the item text starts in the same column
    const space = ' '.repeat(1 + Math.max(0, 3 - marker.length));
    const prefix = `${marker}${space}`;
    const liContent = listIndent(content, prefix.length);
    const trailNl = node.nextSibling && !/\n$/.test(liContent) ? '\n' : '';
    return `${prefix}${liContent}${trailNl}`;
  },
});
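
The spacing arithmetic in the rule above can be isolated to see what replacing the constant 3 would mean. This is only a sketch: `listItemPrefix` and `DEFAULT_LIST_INDENT_SIZE` are hypothetical names, and `listIndentSize` is one of the candidate option names from this comment, not an existing turndown option.

```javascript
// Hypothetical default, mirroring the hard-coded 3 in the rule above.
const DEFAULT_LIST_INDENT_SIZE = 3;

// Hypothetical helper: builds "marker + padding" for a list item.
function listItemPrefix(marker, options = {}) {
  const size = options.listIndentSize ?? DEFAULT_LIST_INDENT_SIZE;
  // At least one space always separates the marker from the item text.
  const padding = 1 + Math.max(0, size - marker.length);
  return marker + ' '.repeat(padding);
}

console.log(JSON.stringify(listItemPrefix('*')));                          // "*   "
console.log(JSON.stringify(listItemPrefix('-', { listIndentSize: 1 })));   // "- "
console.log(JSON.stringify(listItemPrefix('10.', { listIndentSize: 1 }))); // "10. "
```

With this reading, the setting acts as a minimum marker width rather than a fixed number of spaces, which is what keeps `9.` and `10.` aligned.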

For ordered lists, it works like this:

<it should="respect multi-digit ordered list items">
<turndown>
  <ol start="9">
    <li>a</li>
    <li>b</li>
  </ol>
</turndown>
<produces>
9.  a
10. b
</produces>
</it>

You can check whether it does the job for you (e.g. by using this as a custom rule), and I'd then implement it.
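
One way to try this outside a browser is to wrap the rule in a factory and feed it minimal stand-ins for DOM nodes. The sketch below makes several assumptions: `createListItemRule`, `mockList`, and `renderList` are hypothetical helpers, `listIndentSize` is the option name floated in this thread (not an existing turndown option), and the mock nodes only implement the few properties the rule reads.

```javascript
// Hypothetical factory: returns a rule object in the shape turndown's
// addRule() accepts, with the constant 3 made configurable.
function createListItemRule({ listIndentSize = 3 } = {}) {
  return {
    filter: 'li',
    replacement(content, node, options) {
      const parent = node.parentNode;
      let marker = options.bulletListMarker;
      if (parent.nodeName === 'OL') {
        const start = parent.getAttribute('start');
        const index = Array.prototype.indexOf.call(parent.children, node);
        marker = `${start ? Number(start) + index : index + 1}.`;
      }
      const prefix = marker + ' '.repeat(1 + Math.max(0, listIndentSize - marker.length));
      const indent = ' '.repeat(prefix.length);
      const body = content
        .replace(/^\n+/, '') // remove leading newlines
        .replace(/\n+$/, '\n') // collapse trailing newlines into one
        .replace(/\n/g, `\n${indent}`); // indent continuation lines
      const trailingNewline = node.nextSibling && !/\n$/.test(body) ? '\n' : '';
      return `${prefix}${body}${trailingNewline}`;
    },
  };
}

// Minimal stand-ins for DOM nodes (assumption: enough for this rule only).
function mockList(nodeName, attrs, texts) {
  const list = { nodeName, children: [], getAttribute: (name) => attrs[name] ?? null };
  texts.forEach((text) => {
    list.children.push({ nodeName: 'LI', parentNode: list, text, nextSibling: null });
  });
  list.children.forEach((li, i) => {
    li.nextSibling = list.children[i + 1] ?? null;
  });
  return list;
}

// Applies the rule to every item, as the converter would for each <li>.
function renderList(list, rule, options) {
  return list.children.map((li) => rule.replacement(li.text, li, options)).join('');
}

const ol = mockList('OL', { start: '9' }, ['a', 'b']);
console.log(renderList(ol, createListItemRule(), { bulletListMarker: '*' }));
// 9.  a
// 10. b
```

With a real `TurndownService` instance, the same object could be registered through `turndownService.addRule('listItem', createListItemRule({ listIndentSize: 1 }))`; added rules take precedence over the built-in CommonMark rules.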

zirkelc commented 2 weeks ago

Hi @martincizek

that looks good! I think both listIndentSize and listIndentSpaces are good names.