mixmark-io / turndown

🛏 An HTML to Markdown converter written in JavaScript
https://mixmark-io.github.io/turndown
MIT License

Broken Image conversion on compressed <figure> tag #466

Open · yagudaev opened this issue 3 weeks ago

yagudaev commented 3 weeks ago

First off, thank you so much for making this excellent library. It has been pretty much flawless 💜.

I found a bug when converting images from Substack specifically.

The HTML is minified (the whitespace between tags is stripped) and uses the <figure> tag.

Conversion works fine when the HTML contains whitespace between tags, but fails as soon as that whitespace is removed.

CodeSandbox example

[Screenshot: CleanShot 2024-06-18 at 15:26:06]

Turndown inserts newlines inside the link that wraps the image (between "[" and "![", and between ")" and "]"), which breaks the image Markdown.
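Since the report says conversion works when the markup contains whitespace, one hypothetical pre-processing workaround is to reinsert whitespace between adjacent tags inside each <figure> before calling turndown. The helper name spaceOutFigureTags is illustrative, not part of Turndown, and the sketch assumes no "<" or ">" characters appear inside attribute values:

```js
// Hypothetical helper (not part of Turndown): reinsert a newline between
// adjacent tags inside each <figure>...</figure> block, approximating the
// pretty-printed markup that the report says converts correctly.
// Assumes no "<" or ">" characters appear inside attribute values.
function spaceOutFigureTags(html) {
  return html.replace(/<figure\b[\s\S]*?<\/figure>/gi, (figure) =>
    figure.replace(/>\s*</g, '>\n<')
  )
}

const minified =
  '<figure><a href="https://example.com/full.png"><img src="https://example.com/a.png"></a></figure>'

console.log(spaceOutFigureTags(minified))
// <figure>
// <a href="https://example.com/full.png">
// <img src="https://example.com/a.png">
// </a>
// </figure>
```

Passing the result to turndownService.turndown() relies entirely on the reported whitespace behavior; it may also introduce a space between adjacent inline elements inside a figure.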

yagudaev commented 3 weeks ago

Quick workaround for now:

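```js
import TurndownService from 'turndown'

const turndownService = new TurndownService()

function convert(html) {
  let markdown = turndownService.turndown(html)
  // Only patch the output when the input contains a <figure> element
  if (/<figure.*?<\/figure>/s.test(html)) {
    markdown = markdown
      .replace(/\[\s*!/g, '[!') // "[\n\n![" becomes "[!["
      .replace(/\)\s*\]/g, ')]') // ")\n\n]" becomes ")]"
  }
  return markdown
}
```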