mixmark-io / turndown

🛏 An HTML to Markdown converter written in JavaScript
https://mixmark-io.github.io/turndown
MIT License

use srcset in <img> elements #305

Open mfulton26 opened 4 years ago

mfulton26 commented 4 years ago

Some sites (sadly) set an invalid src attribute on <img> elements and rely on the browser's srcset support.

It would be wonderful if Turndown could use the smallest or largest image specified in srcset instead of the value in src.

example

input

<h3>
  <img
    alt="icon, guidelines for adaptation"
    class="icon"
    src="123"
    srcset="
      https://assets.example.com/a.png  60w,
      https://assets.example.com/b.png 101w
    "
  />Class Options
</h3>

output

actual

![icon, guidelines for adaptation](123)Class Options

expected/desired

![icon, guidelines for adaptation](https://assets.example.com/b.png)Class Options
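The request above can be sketched as a small standalone function that does not depend on Turndown. pickSrcsetCandidate is a hypothetical name. Two behaviors go beyond what the issue asks for and are assumptions: density descriptors ("2x") are handled, and the function returns a caller-supplied fallback when srcset yields nothing usable.

```javascript
// Hypothetical helper (not part of Turndown): choose one URL from a srcset string.
// Width descriptors ("101w") are preferred; if none are present, density
// descriptors ("2x") are used, with a missing descriptor counting as 1x.
function pickSrcsetCandidate(srcset, fallback, largest = true) {
  const parsed = (srcset || "")
    .split(",")
    .map((candidate) => candidate.trim().split(/\s+/))
    .filter(([url]) => url) // drop empty entries, e.g. from a trailing comma
    .map(([url, descriptor = "1x"]) => {
      const match = /^(\d+(?:\.\d+)?)([wx])$/.exec(descriptor);
      return match ? { url, value: parseFloat(match[1]), unit: match[2] } : null;
    })
    .filter(Boolean); // ignore candidates with unrecognized descriptors

  const widths = parsed.filter((c) => c.unit === "w");
  const pool = widths.length > 0 ? widths : parsed;
  if (pool.length === 0) return fallback;

  // Keep the candidate with the largest (or smallest) descriptor value.
  return pool.reduce((best, c) =>
    (largest ? c.value > best.value : c.value < best.value) ? c : best
  ).url;
}
```

Inside a Turndown replacement function, this could be called as pickSrcsetCandidate(node.getAttribute("srcset"), node.getAttribute("src")) to get the largest image, or with false as the third argument to get the smallest.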
supperrain commented 2 years ago

My img element keeps the image URL in data-src and has no src attribute:

<img data-src="https://mmbiz.qpic.cn/mmbiz_png/xxxxxxxxxxx/640?wx_fmt=png" 
data-type="png" 
data-w="2560" 
style="display: block;margin: 0 auto;max-width: 100%;">

So my code is:

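turndownService.addRule("img", {
  filter: "img",
  replacement: function (_content, node, _options) {
    // Lazy-loaded images keep the real URL in data-src; fall back to src otherwise.
    const src = node.getAttribute("data-src") || node.getAttribute("src") || "";
    return "![" + (node.getAttribute("alt") || "") + "](" + src + ")";
  },
});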
tjbp commented 1 year ago

A small improvement on the above that takes the largest source from srcset (I needed this to transfer assets from one CMS to another):

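turndownService.addRule("img", {
  filter: "img",
  replacement: function (_content, node, _options) {
    const alt = node.getAttribute("alt") || "";
    const srcset = node.getAttribute("srcset") || "";

    // Parse "url 60w, url 101w" into [url, width] pairs. Splitting on any run of
    // whitespace tolerates extra spaces and newlines inside the attribute;
    // only width ("w") descriptors are considered.
    const candidates = srcset
      .split(",")
      .map((candidate) => candidate.trim().split(/\s+/))
      .filter(([url, descriptor]) => url && /^\d+w$/.test(descriptor || ""))
      .map(([url, descriptor]) => [url, parseInt(descriptor, 10)]);

    // Fall back to src when srcset is missing or has no width candidates.
    let largest = node.getAttribute("src") || "";
    let maxWidth = -1;
    for (const [url, width] of candidates) {
      if (width > maxWidth) {
        maxWidth = width;
        largest = url;
      }
    }

    return "![" + alt + "](" + largest + ")";
  },
});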
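The rule above splits srcset on every comma. The HTML spec's srcset parsing treats a candidate URL as a run of non-whitespace characters, so a comma inside a URL is allowed and only a trailing comma ends the candidate. Some image CDNs put comma-separated transformations in the path (for example w_100,h_50). A simplified tokenizer that follows that rule might look like this. parseSrcset is a hypothetical helper, not a Turndown API, and it ignores the spec's handling of parentheses inside descriptors.

```javascript
// Hypothetical helper: a simplified srcset tokenizer modeled on the HTML
// spec's algorithm. A URL is a run of non-whitespace characters, so commas
// inside it are kept; only trailing commas are stripped. Parentheses inside
// descriptors (which the spec also tracks) are not handled here.
function parseSrcset(input) {
  const candidates = [];
  const isSpace = (ch) => /\s/.test(ch);
  let pos = 0;

  while (pos < input.length) {
    // Skip whitespace and commas between candidates.
    while (pos < input.length && (isSpace(input[pos]) || input[pos] === ",")) pos++;
    if (pos >= input.length) break;

    // The URL runs until the next whitespace character.
    const start = pos;
    while (pos < input.length && !isSpace(input[pos])) pos++;
    let url = input.slice(start, pos);
    let descriptor = "";

    if (url.endsWith(",")) {
      // A trailing comma ends this candidate with no descriptor.
      url = url.replace(/,+$/, "");
    } else {
      // Otherwise the descriptor is everything up to the next comma.
      const end = input.indexOf(",", pos);
      const stop = end === -1 ? input.length : end;
      descriptor = input.slice(pos, stop).trim();
      pos = stop + 1;
    }
    candidates.push({ url, descriptor });
  }
  return candidates;
}
```

Picking the largest image then works on the returned list, for example by comparing parseInt(descriptor, 10) across candidates whose descriptor ends in "w".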