mozilla / readability

A standalone version of the readability lib

H1 is converted into H2? #863

Open yagudaev opened 2 months ago

yagudaev commented 2 months ago

Hi Mozilla team, thanks so much for this amazing library!

I was surprised to see an H1 converted into an H2, like so:

[Screenshot, 2024-04-26: an H1 in the input appears as an H2 in Readability's output]

Is there a way to turn this off?

Here is a live demo.

yagudaev commented 2 months ago

Here is a quick workaround that uses a preserved class to find the converted headings and turn them back into H1s:

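    // Assumes the source <h1> elements carry class="h1"; the class survives
    // Readability's H1 -> H2 conversion and marks the converted headings.
    const parser = new DOMParser();
    const doc = parser.parseFromString(value, "text/html");
    const { content } = new Readability(doc, {
      classesToPreserve: ["h1"], // an array of class names to keep
    }).parse();
    const readabilityDoc = parser.parseFromString(content, "text/html");
    readabilityDoc.querySelectorAll("h2.h1").forEach((heading) => {
      // Build a real <h1> instead of string-replacing the first "h2" in
      // outerHTML, which leaves a mismatched </h2> closing tag behind.
      const h1 = readabilityDoc.createElement("h1");
      for (const attr of heading.attributes) h1.setAttribute(attr.name, attr.value);
      while (heading.firstChild) h1.appendChild(heading.firstChild);
      heading.replaceWith(h1);
    });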

See: Demo