mozilla / readability

A standalone version of the readability lib
Other
8.66k stars 591 forks source link

Footnotes (as generated by Jekyll with kramdown) not included in reader mode #654

Open kixiQu opened 3 years ago

kixiQu commented 3 years ago

https://maya.land/monologues/2020/07/23/footnote-of-the-web.html

This page is generated by Jekyll with a lightly cosmetically modified version of the default theme. This means that footnotes rendered by Kramdown in one of the most popular static site generators aren't being picked up.

The missing chunk (truncated)

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p>footnote text here. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>

is within the main <article></article>. Here's another jekyll site that has the same problem: https://jonathanbuys.com/Bigfoot_Footnotes_in_Jekyll/

I have about three and a half minutes worth of web experience so it's possible this isn't an answer, but it seems like in the same way that there are roles that are considered unlikely to be good for inclusion, it might be possible to mark doc-endnotes and doc-endnote as good for inclusion in reader mode.

Anyway, y'all are the experts, thanks for everything you do!

gijsk commented 3 years ago

Thanks for the report!

"expert" is too high praise, but I do know that the doc-endnotes and doc-endnote values are bogus - the valid list is at https://www.w3.org/TR/wai-aria-1.1/#role_definitions (or at least, that's the most authoritative one I'm aware of). I don't know why Jekyll / Kramdown / Bigfoot are generating those values, but it's probably worth an issue filed with them...

I suspect the issue here is that we mark footnote out as a "negative" class, which will downscore the element. It looks like we've done that all the way since the initial commit. The good news is that that probably means we could try dropping it, the bad news is twofold: it'll be a bit finicky because we probably want to keep foot and footer, but foot will match footnote (maybe we need to make footnote positive to counteract that?) and there'll be no tests so who knows what websites we make worse if we do that (but only one way to find out, of course...).

I'd happily take a pull request to try and tackle this, if you'd be interested in writing one? You can use the generate-testcase.js script to create a testcase based on a slug and URL (or you can hand-write some minimal HTML if you prefer!).

bradonomics commented 3 years ago

I've noticed the same behavior; footnotes I'd rather keep are being stripped. I wonder if a whitelist option could be added? This would allow users to add something like the below on a per-case basis?

var article = new Readability(doc.window.document, {
  whiteList: ['footnotes', 'endnotes']
}).parse();
gijsk commented 3 years ago

@bradonomics I think this is a case where the default should be sensible, ie we should fix the issue without requiring consumers to add extra options. I outlined how this could be done in a previous comment. Again, happy to look at pull requests for this; unfortunately I don't have time to write this up myself right now.

ghost commented 1 month ago

Looked into this issue and it seems like it's fine to remove foot and footnote from the negative class? Omnivore's fork of readability does this.

uqs commented 3 weeks ago

Just bumped into this myself, can this be merged?

gijsk commented 1 week ago

Just bumped into this myself, can this be merged?

There isn't a patch here to be merged, though I'd take one if one were provided. It'd need to update some of the testcases.

gijsk commented 1 week ago

I ended up just doing this myself as a PR in #907.

uqs commented 1 week ago

By "merge this" I meant of course the Omnivore patch that was referenced: https://github.com/omnivore-app/omnivore/pull/933/commits/b317a0877b40f9e04ac2b1513e02b0f0d7c7970f