Open evilpie opened 5 years ago
The issue is that this page uses a pile of <div>
s with no <br>
s and it therefore doesn't pass the simple detection algorithm in https://dxr.mozilla.org/mozilla-central/source/toolkit/components/reader/Readability-readerable.js .
We can't just add all <div>
s to that because it would cause the detection speed to plummet.
This can easily be fixed by the site by switching to semantic markup ie using <p>
for paragraph text.
I noticed in the website their first para has a textContent.length
of 523. However, our criteria is >540
for "paragraphs". But it's something that should be "readerable".
I noticed in the website their first para has a
textContent.length
of 523. However, our criteria is>540
for "paragraphs". But it's something that should be "readerable".
Sorry, what bit of code are you referring to? Off-hand I don't see 540 anywhere in our codebase. Is my diagnosis in https://github.com/mozilla/readability/issues/503#issuecomment-449048740 not correct? The reader mode button in Firefox is governed by the code in https://github.com/mozilla/readability/blob/master/Readability-readerable.js and I think we're just not picking things up at all because the text content is in <div>
s instead of <p>
s...
540
because we have score += Math.sqrt(textContent.length - 140)
. And we would return true
for score > 20
. So the minimum is 540.
The problem here is about <div>
and <p>
. But in most cases we cannot expect the site owner to change that. I'm just wondering if there is any way for us to regard this kind of <div>
as <p>
.
540
because we havescore += Math.sqrt(textContent.length - 140)
. And we would returntrue
forscore > 20
. So the minimum is 540.
Right, but that can be reached by multiple paragraphs, right? The entire article as a whole should be big enough.
The problem here is about
<div>
and<p>
. But in most cases we cannot expect the site owner to change that. I'm just wondering if there is any way for us to regard this kind of<div>
as<p>
.
Yeah... There's no classes on the <div>
s themselves though, only on some of the containers. Perhaps, like for #420 , we need some per-domain rules to fix this.
The reader mode button doesn't appear in the urlbar for: https://googleprojectzero.blogspot.com/2018/12/on-vbscript.html
about:reader?url=https://googleprojectzero.blogspot.com/2018/12/on-vbscript.html
however seems to mostly work, just the code snippets are a bit mangled.