mozilla / readability

A standalone version of the readability lib

Missing paragraphs for spiegel.de article #776

Open arigon opened 1 year ago

arigon commented 1 year ago

On several paywalled spiegel.de articles, some paragraphs are missing in reader mode. I found a free article with the same behaviour.

Example: https://www.spiegel.de/ausland/china-haelt-militaeruebung-nahe-taiwan-ab-a-cc5cca71-08d3-482e-af02-dc18b880e429 The first two paragraphs and the last paragraph are missing. See my screenshots of Firefox reader mode.


r4d1um commented 1 year ago

The same issue also occurs in Firefox on macOS, Android, iOS and Windows, so it seems to be a general issue with Reader Mode in Firefox.

arigon commented 1 year ago

spiegel.de also needs the div tag to be in the list of tags that get scored: https://github.com/mozilla/readability/blob/8e8ec27cd2013940bc6f3cc609de10e35a1d9d86/Readability.js#L114
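To make the suggestion concrete, here is a heavily simplified TypeScript model of a tag filter like the one at the linked line. It assumes the default list is "section,h2,h3,h4,h5,h6,p,td,pre" (our reading of DEFAULT_TAGS_TO_SCORE; check it against the linked commit). It ignores everything else Readability does, such as rewriting some div elements as p elements and propagating scores to ancestors, so it cannot tell you what actually happens on spiegel.de. ToyNode, candidatesToScore and the page structure are hypothetical.

```typescript
// Hypothetical, simplified model of "which nodes get scored", driven only by tag name.
// DEFAULT_TAGS is an assumption mirroring DEFAULT_TAGS_TO_SCORE in Readability.js;
// div-to-p rewriting and ancestor score propagation are NOT modelled.

interface ToyNode {
  tagName: string; // upper-case, as HTML elements report it in the DOM
  text: string;
}

const DEFAULT_TAGS: string[] = "section,h2,h3,h4,h5,h6,p,td,pre"
  .toUpperCase()
  .split(",");

// Keep only nodes whose tag appears in the list of tags to score.
function candidatesToScore(nodes: ToyNode[], tagsToScore: string[]): ToyNode[] {
  return nodes.filter((n) => tagsToScore.includes(n.tagName));
}

// Invented page: two paragraphs are bare divs, one is a real p element.
const page: ToyNode[] = [
  { tagName: "DIV", text: "Intro paragraph rendered as a bare div." },
  { tagName: "P", text: "Body paragraph in a real p element." },
  { tagName: "DIV", text: "Closing paragraph, also a bare div." },
];

const withDefault = candidatesToScore(page, DEFAULT_TAGS);
const withDiv = candidatesToScore(page, [...DEFAULT_TAGS, "DIV"]);
console.log(withDefault.length, withDiv.length); // 1 3
```

In this toy, adding DIV makes every div a candidate, including pure layout wrappers, which is one plausible reason such a change could affect other test pages as well.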

gijsk commented 1 year ago

What does adding div do to the other test cases we have in the repository? I would be surprised if it didn't lead to other changes...

ilf commented 8 months ago

More data about false-positive cut-offs on spiegel.de:

First paragraph cut: https://www.spiegel.de/ausland/damaskus-offenbar-mehrere-explosionen-syrien-macht-israel-verantwortlich-a-cbf70e1c-eb8f-49ca-89ca-faafaeefc505

First two paragraphs cut: https://www.spiegel.de/ausland/andrij-melnyk-ex-botschafter-der-ukraine-in-deutschland-raeumt-fehler-ein-a-7a270dd7-f994-4505-b1c2-23eae33a08c1

First three paragraphs cut: https://www.spiegel.de/ausland/russland-aussenminister-sergej-lawrow-wurde-betankung-in-brasilien-offenbar-verweigert-a-9ea195c1-ba7b-4aab-a501-fe2555a77081

via https://bugzilla.mozilla.org/show_bug.cgi?id=1881909
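When collecting more cases like these, a small helper can turn "which paragraphs are missing" into a repeatable check. The sketch below is hypothetical: it assumes you already have the paragraph texts of the original article and of the Reader Mode output as string arrays, and it only normalises whitespace before comparing. The function names and the example data are invented.

```typescript
// Hypothetical diagnostic: report which original paragraph positions are absent
// from the Reader Mode output. Assumes extracted text matches the source text
// apart from whitespace.

function normalise(s: string): string {
  return s.replace(/\s+/g, " ").trim();
}

// Indices (0-based) of original paragraphs that do not appear in the extracted output.
function missingParagraphIndices(original: string[], extracted: string[]): number[] {
  const shown = new Set(extracted.map(normalise));
  const missing: number[] = [];
  original.forEach((p, i) => {
    if (!shown.has(normalise(p))) missing.push(i);
  });
  return missing;
}

// How many paragraphs were cut from the start (the "first N paragraphs cut" pattern).
function leadingCut(missing: number[]): number {
  let n = 0;
  while (n < missing.length && missing[n] === n) n++;
  return n;
}

// Invented example: first two paragraphs and the last one are cut.
const original = ["P1", "P2", "P3", "P4", "P5"];
const extracted = ["P3", "P4"];
const missing = missingParagraphIndices(original, extracted);
console.log(missing, leadingCut(missing)); // [ 0, 1, 4 ] 2
```

Because texts are compared through a Set, identical repeated paragraphs count as one; that is fine for spotting cut-offs but not for exact counts.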

ZachSaucier commented 4 months ago

I have a version not behind a paywall that you can test with if you'd like: https://zachsaucier.com/test.htm

ilf commented 3 months ago

Two more examples:

This article has 10 paragraphs, but readability only shows 4: https://www.spiegel.de/ausland/israelische-spitzenpolitiker-attackieren-militaer-und-inlandsgeheimdienst-wegen-freilassung-von-chef-der-schifa-klinik-a-5398743a-3666-4df6-aceb-0603e5d923f3

This article has 13 paragraphs, but readability shows none of them, only an info "section" instead: https://www.spiegel.de/ausland/usa-joe-biden-kritisiert-supreme-court-urteil-zu-donald-trump-als-gefaehrlichen-praezedenzfall-a-83073115-2b3a-499d-abe5-dd7afd6c89ea