mozilla / readability

A standalone version of the readability lib

New York Times glossary spans should be removed from reader mode #540

Open gijsk opened 5 years ago

gijsk commented 5 years ago

From https://bugzilla.mozilla.org/show_bug.cgi?id=1544594

Some NYT articles include markup like this:

```html
<span class="glossary-wrapper">
  <span class="glossary-text">transparency</span>
  <span class="glossary-info">
    <span class="glossary-definition">Taking appropriate measures to provide any information relating to processing to the data subject in a concise, intelligible and easily accessible form, using clear and plain language. <a href="https://www.nytimes.com/interactive/2019/04/10/opinion/internet-privacy-terms.html" target="_blank">Glossary</a>
    </span><img class="close-tooltip" src="https://static01.nyt.com/newsgraphics/2019/01/22/tooltip-template/1e41ab3eab12f1354e056fb2fa3acc6d09df13e2/close.svg" alt="Close X">
  </span>
  <span></span>
</span>
```

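We should be removing the `glossary-info` bits (the definition tooltip, its "Glossary" link and the close button), so that only the `glossary-text` term stays inline in the article text.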
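Until Readability handles this itself, one possible workaround is to strip the tooltips from the document before parsing. The sketch below is only an illustration: `removeGlossaryInfo` is a hypothetical helper (not part of Readability's API), it assumes a standard DOM `Document` (browser or jsdom), and the `.glossary-info` selector is taken from the NYT markup quoted above.

```javascript
// Hypothetical pre-processing step (not part of Readability's API):
// drop every NYT glossary tooltip so only the inline term survives.
// `doc` is any DOM Document, e.g. `window.document` or a jsdom document.
function removeGlossaryInfo(doc) {
  // querySelectorAll returns a static NodeList, so removing the matched
  // elements while iterating over it does not skip any of them.
  const tooltips = doc.querySelectorAll(".glossary-info");
  tooltips.forEach((el) => el.remove());
  return tooltips.length; // number of tooltips stripped
}

// Usage (assumes Readability is loaded; parse() mutates the document,
// so work on a clone as Readability's README recommends):
//   const clone = document.cloneNode(true);
//   removeGlossaryInfo(clone);
//   const article = new Readability(clone).parse();
```

Because the close-button image sits inside the `glossary-info` span, it goes away together with the definition, while the sibling `glossary-text` span keeps "transparency" in the sentence.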

gijsk commented 5 years ago

Example page: https://www.nytimes.com/2019/04/10/opinion/privacy-feminism.html