miniflux / v2

Minimalist and opinionated feed reader
https://miniflux.app
Apache License 2.0

Scraper rules don't work with theguardian.com #2688

Open advert665 opened 2 weeks ago

advert665 commented 2 weeks ago

The Guardian publishes only summaries in its RSS feeds, so I want to use scraper rules to load the full content from the corresponding webpage. However, when I use a selector that matches the desired content on the webpage, the full content doesn't load.

For instance, using div#maincontent or p.dcr-iy9ec7 fails to change the resulting article in Miniflux for the following feed, even though both selectors match elements in the linked pages: https://www.theguardian.com/theguardian/mainsection/topstories/rss

Similarly, using picture to extract the cartoons from https://www.theguardian.com/profile/martinrowson/rss (with or without the add_dynamic_image rule) fails to load anything in Miniflux.
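
One way to narrow this down might be to check whether the selectors match the HTML the server returns, as opposed to the DOM a browser shows after running JavaScript. The sketch below is only a diagnostic aid and is not how Miniflux applies scraper rules; the names SelectorProbe and count_matches and the sample markup are made up for illustration, and it only handles simple tag, #id and .class selectors.

```python
from html.parser import HTMLParser


class SelectorProbe(HTMLParser):
    """Counts start tags matching a simple tag / #id / .class selector.

    Hypothetical helper for illustration; not part of Miniflux.
    """

    def __init__(self, tag=None, elem_id=None, css_class=None):
        super().__init__()
        self.tag = tag
        self.elem_id = elem_id
        self.css_class = css_class
        self.matches = 0

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # Every criterion that was supplied must match; None means "any".
        if self.tag is not None and tag != self.tag:
            return
        if self.elem_id is not None and attr_map.get("id") != self.elem_id:
            return
        classes = (attr_map.get("class") or "").split()
        if self.css_class is not None and self.css_class not in classes:
            return
        self.matches += 1


def count_matches(html, tag=None, elem_id=None, css_class=None):
    """Return how many elements in the raw HTML match the given criteria."""
    probe = SelectorProbe(tag, elem_id, css_class)
    probe.feed(html)
    probe.close()
    return probe.matches


# Made-up markup standing in for a downloaded article page.
sample = (
    '<div id="maincontent">'
    '<p class="dcr-iy9ec7 intro">Text</p>'
    '<picture><img src="a.jpg"></picture>'
    '</div>'
)

print(count_matches(sample, tag="div", elem_id="maincontent"))
print(count_matches(sample, tag="p", css_class="dcr-iy9ec7"))
print(count_matches(sample, tag="picture"))
```

To try it on a real article, the page body could be downloaded (for example with urllib.request) and passed in place of sample. A count of zero there would suggest the element is missing from the server-delivered HTML rather than the selector being wrong.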

Other RSS apps like Lire are able to load the full articles, so it's not a Guardian issue specifically. Am I doing something wrong, or is this a Miniflux limitation? Thanks!

[Screenshot: 2024-06-11 102143]