pictuga / morss

Get full text RSS feeds
https://morss.it/
GNU Affero General Public License v3.0
624 stars 75 forks

Cannot scrape full article text of Google News RSS Feeds #125

Open Kate-Actuary-Viola opened 8 months ago

Kate-Actuary-Viola commented 8 months ago

How can I scrape the full article text of Google News feeds? When I insert the link, I get a cookie consent notice for every article instead of the article text. Is there a way to automatically answer the cookie prompt when parsing?

pictuga commented 8 months ago

Can you share the link of the feed?

Kate-Actuary-Viola commented 8 months ago

Thanks for your help! Here's an example: https://morss.it/https://news.google.com/rss/topics/CAAqHAgKIhZDQklTQ2pvSWJHOWpZV3hmZGpJb0FBUAE/sections/CAQiUENCSVNOam9JYkc5allXeGZkakpDRUd4dlkyRnNYM1l5WDNObFkzUnBiMjV5Q3hJSkwyMHZNREZtZEhvMGVnc0tDUzl0THpBeFpuUjZOQ2dBKjEIACotCAoiJ0NCSVNGem9JYkc5allXeGZkako2Q3dvSkwyMHZNREZtZEhvMEtBQVABUAE?hl=en-US&gl=US&ceid=US%3Aen

pictuga commented 8 months ago

Google seems to replace the original links with some redirects that create this problem...
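
A minimal sketch for checking this on a feed, assuming standard RSS 2.0 `<item>`/`<link>` elements. The sample feed, its links, and the helper name `find_redirect_items` are made up for illustration and are not part of morss. It only lists the items whose link points at the Google News host instead of the publisher's site; it does not resolve the redirects.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hypothetical sample feed; the titles and links are invented for illustration.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example</title>
    <item>
      <title>Story A</title>
      <link>https://news.google.com/rss/articles/EXAMPLE_ID_A</link>
    </item>
    <item>
      <title>Story B</title>
      <link>https://www.example.com/story-b</link>
    </item>
  </channel>
</rss>"""


def find_redirect_items(feed_xml, redirect_host="news.google.com"):
    """Return (title, link) pairs whose link points at redirect_host."""
    root = ET.fromstring(feed_xml)
    hits = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="").strip()
        # Compare only the hostname, so the path and query string are ignored.
        if urlparse(link).hostname == redirect_host:
            hits.append((title, link))
    return hits


for title, link in find_redirect_items(SAMPLE_FEED):
    print(title, "->", link)
```

If every item in a real feed shows up in this list, the full-text fetch is probably landing on Google's redirect page rather than on the article itself.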

Kate-Actuary-Viola commented 8 months ago

Exactly, that's the issue. Articles that show up on Google News are mostly publicly available only for some time and are then hidden behind a paywall. That's why full-text view is much more interesting for those feeds than for other newspaper feeds. Is there any way around it? Is it possible to use the redirect link? So far I have only read about Python scripts as an alternative.