miniflux / v2

Minimalist and opinionated feed reader
https://miniflux.app
Apache License 2.0

Google News feed does not open correctly in an external browser, and images do not load #1390

Open: ghost opened this issue 2 years ago

ghost commented 2 years ago

Hi,

I wanted to add a Google News feed (Atom or RSS), e.g.:

https://news.google.com/topics/CAAqJggKIiBDQkFTRWdvSUwyMHZNRFZxYUdjU0FtVnVHZ0pWVXlnQVAB?hl=en-US&gl=US&ceid=US%3Aen

  1. How can I extract images?
  2. Why does the feed always open the Google News website instead of the newspaper's article itself?

Thank you

fguillot commented 2 years ago

The links in the RSS feed redirect to the original article:

RSS Feed URL: https://news.google.com/rss/topics/CAAqJggKIiBDQkFTRWdvSUwyMHZNRFZxYUdjU0FtVnVHZ0pWVXlnQVAB?hl=en-US&gl=US&ceid=US%3Aen&oc=11

<item>
<title>Russian Airstrikes Intensify in Western Ukraine - The Wall Street Journal</title>
<link>https://news.google.com/__i/rss/rd/articles/CBMiWGh0dHBzOi8vd3d3Lndzai5jb20vYXJ0aWNsZXMvcnVzc2lhbi1haXJzdHJpa2VzLWludGVuc2lmeS1pbi13ZXN0ZXJuLXVrcmFpbmUtMTE2NDY5OTQ4MDjSAQA?oc=5</link>
<guid isPermaLink="false">CAIiEEdSyR22RIXW-HEi_tjwWaYqGAgEKg8IACoHCAow1tzJATDnyxUwxMrPBg</guid>
<pubDate>Fri, 11 Mar 2022 14:52:00 GMT</pubDate>
<description><a href="https://news.google.com/__i/rss/rd/articles/CBMiWGh0dHBzOi8vd3d3Lndzai5jb20vYXJ0aWNsZXMvcnVzc2lhbi1haXJzdHJpa2VzLWludGVuc2lmeS1pbi13ZXN0ZXJuLXVrcmFpbmUtMTE2NDY5OTQ4MDjSAQA?oc=5" target="_blank">Russian Airstrikes Intensify in Western Ukraine</a>&nbsp;&nbsp;<font color="#6f6f6f">The Wall Street Journal</font><strong><a href="https://news.google.com/stories/CAAqNggKIjBDQklTSGpvSmMzUnZjbmt0TXpZd1NoRUtEd2lqODhfN0JCRmQ4UXIyOUdvOEVpZ0FQAQ?oc=5" target="_blank">View Full Coverage on Google News</a></strong></description>
<source url="https://www.wsj.com">The Wall Street Journal</source>
</item>
<item>
<title>Biden calls for suspending normal trade relations with Russia and will ban imports of vodka and seafood - CNN</title>
<link>https://news.google.com/__i/rss/rd/articles/CBMiVWh0dHBzOi8vd3d3LmNubi5jb20vMjAyMi8wMy8xMC9wb2xpdGljcy9ydXNzaWEtdHJhZGUtc3RhdHVzLXVzLWc3LWV1LWJpZGVuL2luZGV4Lmh0bWzSAQA?oc=5</link>
<guid isPermaLink="false">1334975107</guid>
<pubDate>Fri, 11 Mar 2022 16:46:00 GMT</pubDate>
<description><ol><li><a href="https://news.google.com/__i/rss/rd/articles/CBMiVWh0dHBzOi8vd3d3LmNubi5jb20vMjAyMi8wMy8xMC9wb2xpdGljcy9ydXNzaWEtdHJhZGUtc3RhdHVzLXVzLWc3LWV1LWJpZGVuL2luZGV4Lmh0bWzSAQA?oc=5" target="_blank">Biden calls for suspending normal trade relations with Russia and will ban imports of vodka and seafood</a>&nbsp;&nbsp;<font color="#6f6f6f">CNN</font></li><li><a href="https://news.google.com/__i/rss/rd/articles/CBMiZmh0dHBzOi8vd3d3Lm5iY25ld3MuY29tL3BvbGl0aWNzL3doaXRlLWhvdXNlL2JpZGVuLXB1c2gtZW5kaW5nLW5vcm1hbC10cmFkZS1yZWxhdGlvbnMtcnVzc2lhLXJjbmExOTY2MNIBAA?oc=5" target="_blank">Biden calls for ending normal trade relations with Russia</a>&nbsp;&nbsp;<font color="#6f6f6f">NBC News</font></li><li><a href="https://news.google.com/__i/rss/rd/articles/CBMiK2h0dHBzOi8vd3d3LnlvdXR1YmUuY29tL3dhdGNoP3Y9OTZmYjRiZmtwZ3fSAQA?oc=5" target="_blank">Russia to lose 'most favored nation' trade status</a>&nbsp;&nbsp;<font color="#6f6f6f">Reuters</font></li><li><a href="https://news.google.com/__i/rss/rd/articles/CBMiamh0dHBzOi8vd3d3LmNubi5jb20vZXVyb3BlL2xpdmUtbmV3cy91a3JhaW5lLXJ1c3NpYS1wdXRpbi1uZXdzLTAzLTExLTIyL2hfNDQwNDE4MTE3MTA0OGE5N2Y3NjhiNDRiNTlhM2FhZDTSAQA?oc=5" target="_blank">The US will ban imports of alcohol and seafood from Russia. Here's what we know</a>&nbsp;&nbsp;<font color="#6f6f6f">CNN</font></li><li><a href="https://news.google.com/__i/rss/rd/articles/CBMiK2h0dHBzOi8vd3d3LnlvdXR1YmUuY29tL3dhdGNoP3Y9d1FsV1ZCMHpSczTSAQA?oc=5" target="_blank">Russian forces moving on Kyiv as fighting intensifies in Ukraine</a>&nbsp;&nbsp;<font color="#6f6f6f">Eyewitness News ABC7NY</font></li><li><strong><a href="https://news.google.com/stories/CAAqNggKIjBDQklTSGpvSmMzUnZjbmt0TXpZd1NoRUtEd2lEdGNqOEJCR3E0S3I0dEQ2Ukt5Z0FQAQ?oc=5" target="_blank">View Full Coverage on Google News</a></strong></li></ol></description>
<source url="https://www.cnn.com">CNN</source>
</item>
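To make the structure of these items concrete, here is a minimal Go sketch (Go because Miniflux is written in Go; `rssItem`, `rssFeed` and `parseItems` are hypothetical names, not Miniflux's parser) that reads the fields shown above with `encoding/xml`. It illustrates what the sample shows: `<link>` is always a news.google.com redirect, while `<source url="…">` only gives the publisher's homepage, not the article itself.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// rssItem mirrors the <item> fields in the Google News RSS sample.
// Hypothetical type for illustration; not Miniflux's actual parser.
type rssItem struct {
	Title   string `xml:"title"`
	Link    string `xml:"link"`
	GUID    string `xml:"guid"`
	PubDate string `xml:"pubDate"`
	Source  struct {
		URL  string `xml:"url,attr"`  // publisher homepage only
		Name string `xml:",chardata"` // publisher name
	} `xml:"source"`
}

type rssFeed struct {
	Items []rssItem `xml:"channel>item"`
}

// parseItems decodes an RSS 2.0 document and returns its items.
func parseItems(data []byte) ([]rssItem, error) {
	var feed rssFeed
	if err := xml.Unmarshal(data, &feed); err != nil {
		return nil, err
	}
	return feed.Items, nil
}

func main() {
	// First item from the sample above, with <description> omitted.
	sample := []byte(`<rss version="2.0"><channel><item>
<title>Russian Airstrikes Intensify in Western Ukraine - The Wall Street Journal</title>
<link>https://news.google.com/__i/rss/rd/articles/CBMiWGh0dHBzOi8vd3d3Lndzai5jb20vYXJ0aWNsZXMvcnVzc2lhbi1haXJzdHJpa2VzLWludGVuc2lmeS1pbi13ZXN0ZXJuLXVrcmFpbmUtMTE2NDY5OTQ4MDjSAQA?oc=5</link>
<guid isPermaLink="false">CAIiEEdSyR22RIXW-HEi_tjwWaYqGAgEKg8IACoHCAow1tzJATDnyxUwxMrPBg</guid>
<pubDate>Fri, 11 Mar 2022 14:52:00 GMT</pubDate>
<source url="https://www.wsj.com">The Wall Street Journal</source>
</item></channel></rss>`)
	items, err := parseItems(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, it := range items {
		// Link points at news.google.com; Source.URL is the publisher's homepage.
		fmt.Println(it.Link)
		fmt.Println(it.Source.URL)
	}
}
```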

This link redirects to the original article: https://news.google.com/__i/rss/rd/articles/CBMiWGh0dHBzOi8vd3d3Lndzai5jb20vYXJ0aWNsZXMvcnVzc2lhbi1haXJzdHJpa2VzLWludGVuc2lmeS1pbi13ZXN0ZXJuLXVrcmFpbmUtMTE2NDY5OTQ4MDjSAQA?oc=5
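The article ID in these links appears to be base64url-encoded protobuf that embeds the original URL: decoding the ID in the link above gives the bytes `08 13 22 58` followed by the 88-byte `https://www.wsj.com/articles/…` URL. This is an undocumented format inferred only from the sample links in this thread, so the Go sketch below is an assumption that may break whenever Google changes it. It recovers the original URL offline instead of following the redirect. `decodeGoogleNewsLink` and `readVarint` are hypothetical helpers.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// readVarint decodes a protobuf base-128 varint starting at b[i] and
// returns the value and the index just past it.
func readVarint(b []byte, i int) (uint64, int, error) {
	var v uint64
	for shift := uint(0); shift < 64; shift += 7 {
		if i >= len(b) {
			return 0, i, errors.New("truncated varint")
		}
		c := b[i]
		i++
		v |= uint64(c&0x7f) << shift
		if c < 0x80 {
			return v, i, nil
		}
	}
	return 0, i, errors.New("varint too long")
}

// decodeGoogleNewsLink (hypothetical helper) takes the ID after "/articles/",
// base64url-decodes it and walks the protobuf fields, returning the first
// length-delimited field that starts with "http". ASSUMPTION: relies on an
// undocumented format observed in the 2022 sample links; it may break.
func decodeGoogleNewsLink(link string) (string, error) {
	u, err := url.Parse(link)
	if err != nil {
		return "", err
	}
	idx := strings.LastIndex(u.Path, "/articles/")
	if idx < 0 {
		return "", errors.New("not a Google News article link")
	}
	id := strings.TrimRight(u.Path[idx+len("/articles/"):], "=")
	raw, err := base64.RawURLEncoding.DecodeString(id)
	if err != nil {
		return "", err
	}
	for i := 0; i < len(raw); {
		key, next, err := readVarint(raw, i)
		if err != nil {
			return "", err
		}
		i = next
		switch key & 7 { // low 3 bits of the key = wire type
		case 0: // varint field: skip it
			if _, i, err = readVarint(raw, i); err != nil {
				return "", err
			}
		case 1: // fixed 64-bit field
			i += 8
		case 2: // length-delimited field: candidate URL
			n, start, err := readVarint(raw, i)
			if err != nil {
				return "", err
			}
			if n > uint64(len(raw)-start) {
				return "", errors.New("truncated field")
			}
			field := raw[start : start+int(n)]
			if bytes.HasPrefix(field, []byte("http")) {
				return string(field), nil
			}
			i = start + int(n)
		case 5: // fixed 32-bit field
			i += 4
		default:
			return "", errors.New("unsupported wire type")
		}
	}
	return "", errors.New("no URL found in article ID")
}

func main() {
	link := "https://news.google.com/__i/rss/rd/articles/CBMiWGh0dHBzOi8vd3d3Lndzai5jb20vYXJ0aWNsZXMvcnVzc2lhbi1haXJzdHJpa2VzLWludGVuc2lmeS1pbi13ZXN0ZXJuLXVrcmFpbmUtMTE2NDY5OTQ4MDjSAQA?oc=5"
	target, err := decodeGoogleNewsLink(link)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(target) // https://www.wsj.com/articles/russian-airstrikes-intensify-in-western-ukraine-11646994808
}
```

Some IDs (for example the Times of Israel link in the Atom sample below) appear to carry a second URL ending in `/amp/`; this sketch returns only the first one.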

However, the Atom feed is structured differently:

Atom URL: https://news.google.com/atom/topics/CAAqJggKIiBDQkFTRWdvSUwyMHZNRFZxYUdjU0FtVnVHZ0pWVXlnQVAB?hl=en-US&gl=US&ceid=US%3Aen&oc=11

<entry>
<id>https://news.google.com/articles/CBMiUmh0dHBzOi8vd3d3LmNubi5jb20vZXVyb3BlL2xpdmUtbmV3cy91a3JhaW5lLXJ1c3NpYS1wdXRpbi1uZXdzLTAzLTExLTIyL2luZGV4Lmh0bWzSAQA?oc=5</id>
<title type="html">Live updates: Russia invades Ukraine - CNN</title>
<updated>2022-03-11T18:40:00.000000000Z</updated>
<link href="https://news.google.com/__i/rss/rd/articles/CBMiUmh0dHBzOi8vd3d3LmNubi5jb20vZXVyb3BlL2xpdmUtbmV3cy91a3JhaW5lLXJ1c3NpYS1wdXRpbi1uZXdzLTAzLTExLTIyL2luZGV4Lmh0bWzSAQA?oc=5" type="text/html"/>
<content type="html"><ol><li><a href="https://news.google.com/__i/rss/rd/articles/CBMiUmh0dHBzOi8vd3d3LmNubi5jb20vZXVyb3BlL2xpdmUtbmV3cy91a3JhaW5lLXJ1c3NpYS1wdXRpbi1uZXdzLTAzLTExLTIyL2luZGV4Lmh0bWzSAQA?oc=5" target="_blank">Live updates: Russia invades Ukraine</a>&nbsp;&nbsp;<font color="#6f6f6f">CNN</font></li><li><a href="https://news.google.com/__i/rss/rd/articles/CBMiSWh0dHBzOi8vd3d3Lm5wci5vcmcvbGl2ZS11cGRhdGVzL3VrcmFpbmUtcnVzc2lhLXdlc3Rlcm4tY2l0aWVzLTAzLTExLTIwMjLSAQA?oc=5" target="_blank">War in Ukraine live updates: Russia intensifies air attacks; U.S. blocks Russian imports of vodka and other goods</a>&nbsp;&nbsp;<font color="#6f6f6f">NPR</font></li><li><a href="https://news.google.com/__i/rss/rd/articles/CBMiQGh0dHBzOi8vd3d3Lm55dGltZXMuY29tL2xpdmUvMjAyMi8wMy8xMC93b3JsZC91a3JhaW5lLXJ1c3NpYS13YXLSAQA?oc=5" target="_blank">What Happened on Day 15 of Russia’s Invasion of Ukraine</a>&nbsp;&nbsp;<font color="#6f6f6f">The New York Times</font></li><li><a href="https://news.google.com/__i/rss/rd/articles/CBMiamh0dHBzOi8vd3d3LmNubi5jb20vZXVyb3BlL2xpdmUtbmV3cy91a3JhaW5lLXJ1c3NpYS1wdXRpbi1uZXdzLTAzLTExLTIyL2hfYjY4YjVhZGFiZmFlZjZiNTJjYTUxMTk2ZDg5M2IwNjHSAQA?oc=5" target="_blank">Russian ground forces are regrouping, as Ukraine's west is attacked for the first time</a>&nbsp;&nbsp;<font color="#6f6f6f">CNN</font></li><li><a href="https://news.google.com/__i/rss/rd/articles/CBMicmh0dHBzOi8vd3d3LnRpbWVzb2Zpc3JhZWwuY29tL3dvcmxkLWNvbmRlbW5zLXJ1c3NpYW4taG9zcGl0YWwtc3RyaWtlLWFzLXRyb29wcy1hZHZhbmNlLW9uLWt5aXYtY2hva2Utb2ZmLW1hcml1cG9sL9IBdmh0dHBzOi8vd3d3LnRpbWVzb2Zpc3JhZWwuY29tL3dvcmxkLWNvbmRlbW5zLXJ1c3NpYW4taG9zcGl0YWwtc3RyaWtlLWFzLXRyb29wcy1hZHZhbmNlLW9uLWt5aXYtY2hva2Utb2ZmLW1hcml1cG9sL2FtcC8?oc=5" target="_blank">World condemns Russian hospital strike as troops advance on Kyiv, choke off Mariupol</a>&nbsp;&nbsp;<font color="#6f6f6f">The Times of Israel</font></li></ol></content>
</entry>

Miniflux might need to be updated to handle this Atom feed: the entry's <link> element only has type="text/html" and no rel attribute.

I haven't checked the RFC yet, but here are the relevant links:

Regarding question 1, the feed itself doesn't provide any images. If you use the RSS feed and enable fetching the full content, Miniflux will attempt to crawl the original article by following the redirect link.

ghost commented 2 years ago

Thank you. I got it partially working:

Some articles load correctly, but not all. Is this expected, or have I misunderstood this feature? It would be great if somebody with more knowledge of Google News could help. BR

ghost commented 2 years ago

Look at this: https://github.com/d3ward/nextntp/blob/main/nextntp.js#L1864
It extracts Google News items in a similar way. Maybe we could adopt the same rendering approach?