lwindolf / liferea

Liferea (Linux Feed Reader), a news reader for GTK/GNOME
https://lzone.de/liferea
GNU General Public License v2.0

Failure to parse a certain feed when set as a URL source, but not when it is the output of a Command source #1205

Closed pavelbraginskiy closed 1 year ago

pavelbraginskiy commented 1 year ago

I follow a webcomic with an RSS feed at https://www.webtoons.com/en/fantasy/your-throne/rss?title_no=2009. If I try to add this URL as a feed in Liferea, it fails to detect an RSS feed at that link at all.

If I instead set the source type to Command and put "curl" in front of the URL, the feed works normally.
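The title of this issue notes that the feed parses when it is the output of a Command source. A minimal Python sketch of such a command, with an explicit User-Agent header instead of curl's, might look like this. The script name `fetch_feed.py` and the User-Agent string are hypothetical illustrations, not anything Liferea ships.

```python
import sys
import urllib.request

# Hypothetical User-Agent for illustration only: it omits the "Android"
# token. It is not a string Liferea itself sends.
DEFAULT_UA = "Liferea/1.15.0 (https://lzone.de/liferea/) AppleWebKit (KHTML, like Gecko)"


def fetch_feed(url: str, user_agent: str = DEFAULT_UA, timeout: float = 30.0) -> bytes:
    """Download url with the given User-Agent and return the raw body.

    urllib follows 3xx redirects automatically, so if the server still
    redirects (e.g. to /en/), the body returned belongs to the final page.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


# Entry point when used as a Command source: python3 fetch_feed.py <feed URL>
# The feed bytes are written unchanged to stdout for Liferea to parse.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.stdout.buffer.write(fetch_feed(sys.argv[1]))
```

The Command field would then hold something like `python3 /path/to/fetch_feed.py https://www.webtoons.com/en/fantasy/your-throne/rss?title_no=2009`, where the path is a placeholder.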

lwindolf commented 1 year ago

When you run Liferea with --debug-net, you can see that it receives a redirect from the website:

< HTTP/1.1 301 Moved Permanently
< Soup-Debug-Timestamp: 1680366726
< Soup-Debug: SoupMessage 7 (0x5619bf193b90)
< Server: nginx
< Date: Sat, 01 Apr 2023 16:32:06 GMT
< Content-Length: 0
< Connection: keep-alive
< accept-ch: Sec-CH-UA,Sec-CH-UA-Full-Version-List,Sec-CH-UA-Platform,Sec-CH-UA-Platform-Version,Sec-CH-UA-Model
< Location: /en/
< x-content-type-options: nosniff
< x-xss-protection: 1; mode=block
< strict-transport-security: max-age=31536000 ; includeSubDomains
< X-Frame-Options: SAMEORIGIN
< content-language: en
< referrer-policy: unsafe-url
< Cache-Control: public
< X-Varnish: 3375184
< Age: 0
< Via: 1.1 varnish (Varnish/6.6)
< referrer-policy: unsafe-url

> GET /en/ HTTP/1.1
> Soup-Debug-Timestamp: 1680366726
> Soup-Debug: SoupSession 1 (0x5619bec91100), SoupMessage 7 (0x5619bf193b90), SoupSocket 5 (0x5619bf0f0e70), restarted
> Host: m.webtoons.com
> Accept: application/atom+xml,application/xml;q=0.9,text/xml;q=0.8,*/*;q=0.7
> DNT: 1
> Accept-Encoding: gzip, deflate
> Connection: Keep-Alive
> User-Agent: Liferea/1.15.0 (Android 12; Mobile; https://lzone.de/liferea/) AppleWebKit (KHTML, like Gecko)
> Cookie: locale=en; needGDPR=true; needCCPA=false; needCOPPA=false; countryCode=DE

If I interpret this correctly, it is a redirect to a GDPR cookie dialog.

I consider this a website bug: a client requesting XML cannot be expected to handle a response meant for a human clicking through GDPR dialogs.

pavelbraginskiy commented 1 year ago

I did some investigation of my own because that just didn't seem right: it works fine when I use curl. I eventually realized that the problem is the user agent Liferea sends; this particular website responds "properly" only when the User-Agent string does not contain the word "Android".


It seems the site doesn't expect a mobile device to request a feed. Is there a way to configure the User-Agent Liferea sends?
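To check this kind of User-Agent dependence directly, a minimal Python sketch can issue one GET per User-Agent without following redirects, so the `301` and its `Location` stay visible, as in the first response of the --debug-net trace. `LIFEREA_DEFAULT_UA` is copied from the debug log above. `NO_ANDROID_UA` is a hypothetical variant with the "Android 12; Mobile" token removed. Going by the observation in this comment, only the first should draw the redirect, but the site's behaviour may change over time.

```python
import urllib.error
import urllib.request
from typing import List, Optional, Sequence, Tuple

# Copied from the --debug-net output in this thread.
LIFEREA_DEFAULT_UA = ("Liferea/1.15.0 (Android 12; Mobile; https://lzone.de/liferea/) "
                      "AppleWebKit (KHTML, like Gecko)")
# Hypothetical variant: same string with the mobile platform token removed.
NO_ANDROID_UA = "Liferea/1.15.0 (https://lzone.de/liferea/) AppleWebKit (KHTML, like Gecko)"


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Decline every redirect so the 3xx response itself is reported."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def probe(url: str, user_agent: str, timeout: float = 30.0) -> Tuple[int, Optional[str]]:
    """Return (status code, Location header or None) for a single GET."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status, response.headers.get("Location")
    except urllib.error.HTTPError as err:
        # With redirects declined, a 3xx surfaces here as an HTTPError.
        return err.code, err.headers.get("Location")


def compare_user_agents(
    url: str, agents: Sequence[str] = (LIFEREA_DEFAULT_UA, NO_ANDROID_UA)
) -> List[Tuple[str, int, Optional[str]]]:
    """Probe url once per User-Agent and collect (agent, status, Location)."""
    return [(ua, *probe(url, ua)) for ua in agents]
```

Calling `compare_user_agents("https://www.webtoons.com/en/fantasy/your-throne/rss?title_no=2009")` runs both requests. A `(301, "/en/")` result for one agent and `(200, None)` for the other would isolate the User-Agent as the trigger.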

lwindolf commented 1 year ago

Yes, there is a feature for that. Quoting the environment variables section of the Liferea man page:

ENVIRONMENT
[...]
       LIFEREA_UA
              If defined,  its  value  replaces  the  default  HTTP/S  User-Agent
              string.
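A minimal Python sketch of applying this override when starting Liferea from a script follows. It assumes the `liferea` executable is on `PATH`. The replacement string is a hypothetical example: the default from the debug log with the "Android 12; Mobile" token removed. In a POSIX shell the same effect comes from prefixing the command, as in `LIFEREA_UA="..." liferea`.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional

# Hypothetical replacement: Liferea's default User-Agent from the debug log
# with the mobile platform token removed.
DESKTOP_UA = "Liferea/1.15.0 (https://lzone.de/liferea/) AppleWebKit (KHTML, like Gecko)"


def liferea_environment(user_agent: str,
                        base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy the environment (os.environ by default) and set LIFEREA_UA.

    The copy leaves the caller's own environment untouched.
    """
    env = dict(os.environ if base is None else base)
    env["LIFEREA_UA"] = user_agent
    return env


def launch_liferea(user_agent: str = DESKTOP_UA) -> subprocess.Popen:
    """Start liferea (assumed to be on PATH) with the User-Agent override."""
    return subprocess.Popen(["liferea"], env=liferea_environment(user_agent))
```

The environment is copied before `LIFEREA_UA` is added because `Popen(env=...)` replaces the child's environment entirely rather than merging with it. Passing only the new variable would drop `DISPLAY`, `PATH` and the rest.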