martinrotter / rssguard

Feed reader (and podcast player) which supports RSS/ATOM/JSON and many web-based feed services.
GNU General Public License v3.0

[BUG]: network error (access to the content was denied) #1490

Closed turbodude closed 2 months ago

turbodude commented 2 months ago

Brief description of the issue

A feed stopped updating recently, showing the error in the status. Enabling HTTP/2 in the settings eliminates the error but the feed still won't update. Feedbro seems to be serving the feed without any issues...

How to reproduce the bug?

Subscribe to https://sanet.st/rss/

What was the expected result?

Feed getting updates. There are daily updates on the website

What actually happened?

Feed won't update

Debug log

time="   486.468" type="debug" -> feed-downloader: Starting feed updates from worker in thread '0x59c8'.
time="   486.468" type="debug" -> feed-downloader: All caches synchronized.
time="   486.468" type="debug" -> database: SQLite connection 'db_connection_22984' is already active.
time="   486.468" type="debug" -> database: SQLite database connection 'db_connection_22984' to file 'RSSGuard/data4/database/database.db' seems to be established.
time="   486.468" type="debug" -> feed-downloader: Downloading new messages for feed ID '32' URL: 'https://sanet.st/rss/' title: 'Video Courses/Design - SoftArchive' in thread  '8040'.
time="   486.468" type="debug" -> database: SQLite connection 'db_connection_8040' is already active.
time="   486.468" type="debug" -> database: SQLite database connection 'db_connection_8040' to file 'RSSGuard/data4/database/database.db' seems to be established.
time="   486.468" type="debug" -> core: Downloading URL 'https://sanet.st/rss/' to obtain feed data.
time="   486.468" type="debug" -> network: Settings of BaseNetworkAccessManager loaded.
time="   486.635" type="debug" -> network: Destroying Downloader instance.
time="   486.635" type="debug" -> network: Destroying SilentNetworkAccessManager instance.
time="   486.641" type="warning" -> core: Error 'QNetworkReply::ContentAccessDenied' during fetching of new messages for feed 'https://sanet.st/rss/'.
time="   486.641" type="critical" -> network: Error when fetching feed: 'Feed::Status::NetworkError' message: 'access to the content was denied'.
time="   486.641" type="debug" -> feed-downloader: Made progress in feed updates, total feeds count 1/1 (id of feed is 32).
time="   486.644" type="debug" -> feed-downloader: Finished feed updates in thread '0x59c8'.
time="   486.667" type="debug" -> CTRL is NOT pressed while sorting articles - sorting with standard mode.
time="   486.681" type="debug" -> message-model: Repopulated model, SQL statement is now:
 'SELECT Messages.id, Messages.is_read, Messages.is_important, Messages.is_deleted, Messages.is_pdeleted, Messages.feed, Messages.title, Messages.url, Messages.author, Messages.date_created, Messages.contents, Messages.enclosures, Messages.score, Messages.account_id, Messages.custom_id, Messages.custom_hash, Feeds.title, Feeds.is_rtl, CASE WHEN LENGTH(Messages.enclosures) > 10 THEN 'true' ELSE 'false' END AS has_enclosures, (SELECT GROUP_CONCAT(Labels.name) FROM Labels WHERE Messages.labels LIKE "%." || Labels.custom_id || ".%") as msg_labels, Messages.labels FROM Messages LEFT JOIN Feeds ON Messages.feed = Feeds.custom_id AND Messages.account_id = Feeds.account_id WHERE Feeds.custom_id IN ('32') AND Messages.is_deleted = 0 AND Messages.is_pdeleted = 0 AND Messages.account_id = 1 ORDER BY Messages.date_created DESC, LOWER(Messages.title) ASC, LOWER(Feeds.title) ASC;'.
time="   486.681" type="debug" -> gui: Reloading of msg selections took 14 miliseconds.

Operating system and version

martinrotter commented 2 months ago

I tested with latest RSS Guard, everything works fine here, must be something local. You can try to change user-agent in RSS Guard settings.

turbodude commented 2 months ago

I don't see anything about a user agent in the settings... Found this --user-agent <user-agent> but it's unclear how to use it, what is supposed to go inside <user-agent>? Please provide more details

turbodude commented 2 months ago

Launched the program like this: rssguard.exe --user-agent "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:130.0) Gecko/20100101 Firefox/130.0" and still no luck. Again, in Feedbro inside Firefox the feed is working, and I don't think Firefox is using some kind of proxy here. I also tried switching between ATOM/RSS 0.9/RSS 2.0, with no difference. Any other suggestions?

turbodude commented 1 month ago

That feed is also working in QuiteRSS (besides Feedbro); there are no special connection settings in either case.

Please provide a guide on setting up user agents or whatever is the reason for this malfunction.

martinrotter commented 1 month ago

You can change user-agent in settings or via CLI, read the documentation. I am not able to reproduce your issue, like I wrote above.

turbodude commented 1 month ago

Except there are no mentions of user agent in the settings, and launching RSSGuard with something like --user-agent "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:130.0) Gecko/20100101 Firefox/130" does nothing to resolve this (I tried different user agent strings). Maybe I'm doing something wrong, but since you are so extremely helpful I guess we'll never know.

Ac314 commented 1 month ago

The site is protected with Cloudflare human verification, so the pages (including RSS) cannot be downloaded directly. The usual workaround is to open the browser inside RSS Guard and visit the site. After that, all cookies should be set (hopefully) and feeds can be downloaded normally. But for me this site (https://sanet.st) cannot be fully loaded in the browser; it just refreshes the page indefinitely. Maybe it is just me, I do not know... BTW, I am not sure if this trick is possible with the Lite version of RSS Guard.

martinrotter commented 1 month ago

Except there are no mentions of user agent in the settings, and launching RSSGuard with something like --user-agent "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:130.0) Gecko/20100101 Firefox/130" does nothing to resolve this (I tried different user agent strings). Maybe I'm doing something wrong, but since you are so extremely helpful I guess we'll never know.

Be very very careful with your wording here.

User-agent visual setting is already implemented and will appear in next version or you can download devbuild (see readme).

The CLI switch is really there and sets the desired user-agent. WHY your site is refusing to load is out of scope for this bug tracker because, for me, the issue is simply not reproducible, so I cannot do anything about it.

The Lite version does not support the approach @Ac314 suggested, but the full version does, and I use it regularly. RSS Guard shares cookies and some other data between the built-in browser and the feed downloader component, which allows the workflow described above. That's about it.

Cheers.

MoneyAllDay commented 1 month ago

I'm having a similar issue with a feed that also works fine in QuiteRSS. The thing is that I can't even add it to the reader.

The feed: https://www.msi.com/rss/feed/product/MAG-Z690-TOMAHAWK-WIFI-DDR4/Download

I also tried @Ac314's method, but it won't work. I'm using the full version of RSSGuard.

Can any of you check if this feed is working for you?

turbodude commented 1 month ago

I'm having a similar issue with a feed that also works fine in QuiteRSS. The thing is that I can't even add it to the reader.

The feed: https://www.msi.com/rss/feed/product/MAG-Z690-TOMAHAWK-WIFI-DDR4/Download

I also tried @Ac314's method, but it won't work. I'm using the full version of RSSGuard.

Can any of you check if this feed is working for you?

Works on my end, 4.7.4 lite

Ac314 commented 1 month ago

In full version I cannot fetch it ("Network error: access to content was denied"), but in the embedded browser it can be viewed with no issues. Strange...

Ac314 commented 1 month ago

I did some testing with the feed https://www.msi.com/rss/feed/product/MAG-Z690-TOMAHAWK-WIFI-DDR4/Download. Conclusions:

1) This site has some restrictions on the number of connections. If you make too many requests, it starts responding with "Access denied".
2) The 'User-Agent' and 'Accept' headers are not enough to get the correct response. In my experience the HTTP header 'Priority' is also needed. I get the correct response with the following headers set (imitating the Firefox browser):

User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:131.0) Gecko/20100101 Firefox/131.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/png,image/svg+xml,*/*;q=0.8
Priority: u=0, i

With all 3 headers set I receive correct response but if I exclude any header from these three and make a request I always get "Access denied" (with only 2 headers set).
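The leave-one-out experiment described above could be scripted roughly as follows. This is only a sketch using Python's standard library: the helper names `header_subsets`, `fetch_status` and `probe` are made up for illustration, and the server may answer a Python client differently than it answered the original test. Because of the connection limit mentioned in point 1, it is probably wise to run `probe()` sparingly.

```python
from itertools import combinations
import urllib.error
import urllib.request

FEED_URL = "https://www.msi.com/rss/feed/product/MAG-Z690-TOMAHAWK-WIFI-DDR4/Download"

# The three headers reported in this thread as jointly required.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:131.0) "
                  "Gecko/20100101 Firefox/131.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,"
              "image/avif,image/webp,image/png,image/svg+xml,*/*;q=0.8",
    "Priority": "u=0, i",
}


def header_subsets(headers, size):
    """Return every subset of `headers` that has exactly `size` entries."""
    return [dict(combo) for combo in combinations(headers.items(), size)]


def fetch_status(url, headers, timeout=15):
    """Send one GET request and return the HTTP status code."""
    request = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # for example 403 when access is denied


def probe(url=FEED_URL, headers=HEADERS):
    """Try the full header set, then each set with one header left out.

    Note: when User-Agent is left out, urllib sends its own default
    User-Agent instead of sending none at all.
    """
    results = {"all headers": fetch_status(url, headers)}
    for subset in header_subsets(headers, len(headers) - 1):
        missing = (set(headers) - set(subset)).pop()
        results["without " + missing] = fetch_status(url, subset)
    return results
```

Calling `probe()` returns a dictionary mapping each tried combination to the status code it received, which makes it easy to see which header is the one the server insists on.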

What to propose here:

1) There should be no problem if a decent check interval is set (like 1 hour).
2) This one is worse, because RSS Guard currently does not seem to have the ability to set arbitrary HTTP headers (only User-Agent can be set). The solution could be to extend this functionality so users can set any header (header name and header value). The most flexible way is to allow setting headers per feed (because I guess for 99% of feeds it is just not needed). A simpler way to implement it is a global setting (for all feeds).

Anyway if my conclusions are correct this feed cannot be fetched properly without additional functionality added... At least for the RSS Guard full version (I do not have the Lite one).

martinrotter commented 1 month ago

@Ac314 I will try to add the custom headers per feed configuration. Could you then perhaps test from your side once I add it?

martinrotter commented 1 month ago

Working on it.

image

martinrotter commented 1 month ago

With curl, these headers are needed for me.

curl "https://www.msi.com/rss/feed/product/MAG-Z690-TOMAHAWK-WIFI-DDR4/Download" -H "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:131.0) Gecko/20100101 Firefox/131.0" -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/png,image/svg+xml,*/*;q=0.8" -H "Accept-Language: en-GB,en;q=0.5" -H "Accept-Encoding: gzip, deflate, br, zstd" -H "Priority: u=0, i"
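For anyone who prefers a script over curl, a rough Python equivalent of the command above might look like this. It is a sketch, not a tested fix: `build_request` and `fetch_feed` are illustrative names, and `Accept-Encoding` is reduced to `gzip` because the standard library cannot decode `br` or `zstd`. That change alone could make the server respond differently than it does to the curl command.

```python
import gzip
import urllib.request

FEED_URL = "https://www.msi.com/rss/feed/product/MAG-Z690-TOMAHAWK-WIFI-DDR4/Download"


def build_request(url=FEED_URL):
    """Build a GET request carrying the headers from the curl command."""
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:131.0) "
                      "Gecko/20100101 Firefox/131.0",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,"
                  "image/avif,image/webp,image/png,image/svg+xml,*/*;q=0.8",
        "Accept-Language": "en-GB,en;q=0.5",
        # Assumption: only gzip is advertised, since br/zstd cannot be
        # decoded with the standard library alone.
        "Accept-Encoding": "gzip",
        "Priority": "u=0, i",
    }
    return urllib.request.Request(url, headers=headers)


def fetch_feed(url=FEED_URL, timeout=15):
    """Download the feed and return the decoded body as bytes."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        body = response.read()
        if response.headers.get("Content-Encoding") == "gzip":
            body = gzip.decompress(body)
        return body
```

`fetch_feed()` raises `urllib.error.HTTPError` if the server still denies access, so a 403 is visible immediately.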
martinrotter commented 1 month ago

image image

Ac314 commented 1 month ago

@Ac314 I will try to add the custom headers per feed configuration. Could you then perhaps test from your side once I add it?

Yes, of course I will test.

martinrotter commented 1 month ago

Fixed, test when devbuild compiles pls.

Now, remember there is an app-wide User-Agent setting. When a user sets User-Agent via feed-specific custom HTTP headers, the feed-specific value has HIGHER priority than the app-wide setting.
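The precedence rule described here can be pictured with a small sketch. This is not RSS Guard's actual code; `effective_headers` is a hypothetical helper that only illustrates the idea that a per-feed header replaces the app-wide User-Agent, with header names compared case-insensitively.

```python
def effective_headers(app_user_agent, feed_headers):
    """Merge the app-wide User-Agent with per-feed headers.

    Per-feed entries win. Header names are matched case-insensitively,
    as HTTP header names are not case-sensitive.
    """
    merged = {"user-agent": ("User-Agent", app_user_agent)}
    for name, value in feed_headers.items():
        merged[name.lower()] = (name, value)
    return dict(merged.values())


# The per-feed User-Agent overrides the app-wide one.
print(effective_headers(
    "RSS Guard/4.7.4",
    {"user-agent": "Mozilla/5.0 Firefox/131.0", "Priority": "u=0, i"},
))
```

With no per-feed headers configured, the function simply returns the app-wide User-Agent.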

Ac314 commented 1 month ago

Definitely works with the feed https://www.msi.com/rss/feed/product/MAG-Z690-TOMAHAWK-WIFI-DDR4/Download. The 3 headers I mentioned above were enough for me; the feed fetched with no problems. So at least @MoneyAllDay's problem is resolved.

@martinrotter is it possible to extend this new "custom HTTP headers" functionality to the embedded browser? I wanted to play with this Cloudflare protection a bit (finding out how to pass it), and, if my speculations are correct, without custom headers we cannot do anything with it.

chpasha commented 3 weeks ago

Hi, I have the same problem with this feed https://www.mydealz.de/rss/hot - it works neither in rssguard nor in quiterss (since a couple of days ago), but it does work in Liferea, Feedly and directly with curl / firefox. Any idea what is wrong with it?

Ac314 commented 3 weeks ago

Hi, I have the same problem with this feed https://www.mydealz.de/rss/hot - it works neither in rssguard nor in quiterss (since a couple of days ago), but it does work in Liferea, Feedly and directly with curl / firefox. Any idea what is wrong with it?

Try to access it from the usual browser first (and make sure it works), then try to add it to RSS Guard.

chpasha commented 3 weeks ago

Try to access it from the usual browser first (and make sure it works), then try to add it to RSS Guard.

I did, it works without a problem.

image

Ac314 commented 3 weeks ago

Try to access it from the "conventional" browser (like Firefox or Chrome), then add it to the RSS Guard and see if something changed.

chpasha commented 3 weeks ago

Try to access it from the "conventional" browser (like Firefox or Chrome), then add it to the RSS Guard and see if something changed.

Did it multiple times - it works everywhere except for RSS readers (and not all of them). There is nothing special: no redirection, no extra headers, and a simple "curl url" works as well. It looks like the feed proactively blocks known clients (but not all, since Feedly and Liferea work). What User-Agent and headers (if any) does rssguard use? Maybe I could reproduce the problem with curl if I can simulate the request.

Ac314 commented 3 weeks ago

Try to access it from the "conventional" browser (like Firefox or Chrome), then add it to the RSS Guard and see if something changed.

Did it multiple times - it works everywhere except for RSS readers (and not all of them). There is nothing special: no redirection, no extra headers, and a simple "curl url" works as well. It looks like the feed proactively blocks known clients (but not all, since Feedly and Liferea work). What User-Agent and headers (if any) does rssguard use? Maybe I could reproduce the problem with curl if I can simulate the request.

You are right, works everywhere except RSS Guard. What is more, if I set proxy for RSS Guard it starts working too. Changing User-Agent does not help. Have no idea why it happens.

Headers (including cookies) used while fetching by RSS Guard:

GET /rss/hot HTTP/1.1
Host: www.mydealz.de
User-Agent: Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) QtWebEngine/6.7.3 Chrome/118.0.5993.220 Safari/537.36 RSS Guard/4.7.4 RSS Guard/4.7.4
Cookie: f_v=%22ca34ea40-9522-11ef-bc33-0242ac110003%22; u_l=0; pepper_session=%22Q3kmUAYwclmlbHZDvgDBW7TES6XkdLBReMgMeRSl%22; mascot=eyJpdiI6IjZlUmd2THlLYkhlYlJhQ1RnM0pRYmc9PSIsInZhbHVlIjoiWTNYRkZuZ3J1NnZmcVFoeURWYUx2a0h3VW9pTTEwTm4vSnJkdkpXYlJ6NjBWZUFIS09YbWtSWkE3V3ZzNWpvdnF0T3g2SW8wOGVrNGYxWTlTR2d4RE44VnI5M01zVFVKVHUydXk0ZkJ6c0ZMbko1aHhWVEl6T0V5UWRZVk9uOWxZK1pqSTZMVlRmdVdJRzhMYzFBQXI4Tk1XTXNTK0p4dDkwTEduY1Q4Yy8zeDgvQ1dwdUxkMUkxTk9ndWl1N0h2Q0ZpVC9qNmJzUnMzaVZnV1k1cFV0YS85S0htbTdPTmo5S3RZUlB6K3E4MD0iLCJtYWMiOiJhZmExOGZmYjU0MWZhMzRjNmQzMjQ5ZDQ0YTkxYmQwZjIyZmIzOGE5NjM0NDIzNGJkYzdmMTc0YjY4NzBlMTA1IiwidGFnIjoiIn0%3D; xsrf_t=%22xdPd5SBS5MA5BEKGLwxVopRkQJZiGLym6vAOGeCe%22
Connection: Keep-Alive
Accept-Language: ru-RU,en,*
Accept-Encoding: gzip, deflate
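To simulate this request with curl, as asked earlier in the thread, the captured headers can be turned into a command line. The following is a sketch with a made-up helper name, `to_curl`. It leaves out the session cookies on purpose, quotes arguments for a POSIX shell (Windows `cmd` quoting differs), and passes `--http1.1` on the assumption that the HTTP version matters, since the capture shows HTTP/1.1.

```python
import shlex


def to_curl(url, headers, http_version="--http1.1"):
    """Build a curl command line that sends the given request headers."""
    parts = ["curl", "-v", http_version, shlex.quote(url)]
    for name, value in headers.items():
        parts += ["-H", shlex.quote(f"{name}: {value}")]
    return " ".join(parts)


# Headers copied from the capture in this thread (cookies omitted).
RSSGUARD_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) QtWebEngine/6.7.3 Chrome/118.0.5993.220 "
                  "Safari/537.36 RSS Guard/4.7.4 RSS Guard/4.7.4",
    "Connection": "Keep-Alive",
    "Accept-Language": "ru-RU,en,*",
    "Accept-Encoding": "gzip, deflate",
}

print(to_curl("https://www.mydealz.de/rss/hot", RSSGUARD_HEADERS))
```

If the generated command succeeds while RSS Guard fails with the same headers, the difference is likely below the header level (for example in the TLS handshake), which matches the suspicion raised later in the thread.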

chpasha commented 3 weeks ago

It must be Cloudflare. I could reproduce the problem from an Oracle Cloud server: it returns an HTML page with JavaScript and then, after some check, the RSS (in the usual case).

curl -v https://www.mydealz.de/rss/hot
* Trying 104.18.201.116:443...
* TCP_NODELAY set
* Connected to www.mydealz.de (104.18.201.116) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
* subject: CN=mydealz.de
* start date: Sep 25 06:12:53 2024 GMT
* expire date: Dec 24 06:12:52 2024 GMT
* subjectAltName: host "www.mydealz.de" matched cert's "*.mydealz.de"
* issuer: C=US; O=Google Trust Services; CN=WE1
* SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0xaaab048e1e90)
> GET /rss/hot HTTP/2
> Host: www.mydealz.de
> user-agent: curl/7.68.0
> accept: */*
>
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
* Connection state changed (MAX_CONCURRENT_STREAMS == 100)!
< HTTP/2 403
< server: cloudflare

Ac314 commented 3 weeks ago

Cloudflare usually works in a different way: it performs a series of redirects and then sets a specific cookie for the site to verify. Here the behaviour is different, and no cookie is needed at all. There is something else going on, maybe a problem with the HTTPS handshake.

Create a separate bug report for this; maybe the program author will have time to investigate.

chpasha commented 3 weeks ago

As far as I understand from reading cloudflare forums, it is another method of protection called JS handshake. Have submitted FR https://github.com/martinrotter/rssguard/issues/1527
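
For readers who want to check whether a failing feed shows the same symptom as the curl trace in this thread (a 403 with `server: cloudflare`), a tiny heuristic could look like the sketch below. The function name is hypothetical and the check is deliberately crude: a 403 served through Cloudflare can also come from an ordinary firewall rule, so a match is a hint, not proof of a JavaScript challenge.

```python
def looks_like_cloudflare_block(status, headers):
    """Heuristic: HTTP 403 whose Server header mentions Cloudflare.

    `headers` is a plain mapping of response header names to values;
    names are compared case-insensitively.
    """
    server = ""
    for name, value in headers.items():
        if name.lower() == "server":
            server = value.lower()
    return status == 403 and "cloudflare" in server


# Values taken from the curl trace in this thread.
print(looks_like_cloudflare_block(403, {"server": "cloudflare"}))
```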