miniflux / v2

Minimalist and opinionated feed reader
https://miniflux.app
Apache License 2.0

RSS feed returns 403/CAPTCHA from Cloudflare #1491

Closed pschupp closed 7 months ago

pschupp commented 2 years ago

The RSS feed in question is: (https://www.fiercebiotech.com/rss/biotech/xml)

I'm well aware this issue has been discussed in various other posts, with the suggested solutions being:

  1. Changing the IP from which you access the feed (https://github.com/miniflux/v2/issues/1332)
  2. Changing the UserAgent
  3. Changing the underlying library used in accessing the feed (from Go to another language) via a proxy (https://github.com/miniflux/v2/issues/1047)

I have tried all three solutions with Miniflux and they have not changed the outcome. I have tried pproxy and tinyproxy on my local host as proxies.

Using curl, the default UserAgent returns the captcha, but using (Mozilla/5.0 (X11; Linux x86_64; rv:95.0) Gecko/20100101 Firefox/95.0) returns the feed properly.

My questions are:

  1. Is there another method I could try to fetch this feed successfully?
  2. Given that curl with the right UserAgent works, is there any way to use curl to fetch the feed data instead of the Go libraries?

rdelaage commented 2 years ago

Did you try changing the user agent in your feed settings?

pschupp commented 2 years ago

I did try that.

rdelaage commented 2 years ago

I just tried curl with the user agent from your example, and it didn't work. But it worked with wget. I use HTTP/1.1 and the same headers in both... I don't understand.

pschupp commented 2 years ago

Anyway, thanks for trying. The following works for me:

Without the user agent, curl returns the captcha and wget gives a 403. Sorry I can't be more helpful, but I'm afraid this stuff is out of my depth!

fguillot commented 7 months ago

This feed (https://www.fiercebiotech.com/rss/biotech/xml) works for me if you disable HTTP/2 to avoid fingerprinting. Requires Miniflux >= 2.0.1.
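
For anyone who wants to reproduce this outside Miniflux, here is a minimal Go sketch of the same idea: send a browser-like User-Agent and keep the client on HTTP/1.1. It is not Miniflux's actual implementation. The helper names `newHTTP1Transport` and `newFeedRequest` are made up for illustration, and the User-Agent string is simply the one quoted earlier in this thread. It relies on the documented `net/http` behaviour that a non-nil, empty `Transport.TLSNextProto` map disables HTTP/2. Whether Cloudflare accepts the request can still depend on your IP and on other fingerprinting signals.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"io"
	"log"
	"net/http"
	"time"
)

// newHTTP1Transport (hypothetical helper) returns a transport that never
// negotiates HTTP/2. Per the net/http documentation, setting TLSNextProto
// to a non-nil, empty map disables HTTP/2 for the client.
func newHTTP1Transport() *http.Transport {
	return &http.Transport{
		Proxy:        http.ProxyFromEnvironment,
		TLSNextProto: map[string]func(string, *tls.Conn) http.RoundTripper{},
	}
}

// newFeedRequest (hypothetical helper) builds a GET request that carries
// the given User-Agent instead of Go's default one.
func newFeedRequest(feedURL, userAgent string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, feedURL, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("User-Agent", userAgent)
	return req, nil
}

func main() {
	client := &http.Client{
		Transport: newHTTP1Transport(),
		Timeout:   30 * time.Second,
	}

	// Feed URL and User-Agent are the ones mentioned in this thread.
	req, err := newFeedRequest(
		"https://www.fiercebiotech.com/rss/biotech/xml",
		"Mozilla/5.0 (X11; Linux x86_64; rv:95.0) Gecko/20100101 Firefox/95.0",
	)
	if err != nil {
		log.Fatal(err)
	}

	resp, err := client.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}

	// Print the negotiated protocol, the status line, and the body size.
	fmt.Printf("%s %s, %d bytes\n", resp.Proto, resp.Status, len(body))
}
```

If the printed protocol is `HTTP/1.1` and the status is still 403, the block is probably not caused by HTTP/2 fingerprinting alone.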