miniflux / v2

Minimalist and opinionated feed reader
https://miniflux.app
Apache License 2.0
6.8k stars 716 forks

Feeds from artstation.com return error saying - Last Parsing Error Access to this website is forbidden. Perhaps, this website has a bot protection mechanism? #2207

Open raghu-kamath opened 11 months ago

raghu-kamath commented 11 months ago

I was adding a few feeds from some artists on ArtStation. The feed URL seems to have the data, but Miniflux can't access it. The feeds do show up in Akregator, KDE's feed reader application.

One of the feed links is https://danielcorreia.artstation.com/rss

shizunge commented 11 months ago

I met the same problem on many of my feeds.

I guess it is related to a bug and it is fixed here https://github.com/miniflux/v2/pull/2193

fguillot commented 10 months ago

It's blocked by Cloudflare's bot protection mechanism. Changing the user agent is not enough. They probably fingerprint the Golang HTTP client, which is probably why Akregator works.

I have the same result when testing with curl:

curl -I https://danielcorreia.artstation.com/rss
HTTP/2 403
[...]
server: cloudflare

Snippet from the HTML page returned:

<h2 data-translate="blocked_why_headline">Why have I been blocked?</h2>

            <p data-translate="blocked_why_detail">This website is using a security service to protect itself from online attacks. The action you just performed triggered the security solution. There are several actions that could trigger this block including submitting a certain word or phrase, a SQL command or malformed data.</p>
          </div>

          <div class="cf-column">
            <h2 data-translate="blocked_resolve_headline">What can I do to resolve this?</h2>

            <p data-translate="blocked_resolve_detail">You can email the site owner to let them know you were blocked. Please include what you were doing when this page came up and the Cloudflare Ray ID found at the bottom of this page.</p>
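
For anyone who wants to script the same check, here is a minimal Go sketch of the heuristic suggested by the curl output above: an HTTP 403 combined with a `server: cloudflare` header. The helper name `isCloudflareBlock` is hypothetical and is not part of Miniflux. A local test server stands in for the real feed so the example runs offline. This only detects the block; it does nothing to get around it.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// isCloudflareBlock is a hypothetical helper (not part of Miniflux).
// It reports whether a response looks like the block shown above:
// HTTP 403 with a Server header naming Cloudflare. It is only a heuristic.
func isCloudflareBlock(status int, server string) bool {
	return status == http.StatusForbidden &&
		strings.EqualFold(strings.TrimSpace(server), "cloudflare")
}

func main() {
	// Local stand-in for the blocked feed, so the sketch runs without network access.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Server", "cloudflare")
		w.WriteHeader(http.StatusForbidden)
	}))
	defer srv.Close()

	// Equivalent of `curl -I`: a HEAD request that only inspects status and headers.
	resp, err := http.Head(srv.URL)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()

	blocked := isCloudflareBlock(resp.StatusCode, resp.Header.Get("Server"))
	fmt.Println("status:", resp.StatusCode, "blocked by Cloudflare:", blocked)
}
```

To test a real feed, replace `srv.URL` with the feed URL.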

ghost commented 10 months ago

> I met the same problem on many of my feeds.
>
> I guess it is related to a bug and it is fixed here #2193

Same issue here with the RSS feed for a Reddit sub (e.g. https://www.reddit.com/r/selfhosted/.rss). I already tried building the latest version with this PR merged, still no luck.

ghost commented 10 months ago

OK, I can confirm that Reddit just blocked the IP address of my server. When I curl https://www.reddit.com/r/selfhosted/.rss, it returns the following:

<!doctype html>
     <html>
  <head>
    <title>Blocked</title>
    <style>
      body {
          font: small verdana, arial, helvetica, sans-serif;
          width: 600px;
          margin: 0 auto;
      }

      h1 {
          height: 40px;
          background: transparent url(//www.redditstatic.com/reddit.com.header.png) no-repeat scroll top right;
      }
    </style>
  </head>
  <body>
    <h1>whoa there, pardner!</h1>

<p>reddit's awesome and all, but you may have a bit of a
problem.</p>

<p>Make sure your User-Agent is not empty, is something unique and descriptive and try again. if you're supplying an alternate User-Agent string,
try changing back to default as that can sometimes result in a block.</p>

<p>If you're attemping to access Reddit via a hosting provider, please register for and sign in with your developer credentials <a href=https://www.reddit.com/wiki/api/>here</a>.</p>

<p>You can read Reddit's Terms of Service <a href=https://www.reddit.com/wiki/api/>here</a>.</p>

<p>if you think that we've incorrectly blocked you or you would like to discuss
easier ways to get the data you want, please contact us at <a href=mailto:ratelimit@reddit.com?Subject=Blocked%20******** (Request ID: 2220882771-NRT)>this email address</a>.</p>

<p>when contacting us, please include your ip address which is: <strong>*******</strong> and reddit account</p>
  </body>
</html>

So it's not an issue with Miniflux. I tried emailing Reddit to get my server unblocked, but ...

Hi there!

Thanks for contacting us! At this time, we are not currently accepting inquiries via email. If you need support with our API or have questions, please submit your request [here](https://reddithelp.com/hc/en-us/requests/new?ticket_form_id=14868593862164).

Cheers,
Your friends at Reddit
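
A rough way to repeat the curl check above from Go, in case it helps someone tell an IP block apart from a User-Agent block: send the same request with an explicit, descriptive User-Agent (as the Reddit block page asks) from both the server and another machine, then compare the status codes. The helper `newFeedRequest` and the User-Agent string are made up for illustration. If the server still gets 403 while another machine gets 200 with the same User-Agent, an IP-level block is the likely cause.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// newFeedRequest is a hypothetical helper: it builds a GET request for a
// feed URL and sets an explicit User-Agent header.
func newFeedRequest(feedURL, userAgent string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, feedURL, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("User-Agent", userAgent)
	return req, nil
}

func main() {
	// The User-Agent value is an example only; use something that identifies you.
	req, err := newFeedRequest(
		"https://www.reddit.com/r/selfhosted/.rss",
		"example-feed-checker/0.1 (contact: admin@example.org)",
	)
	if err != nil {
		fmt.Println("could not build request:", err)
		return
	}

	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()

	// Run this on the server and on another machine, then compare.
	fmt.Println("status:", resp.Status)
}
```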

nakkaya commented 10 months ago

Couple notes on the issue, got bitten by it today,

g2g commented 9 months ago

Hi, I had the same problem with Reddit ("Access to this website is forbidden. Perhaps, this website has a bot protection mechanism?"). Filling in the Feed Username / Password solved it for me. (No need to set a different Default User Agent.) I hope this helps.
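
For reference, a small Go sketch of what sending feed credentials looks like at the HTTP level. It assumes the Feed Username / Password values are sent as HTTP Basic authentication; that is my assumption, not something confirmed in this thread. The helper `newAuthenticatedFeedRequest` and the credentials are placeholders.

```go
package main

import (
	"fmt"
	"net/http"
)

// newAuthenticatedFeedRequest is a hypothetical helper: it builds a GET
// request for a feed and attaches HTTP Basic credentials to it.
func newAuthenticatedFeedRequest(feedURL, username, password string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, feedURL, nil)
	if err != nil {
		return nil, err
	}
	// SetBasicAuth adds an "Authorization: Basic ..." header to the request.
	req.SetBasicAuth(username, password)
	return req, nil
}

func main() {
	// Placeholder credentials; nothing is sent over the network here.
	req, err := newAuthenticatedFeedRequest(
		"https://www.reddit.com/r/selfhosted/.rss",
		"example-user",
		"example-password",
	)
	if err != nil {
		fmt.Println("could not build request:", err)
		return
	}

	user, _, ok := req.BasicAuth()
	fmt.Println("basic auth set:", ok, "user:", user)
}
```

Passing the request to `http.Client.Do` would perform the actual fetch.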