readthedocs / readthedocs.org

The source code that powers readthedocs.org
https://readthedocs.org/
MIT License

Is custom CloudFlare configuration needed for readthedocs.io? #11529

Open dimaqq opened 2 months ago

dimaqq commented 2 months ago

Details

Expected Result

Either being able to use social previews, or clean error if previews are broken

Actual Result

Screenshot 2024-08-08 at 11 56 54

Example solution: https://community.cloudflare.com/t/attention-required-message-when-sharing-link/88999/3

humitos commented 2 months ago

Hi. I'm not sure I understand the issue. Can you expand on what you are trying to do and what error you are getting?

dimaqq commented 2 months ago

I imagine that the fact that RTD is proxied by Cloudflare is configured in RTD, something that I don’t have access to.

Cloudflare, in turn, shows an “I’m not a robot” page when Facebook tries to load a “social image”. Because that’s server-to-server communication, the human check can never pass.

Instead, RTD’s Cloudflare account should be configured to allow machine/direct/non-browser access from an allowlist of IP addresses.
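
To illustrate the kind of rule being asked for, here is a minimal sketch of an IP allowlist check. It is not Cloudflare's configuration syntax, only the logic such a rule would express. The networks are documentation-only placeholder ranges (RFC 5737 and RFC 3849), not Facebook's real ranges, and the name `is_allowlisted` is hypothetical.

```python
import ipaddress

# Hypothetical allowlist: documentation-only ranges standing in for whatever
# ranges a link-preview fetcher actually publishes. These are NOT real
# Facebook addresses.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_allowlisted(client_ip, networks=ALLOWED_NETWORKS):
    """Return True if client_ip falls inside any allowlisted network."""
    address = ipaddress.ip_address(client_ip)
    return any(
        # Compare only networks of the same IP version as the address.
        address.version == network.version and address in network
        for network in networks
    )
```

A request from an allowlisted address would skip the human check; everything else would keep the existing behaviour.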

humitos commented 2 months ago

Thanks for the explanation. I'm pinging @ericholscher here because I think he will have more context about this.

ericholscher commented 2 months ago

Seems like the site is working to me?

-> curl -IL https://ops.readthedocs.io/en/latest/
HTTP/2 200

dimaqq commented 2 months ago

The site is working fine, it's facebook loading the "social image" or preview that doesn't work.
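
A plain curl request passes, so one way to get closer to what Facebook sees is to repeat the request with the user agent of its link-preview fetcher. The sketch below assumes that fetcher identifies itself as `facebookexternalhit/1.1`, and that Cloudflare marks challenge responses with a `cf-mitigated: challenge` header. The function names are hypothetical. Cloudflare may also take the source IP into account, so a request from a laptop can still behave differently from one sent by Facebook's servers.

```python
import urllib.error
import urllib.request

# Assumption: the user agent string sent by Facebook's link-preview fetcher.
FACEBOOK_PREVIEW_UA = (
    "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
)


def build_head_request(url, user_agent):
    """Build a HEAD request that carries the given User-Agent."""
    return urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": user_agent}
    )


def looks_like_challenge(headers):
    """Heuristic: treat a 'cf-mitigated: challenge' header as a challenge page."""
    normalized = {key.lower(): value for key, value in headers.items()}
    return normalized.get("cf-mitigated", "").lower() == "challenge"


def probe(url, user_agent=FACEBOOK_PREVIEW_UA):
    """Send the request and return (status_code, challenged)."""
    request = build_head_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status, looks_like_challenge(response.headers)
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses raise, but still carry a status code and headers.
        return error.code, looks_like_challenge(error.headers)
```

Calling `probe("https://ops.readthedocs.io/en/latest/")` performs a real network request, so the result depends on the configuration at the time it is run.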

dimaqq commented 2 months ago

You can use this tool from facebook to debug:

https://developers.facebook.com/tools/debug/?q=https%3A%2F%2Fops.readthedocs.io%2F

dimaqq commented 2 months ago

Btw, this seems to affect all projects on .io; here's pypi:aapns:

https://developers.facebook.com/tools/debug/?q=https%3A%2F%2Faapns.readthedocs.io%2Fen%2Flatest%2F

dimaqq commented 2 months ago

Cloudflare advises to "check your security events in your Cloudflare dashboard"

ref: https://community.cloudflare.com/t/link-preview-is-not-working/643722/2

ericholscher commented 2 months ago

Oh... I'm guessing this is an issue from the locked-down configuration we have for Facebook's AI crawler, which is spamming our site: https://about.readthedocs.com/blog/2024/07/ai-crawlers-abuse/. Is there a specific user agent that they are using for these requests? We can perhaps unblock it, but their abusive behavior is pretty awful, so it might not be possible.
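
If the preview fetcher and the AI crawler do send different user agents, the two could be told apart along these lines. This is a rough sketch, not the actual Read the Docs or Cloudflare configuration. It assumes link previews use a `facebookexternalhit` token and the AI crawler uses a `meta-externalagent` token; both should be checked against Meta's crawler documentation and the real security events. The function name and return values are hypothetical.

```python
# Assumed user agent tokens; verify against Meta's crawler documentation.
PREVIEW_TOKENS = ("facebookexternalhit",)
AI_CRAWLER_TOKENS = ("meta-externalagent",)


def classify_user_agent(user_agent):
    """Return 'block', 'allow', or 'default' for a User-Agent header value."""
    lowered = user_agent.lower()
    # Check the AI crawler first so it stays blocked even if the string
    # also happens to contain a preview token.
    if any(token in lowered for token in AI_CRAWLER_TOKENS):
        return "block"
    if any(token in lowered for token in PREVIEW_TOKENS):
        return "allow"
    return "default"
```

User agents are trivially spoofed, so a rule like this only helps if the abusive traffic does not simply switch to the allowed string.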