readthedocs / readthedocs.org

The source code that powers readthedocs.org
https://readthedocs.org/
MIT License

ReadTheDocs hosted website marked as potentially hacked by Google #11450

Closed rouault closed 6 days ago

rouault commented 1 week ago

The documentation of the PROJ (https://github.com/OSGeo/PROJ) project is hosted by Read the Docs, with a CNAME record from the proj.org domain, which the PROJ project owns, to readthedocs.io. We recently got a report in https://github.com/OSGeo/PROJ/issues/4182 that Google has tagged proj.org as potentially hacked when it appears in search results. Digging into the Google Search Console, the "Security Issues" section says:

Hacked: URL injection

Description
These pages appear to be created by a hacker in order to spam search results.
 [Learn more](https://support.google.com/webmasters/answer/9044101#hacked-url-injection)

Sample URLs
17 May 2024  https://www.proj.org/tews/video-barh-sa7ad_1.html
20 May 2024  https://www.proj.org/mex/video-zmk-vs-borkan6.html
16 May 2024  https://www.proj.org/tews/video-barh-sa7ad_19.html
25 May 2024  https://www.proj.org/mex/video-zmk-vs-borkan15.html?jaja
1 Jul 2024  https://www.proj.org/tews/video-barh-sa7ad_17.html
21 May 2024  https://www.proj.org/mex/video-zmk-vs-borkan12.html
20 May 2024  https://www.proj.org/mex/video-zmk-vs-borkan11.html
1 Jul 2024  https://www.proj.org/mex/video-zmk-vs-borkan9.html
19 May 2024  https://www.proj.org/mex/video-zmk-vs-borkan4.html
17 May 2024  https://www.proj.org/tews/video-barh-sa7ad_14.html
16 May 2024  https://www.proj.org/tews/video-zamalek-berkane_11.html
2 Jun 2024  https://www.proj.org/mex/video-zmk-vs-borkan4.html?jaja

We are puzzled by this because such tews/video-xxxx or mex/video-xxxx pages are definitely not part of our Sphinx sources. Using GitHub's all-repository search, we found that a totally unrelated repository has a page https://github.com/kikosaad2024/adhd/blob/b2cf9c1bcb6be4b50ecc28e49daa1d23a6e903cd/news/tews/gaaa5.xml where such links are mentioned. We are not sure if it is just a coincidence, but that repository is also built by Read the Docs: https://github.com/kikosaad2024/adhd/blob/main/.readthedocs.yaml

Is it possible that there is some form of unwanted interaction between 2 ReadTheDocs hosted websites?

rhuijben commented 1 week ago

These errors are explicitly on www.proj.org, a host from which I currently get a low-level SSL error in my browser.

Looking at the DNS, this host should also be served by RTD, but it is not handled like proj.org.

If I ignore all errors, I reach a Cloudflare error saying that this hostname is not handled.

Assuming RTD doesn't do the Cloudflare wrapping, I'm not sure the issue should be fixed here. The Cloudflare config appears to be a more logical place.

hobu commented 1 week ago

I have added www to our rtd config. I wonder if this changes the situation...

humitos commented 1 week ago

Hi all 👋🏼 . Thanks for opening this issue. I've read it all and I'm a little confused about what the exact problem is here. I checked the custom domain configuration and the DNS records, and everything looks fine from the Read the Docs side.

It seems that URLs like https://www.proj.org/tews/video-barh-sa7ad_19.html weren't served by Read the Docs before www.proj.org was added to your Read the Docs project. Now, hitting those URLs returns a 404. I suppose there was a misconfiguration at the DNS level, where www.proj.org may have pointed to a different host?
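For anyone who wants to re-check the 404 behaviour described above, here is a minimal sketch using only the Python standard library. The helper names `fetch_status` and `still_served` are made up for this illustration and are not part of any Read the Docs tooling; the two URLs are taken from the Search Console list in the report. A URL that fails with a network or TLS error is reported as `None` and is deliberately kept in the result, since "unreachable" is not the same as "gone".

```python
import urllib.error
import urllib.request

# Two of the URLs flagged by Google Search Console in the report above.
SAMPLE_URLS = [
    "https://www.proj.org/tews/video-barh-sa7ad_19.html",
    "https://www.proj.org/mex/video-zmk-vs-borkan4.html",
]


def fetch_status(url, timeout=10):
    """Return the HTTP status of a HEAD request, or None on a network/TLS error."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # 4xx/5xx answers arrive as exceptions; the code is what we want.
        return exc.code
    except OSError:
        # DNS failure, refused connection, TLS error, timeout, ...
        return None


def still_served(urls, fetch=fetch_status):
    """Return the URLs that do not answer with 404 or 410.

    `fetch` is injectable (hypothetical parameter) so the logic can be
    checked without network access.
    """
    return [url for url in urls if fetch(url) not in (404, 410)]
```

Calling `still_served(SAMPLE_URLS)` should return an empty list if every sample URL now answers with a 404, as reported in this comment; anything left in the list would deserve a closer look.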

Let us know if you have more context around this issue and if you suspect there is something in particular in our platform that may be wrong.

jjimenezshaw commented 1 week ago

I'm confused.

The DNS entries for www.proj.org are pointing to RTD, right?

https://www.nslookup.io/domains/www.proj.org/dns-records/

I don't know if having both A and CNAME records is a problem.

But I would never expect different pages to be served there, as Google was indicating.
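One way to compare how the two hostnames resolve is a short standard-library sketch like the one below. It relies on `socket.gethostbyname_ex`, which returns the canonical name, the alias list, and the IPv4 addresses for a host. The helpers `describe_host` and `same_target` are hypothetical names chosen for this example. Note that a CNAME is not supposed to coexist with other record types at the same name (RFC 1034), so a lookup page listing both may simply be showing the addresses reached by following the CNAME; that is an assumption about the tool, not something verified in this thread.

```python
import socket


def describe_host(name, resolver=socket.gethostbyname_ex):
    """Return canonical name, aliases and IPv4 addresses for `name`.

    `resolver` defaults to the standard library lookup; it is injectable
    (hypothetical parameter) so the function can be tested offline.
    """
    try:
        canonical, aliases, addresses = resolver(name)
    except OSError as exc:
        # socket.gaierror and socket.herror are both OSError subclasses.
        return {"host": name, "error": str(exc)}
    return {
        "host": name,
        "canonical": canonical,
        "aliases": list(aliases),
        "addresses": sorted(addresses),
    }


def same_target(a, b):
    """True when both lookups succeeded and ended at the same addresses."""
    if "error" in a or "error" in b:
        return False
    return a["addresses"] == b["addresses"]
```

Comparing `describe_host("proj.org")` with `describe_host("www.proj.org")` via `same_target` shows whether both names end up at the same addresses. It does not show which project the hosting side serves for each hostname, which is the part that was configured separately here.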

rouault commented 6 days ago

Closing this issue. It is not clear what went wrong originally, and what has caused it to be solved, but Google no longer marks proj.org as being potentially hacked.