kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0
9.53k stars, 879 forks

Expand robots.txt for Kedro-Viz and Kedro-Datasets docs #3729

Closed: DimedS closed this 4 months ago

DimedS commented 4 months ago

This PR updates the robots.txt file to explicitly allow search engines to index the Kedro-Viz and Kedro-Datasets documentation sections.

Changes Made:

Previously, our robots.txt allowed only the Kedro project documentation to be indexed. Updating this file makes it easier for users to find information about Kedro-Viz and Kedro-Datasets directly through search engines.
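The shape of the change can be sketched as a generator for a "block by default, allow listed prefixes" robots.txt. This is a hypothetical illustration, not the PR's diff: the function name `build_robots_txt`, the version subset, and especially the Kedro-Viz and Kedro-Datasets path prefixes are assumptions.

```python
# Hypothetical sketch: build a robots.txt that disallows everything by default
# and explicitly allows selected documentation prefixes.
# The version list is an illustrative subset, and the two extra project
# prefixes are ASSUMED paths, not taken from the PR diff.
KEDRO_VERSIONS = ["stable", "0.19.3", "0.18.14"]
EXTRA_PROJECT_PREFIXES = [
    "/projects/kedro-viz/en/stable/",       # assumed Kedro-Viz docs path
    "/projects/kedro-datasets/en/stable/",  # assumed Kedro-Datasets docs path
]


def build_robots_txt(versions: list[str], extra_prefixes: list[str]) -> str:
    """Return robots.txt text: one catch-all group, blanket Disallow, then Allows."""
    lines = ["User-agent: *", "Disallow: /"]
    # Kedro core docs, one Allow line per published version.
    lines += [f"Allow: /en/{version}/" for version in versions]
    # Additional documentation sections (e.g. sub-projects).
    lines += [f"Allow: {prefix}" for prefix in extra_prefixes]
    return "\n".join(lines) + "\n"


print(build_robots_txt(KEDRO_VERSIONS, EXTRA_PROJECT_PREFIXES))
```

Keeping `Disallow: /` as the default means only explicitly listed sections are crawlable, so each new docs section has to be added deliberately.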

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.

Checklist

DimedS commented 4 months ago

Is this ready to be reviewed? How should this be tested?

https://www.google.com/search?q=kedro+docs+configloader&oq=kedro+docs+configloader&gs_lcrp=EgZjaHJvbWUyCQgAEEUYORifBTIGCAEQRRg8MgkIAhAhGAoYoAEyCQgDECEYChigATIJCAQQIRgKGKAB0gEIMzk0NGowajmoAgCwAgA&client=ms-android-google&sourceid=chrome-mobile&ie=UTF-8

I see 0.18.1 docs in Google search results, so I suspect something isn't working properly.

Yes, it's ready for review. Overall it seems to work, though we are experiencing some issues specifically with Google, and I'm not sure of the exact cause. Other search engines mostly behave as expected; see the screenshots from Yahoo, Yandex, and DuckDuckGo for reference. I don't think this can be tested before merging, because search engines need some time to pick up the updated robots.txt and reflect the changes in their search results. We may need to do something specific for Google.

[Screenshots, 2024-03-22: search results from Yahoo, Yandex, and DuckDuckGo]
astrojuanlu commented 4 months ago

we are experiencing some issues specifically with Google—I'm not sure of the exact cause

Interesting, might be related to #3708 somehow?

noklam commented 3 months ago
User-agent: *
Disallow: /
Allow: /en/stable/
Allow: /en/0.19.3/
Allow: /en/0.19.2/
Allow: /en/0.19.1/
Allow: /en/0.19.0/
Allow: /en/0.18.14/
Allow: /en/0.17.7/

For reference, the robots.txt is still not updated: https://docs.kedro.org/robots.txt
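To see which paths the robots.txt quoted in this comment actually permits, here is a minimal sketch of the longest-match rule from RFC 9309, which Google also documents: the matching rule with the longest path wins, and Allow wins a tie. It only handles plain prefixes (no `*` or `$` wildcards) and a single `User-agent: *` group; the helper names are hypothetical. Python's standard `urllib.robotparser` is not used here because it applies rules in file order (first match wins), so it would treat this file, with `Disallow: /` first, as blocking everything.

```python
# Minimal sketch of RFC 9309 longest-match evaluation for one "User-agent: *"
# group. Assumption: plain prefix rules only; wildcards (*, $) are not supported.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
Allow: /en/stable/
Allow: /en/0.19.3/
Allow: /en/0.19.2/
Allow: /en/0.19.1/
Allow: /en/0.19.0/
Allow: /en/0.18.14/
Allow: /en/0.17.7/
"""


def parse_rules(text: str) -> list[tuple[bool, str]]:
    """Return (is_allow, path_prefix) pairs from Allow/Disallow lines."""
    rules = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key in ("allow", "disallow") and value:
            rules.append((key == "allow", value))
    return rules


def is_allowed(rules: list[tuple[bool, str]], path: str) -> bool:
    """Apply the most specific (longest) matching rule; Allow wins ties."""
    best_len = -1
    allowed = True  # no matching rule at all means the path may be crawled
    for is_allow, prefix in rules:
        if path.startswith(prefix):
            if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
                best_len = len(prefix)
                allowed = is_allow
    return allowed


rules = parse_rules(ROBOTS_TXT)
for path in ["/en/stable/", "/en/0.19.3/index.html", "/en/0.18.1/", "/en/latest/"]:
    print(path, "->", "allowed" if is_allowed(rules, path) else "disallowed")
```

Under these rules `/en/0.18.1/` is disallowed, because only `Disallow: /` matches it. robots.txt controls crawling rather than indexing, so pages that were indexed earlier or are linked from elsewhere may still show up in results; this may be related to the 0.18.1 results mentioned above, though that is a guess.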

astrojuanlu commented 3 months ago

@noklam I believe it will only get updated with the next release.