spring-projects / spring-boot

Spring Boot helps you to create Spring-powered, production-grade applications and services with absolute minimum fuss.
https://spring.io/projects/spring-boot
Apache License 2.0

Automate link checking for generated documentation #21035

Open scottfrederick opened 4 years ago

scottfrederick commented 4 years ago

There have been a few issues lately with broken links in docs (e.g. #20818, #21019). We should investigate automated checking of links in generated or published docs.

dreis2211 commented 4 years ago

I used http://validator.w3.org/checklink today to find hopefully one last issue. Maybe you can take a look at that for inspiration and/or validation of results.

As you can see, some URLs are generally reachable via a browser but not if you simply curl them, e.g.:

curl -I https://redis.io
HTTP/1.1 404 Not Found
Server: nginx/1.10.2
Date: Wed, 22 Apr 2020 16:50:24 GMT
Content-Length: 3673
Connection: keep-alive

wilkinsona commented 4 years ago

Interesting finding. Thanks, @dreis2211. Looks like redis.io changes behaviour based on the User-Agent. It 404s for HEAD requests too. It responds with a 200 for GET or HEAD if you spoof the user agent and pretend to be a browser.
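
For anyone who wants to reproduce the User-Agent effect from a script rather than with curl, here is a minimal sketch using only the Python standard library. The `BROWSER_UA` string and the helper names `build_request` and `fetch_status` are illustrative assumptions, not part of any existing tooling in this project, and whether a given site still responds differently per User-Agent may have changed since this thread.

```python
import urllib.error
import urllib.request

# Assumed browser-like User-Agent string; any realistic browser value should do.
BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def build_request(url, method="HEAD", user_agent=None):
    """Build a request, optionally overriding the User-Agent header."""
    headers = {"User-Agent": user_agent} if user_agent else {}
    return urllib.request.Request(url, headers=headers, method=method)


def fetch_status(url, method="HEAD", user_agent=None, timeout=10.0):
    """Return the HTTP status code for url.

    urlopen raises HTTPError for 4xx/5xx responses, so the code is read
    from the exception in that case. Network-level failures (DNS errors,
    timeouts) are not handled here and propagate to the caller.
    """
    request = build_request(url, method, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code
```

Calling `fetch_status(url)` and `fetch_status(url, user_agent=BROWSER_UA)` for the same URL and comparing the two results shows whether the server treats the default client differently from a browser.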

dreis2211 commented 4 years ago

Yeah, and some are also behind HTTP Basic auth. My point being: some URLs produce false positives, so any eventual tooling will need a bit more magic for them than for others.
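
One way to picture that extra magic is a checker that does not report a link as broken on the first failure. The sketch below is only an illustration under stated assumptions: the category names, the `classify_link` function, and the rule of treating 401 as "auth-required" are hypothetical choices, not how any existing link checker works. The HTTP call is passed in as a `fetch` function so the decision logic can be exercised without network access.

```python
from typing import Callable, Optional

# Assumed browser-like User-Agent used for the retry.
BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)

# A fetcher takes (url, method, user_agent) and returns an HTTP status code.
Fetcher = Callable[[str, str, Optional[str]], int]


def classify_link(url: str, fetch: Fetcher) -> str:
    """Classify a link, retrying as a browser before calling it broken."""
    # First attempt: a plain HEAD request with the client's default User-Agent.
    status = fetch(url, "HEAD", None)
    if 200 <= status < 400:
        return "ok"
    if status == 401:
        # Behind authentication (e.g. HTTP Basic): the link resolves,
        # it just cannot be verified anonymously.
        return "auth-required"

    # Second attempt: a GET request that pretends to be a browser.
    status = fetch(url, "GET", BROWSER_UA)
    if 200 <= status < 400:
        return "ok-with-browser-ua"
    if status == 401:
        return "auth-required"
    return "broken"
```

Only links that end up as "broken" would need to fail a build; the other categories could be logged for manual review.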

danielmenezesbr commented 4 years ago

Istio.io [1] [2] uses html-proofer and linkinator to test rendered HTML files to make sure they're accurate.