sphinx-doc / sphinx

The Sphinx documentation generator
https://www.sphinx-doc.org/
Other
6.48k stars 2.11k forks source link

Option to ignore `linkcheck` for SSL error #12985

Open fmigneault opened 3 days ago

fmigneault commented 3 days ago

Is your feature request related to a problem? Please describe.

Whenever a website causes SSL error because their certificate expired, linkcheck causes many CI to fail. This causes unnecessary noise and maintenance burden, given that those errors are often fixed after a small time, when the link host/maintainer fixes/renew their certificate.

Describe the solution you'd like

There should be an option to allow ignoring certain websites for SSL error (either globally, on a per-host, or regex basis). When enabled, an SSL error would retry the request using HTTP. If it still fails, then broken should be reported as it currently does. Otherwise, the link is silently permitted.

Describe alternatives you've considered

The workaround is either to ignore the broken site link (entirely!) or to switch the link back to HTTP instead of HTTPS. The first workaround could potentially mask and actual invalid link error at a later time. The second is not as much of a drastic issue, but causes documentation to redirect users to unprotected sites. Typically, browsers are able to handle SSL error and still display the HTTPS site content, so HTTPS links should be preferred (with the hope that SSL certificate would be fixed in a near future).

Additional context

Related but different:

AA-Turner commented 3 days ago

cc @jayaddison

The relevant lines are:

https://github.com/sphinx-doc/sphinx/blob/cc1b13ccbfaa3c6818839f66d1bb0627203bd204/sphinx/builders/linkcheck.py#L544-L547

A

jayaddison commented 3 days ago

Seems pretty reasonable, yep. We have an existing precedent for reporting links with a status code of ignored when HTTP 5xx responses are encountered; I'd suggest we use that same status code here.

I'd tend to anticipate that users would want to enable this selectively for sites that they learn, over time, have a tendency to fail intermittently -- so I'd lean towards suggesting a list of regexes to opt-in sites/URI-patterns for which SSL errors are ignored.

Another question is whether to narrow this to specific types of SSL error - @fmigneault is your use-case specifically related to certificate expiry, or would you want other SSL errors to be ignored by this too?

jayaddison commented 3 days ago

Ah, sorry; I read too quickly (as usual). I see that the desired behaviour would be a plaintext HTTP retry on failure.

I'm going to think about that for a little while - it's not always guaranteed of course that a site may serve an HTTP equivalent of an HTTPS URI; but it doesn't seem totally unreasonable as a fallback behaviour, either.

AA-Turner commented 3 days ago

I'm hesitant to support retrying with HTTP, it seems to go against current web best practices, eg HSTS.

If we want to implement this request it should simply report that link X was ignored as a transient failure (if so configured).

A

fmigneault commented 3 days ago

@jayaddison I am only assuming it is certificate expiry in the case I encountered because the referenced link is not managed by myself, but was working a few days ago. The website is still responding and everything. It is only the SSL check that fails. I have also opened an issue on their tracker (https://github.com/cogeotiff/www.cogeo.org/issues/72), but no answer yet.

I'm open to alternatives if you think there is a better way. In my case, I do not think there is any issue in doing HTTP. It's not like I'm sending critical data over that link. Just doing a GET to validate it is alive/resolved. However, I can understand that if auth is enabled, then this HTTP retry should probably not be done.

The ignore rather than retry works for me also. It is similar to ignoring it explicitly "temporarily". Once the SSL certificate is updated, it should be "back on" automatically in the link checks. Better than me forgetting to re-enable it and making that "temporarily" a permanent ignore.