ufs-community / uwtools

Workflow tools for use with applications with UFS and beyond
GNU Lesser General Public License v2.1

Better checking for external links #619

Closed maddenp-noaa closed 1 month ago

maddenp-noaa commented 1 month ago

Synopsis

Fixes #614 by replacing the built-in Sphinx link checker, which operates on the RST files, with a more capable external link checker that operates on the generated HTML files. As a result, raw HTML links in e.g. <a> or, pertinently, <iframe> tags are checked the same as links generated by Sphinx from RST code.
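To illustrate why working from the generated HTML matters, here is a minimal sketch (not the checker used in this PR) that collects external URLs from both <a href> and <iframe src> attributes using only the Python standard library. The names URL_ATTRS, LinkCollector and collect_links, and the sample HTML, are hypothetical and exist only for this example.

```python
from html.parser import HTMLParser

# Tag -> attribute that holds a URL. Only the two tags discussed above are
# covered; a real checker would handle many more (img, link, script, ...).
URL_ATTRS = {"a": "href", "iframe": "src"}


class LinkCollector(HTMLParser):
    """Collect external URLs found in <a href> and <iframe src> attributes."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        wanted = URL_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            # Keep only absolute http(s) URLs; local anchors are skipped.
            if name == wanted and value and value.startswith(("http://", "https://")):
                self.links.append((tag, value))


def collect_links(html):
    """Return (tag, url) pairs in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page fragment: one Sphinx-style link, one raw iframe, one local anchor.
sample = """
<p><a class="reference external" href="https://polar.ncep.noaa.gov/waves/wavewatch/">WW3</a></p>
<iframe src="https://www.youtube.com/embed/foo"></iframe>
<a href="#local-anchor">skip</a>
"""
print(collect_links(sample))
```

Because the iframe's src only exists in the HTML, a pass over the RST sources never sees it, while a pass over the build output treats it like any other link.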

As a test, I introduced a typo into one of the URLs defined in docs/conf.py:

$ git diff -U0
diff --git a/docs/conf.py b/docs/conf.py
index 8e0e67f9..9cb12f18 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -68 +68 @@ extlinks = {
-    "ww3": ("https://polar.ncep.noaa.gov/waves/wavewatch/%s", "%s"),
+    "ww3": ("https://polar.ncep.noaa.gov/waves/wavewitch/%s", "%s"),

The typo is detected and the checker exits with non-zero status after displaying this error:

URL        `https://polar.ncep.noaa.gov/waves/wavewitch/'
Name       `Wave Watch III'
Parent URL file:///home/maddenp/git/uwtools/docs/build/html/sections/user_guide/cli/drivers/ww3/index.html, line 117, col 123
Real URL   https://polar.ncep.noaa.gov/waves/wavewitch/
Check time 0.944 seconds
Size       214B
Result     Error: 404 Not Found

I also updated a couple of Makefiles to make generation of CLI example output less noisy, reducing the volume of output from make docs. I haven't found that additional output useful in the past.

Finally, and unfortunately, note that this only partially helps with the core issue reported in #614, namely that links to YouTube videos were not being checked. They will be checked now, but not every incorrect URL is considered an error by YouTube. For example:

$ curl -s --head https://www.youtube.com/embed/foo | grep ^HTTP
HTTP/2 200

Here, although no video named foo is available, youtube.com returns an HTTP 200 success code. Open that link in your browser and you'll see that it returns content.

On the other hand, a typo in an earlier part of a YouTube URL does result in an error:

$ curl -s --head https://www.youtube.com/embedZZZ/foo | grep ^HTTP
HTTP/2 404 

This is down to YouTube's API design: They have decided not to send an HTTP error response even if a requested video does not exist.
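The limitation can be sketched in a few lines of Python. This is an illustration of a status-only check under the assumption that a link counts as broken only when the server answers with a 4xx or 5xx code; it is not the logic of the checker adopted in this PR, and head_status and is_broken are hypothetical names.

```python
import urllib.error
import urllib.request


def head_status(url, timeout=10.0):
    """Return the HTTP status code of a HEAD request to url."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the code is what we want here.
        return err.code


def is_broken(status):
    """Status-only rule (an assumption): 400 and above means a broken link."""
    return status >= 400
```

Under this rule, the 200 returned for https://www.youtube.com/embed/foo passes even though no such video exists, while the 404 returned for https://www.youtube.com/embedZZZ/foo is flagged. Any checker that relies on the status code alone has the same blind spot.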

Nevertheless, I think the changes in the PR are helpful and worth moving forward with.

Type

Impact

Checklist