Closed corneliusroemer closed 2 years ago
In case it helps, there may be a way to programmatically find the broken links:
- bundle exec htmlproofer - link to more info
- sphinx-build -b linkcheck - link to more info
It's been a couple years since I looked at automagically link-checking Jekyll pages, and I think we've since moved to Sphinx.
@j23414 Can you please advise on how to integrate the linkcheck into the current build system?
https://github.com/nextstrain/nextclade/blob/18aa4e308c4dc566b71c85097f4ee0cada28f8b2/docs/Makefile#L6
Sure, I'm willing to explore! I'll try building docs locally using that makefile. Hopefully it's only SPHINXOPT='-b linkcheck'
and somehow capture the list of broken links.
Hmm, I wouldn't set it up as a GitHub Action, as I wouldn't want the build to "fail" just because of a broken link. Better to just flag the broken link so someone can fix it later.
> Hopefully it's only SPHINXOPT='-b linkcheck' and somehow capture the list of broken links.
I tried, but nothing happened. I just don't fully understand how it works.
> wouldn't want the build to "fail" just because of a broken link.
We definitely don't need to fail the build. A warning in the terminal would be fine.
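One way to get a terminal warning without failing the build is to wrap the link-check command so that its exit status is discarded and only the "broken" lines are echoed. This is a minimal sketch, not part of the nextclade build: the function name `warn_on_broken_links` is hypothetical, and it assumes the checker prints the word "broken" on each failing link, as Sphinx's linkcheck builder does.

```shell
# Hypothetical helper: run a link-check command, never fail the caller.
# Usage: warn_on_broken_links <command> [args...]
warn_on_broken_links() {
  local log
  log="$(mktemp)"
  # Capture stdout and stderr; deliberately ignore the command's exit status.
  "$@" >"$log" 2>&1 || true
  # Echo every line mentioning "broken" as a warning on stderr.
  grep "broken" "$log" | sed 's/^/WARNING: /' >&2 || true
  rm -f "$log"
  return 0
}
```

It could then be called as, for example, `warn_on_broken_links sphinx-build -b linkcheck . build/linkcheck`, where the source and output directories are assumptions about the local layout.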
Ah, here's the local build:
git clone https://github.com/nextstrain/nextclade.git
cd nextclade/docs
conda env create
conda activate docs.clades.nextstrain.org
# Add the link check
make -b linkcheck html &> msgs.txt
# Pull broken links; ignore changelog entries, since those links are expected to be out of date
cat msgs.txt \
| grep "broken" \
| grep -v "CHANGELOG" \
| sort \
| less
which gave me:
(user/datasets: line 158) broken https://github.com/nextstrain/nextclade_data_workflows - 404 Client Error: Not Found for url: https://github.com/nextstrain/nextclade_data_workflows
(user/algorithm/01-sequence-alignment: line 5) broken nextclade-cli -
(user/algorithm/02-translation: line 5) broken ../terminology.html#gene-map -
(user/algorithm/02-translation: line 5) broken nextalign-cli -
(user/algorithm/02-translation: line 5) broken nextclade-web -
(user/algorithm/02-translation: line 13) broken ../terminology.html#peptide -
(user/algorithm/06-clade-assignment: line 3) broken ../terminology.html#clade -
(user/algorithm/06-clade-assignment: line 5) broken 05-phylogenetic-placement.html#known-limitations -
(user/algorithm/nextclade-pango: line 12) broken https://academic.oup.com/ve/article/7/2/veab064/6315289 - 403 Client Error: Forbidden for url: https://academic.oup.com/ve/article/7/2/veab064/6315289
(user/algorithm/nextclade-pango: line 40) broken https://academic.oup.com/mbe/article/37/5/1530/5721363 - 403 Client Error: Forbidden for url: https://academic.oup.com/mbe/article/37/5/1530/5721363
(user/algorithm/nextclade-pango: line 46) broken https://academic.oup.com/ve/article/4/1/vex042/4794731 - 403 Client Error: Forbidden for url: https://academic.oup.com/ve/article/4/1/vex042/4794731
(user/input-files: line 7) broken terminology.html#query-sequence -
(user/nextclade-cli: line 157) broken (https://anaconda.org/bioconda/nextclade) -
(user/output-files: line 7) broken ../_images/web_download-options.png -
(user/output-files: line 368) broken terminology.html#reference-tree-concept -
(user/terminology: line 125) broken algorithm#alignment -
(user/terminology: line 145) broken algorithm#phylogenetic-placement -
Summarized fixes here: 👈
The rest of the flagged links worked fine, so those reports may come from linkcheck failing to follow an anchor link or from some kind of user-agent problem.
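To cut down on that noise when re-running the check, the report can be filtered so that HTTP 403 responses are set aside. This is only a heuristic sketch: the function name `drop_forbidden` is hypothetical, and treating every "403 Client Error" line as a probable false positive is an assumption based on the user-agent guess above, not something linkcheck guarantees.

```shell
# Hypothetical helper: print "broken" report lines that are not HTTP 403 responses.
# Assumption: 403 responses mean the server rejected the checker, not that the link is dead.
# Usage: drop_forbidden <report-file>
drop_forbidden() {
  grep "broken" "$1" \
    | grep -v "403 Client Error" \
    | sort
}
```

For the report captured earlier in this thread, that would be `drop_forbidden msgs.txt`. The links it drops are still worth opening once by hand.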
Matthijs Welkers kindly reported that our docs still contain links to GitHub release assets based on v1 binary names.
It would be great if we could sift through the docs and replace them.
Example: