Closed aj-stein-nist closed 2 years ago
Ok, @david-waltermire-nist here is the status report as I am at a juncture where I need your critical feedback and input. 😅
lychee is better and close to done for website content.
markdown-link-check. I reviewed our goals and AC, and some challenges we have:
markdown-link-check and the GHA mlc action have no native reporting format, only output to console.
lychee and markdown-link-check can be used as a library (nice work from the lychee devs with a feature comparison matrix), but the core of the latter doesn't have any reporting capability, only a callback mechanism.
markdown-link-check and/or the action
lychee
Ran lychee locally with xargs on only the Markdown files (git ls-files "*/*.md" -z | grep --null-data -v "^docs/" | xargs -0 lychee --exclude-file ./build/config/.lycheeignore --verbose --accept 200,206,429 --no-progress) and the only issue was, ironically, something MLC didn't catch before in build/README.md with templated strings like https://github.com/ndw/xmlcalabash1/releases/download/$%7BCALABASH_VERSION%7D/xmlcalabash-$%7BCALABASH_VERSION%7D.zip and https://github.com/gohugoio/hugo/releases/download/v$%7BHUGO_VERSION%7D/hugo_extended_$%7BHUGO_VERSION%7D_Linux-64bit.deb.
Wasn't there a reason we didn't want to use lychee for Markdown content? I spent some time noodling on that but I cannot remember the specific issue.
Now I remember: it is the local relative path rewriting feature we use for GitHub links to a repo's issue board, currently in the mlc config. I guess I'll sync up with Dave later about which of the two we use to move forward and resolve this soon (or maybe skip checking Markdown links on a cron schedule altogether).
I also figured out a workaround with a faster turnaround: remove the -q argument from markdown-link-check, capture the piped stream output of the command-line execution in a labelled output variable (StackOverflow reference), and then insert that named output into a GitHub issue when the returned status code is not 0.
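The capture-and-report workaround could be sketched roughly as below. This is a minimal sketch under assumptions, not the workflow actually used: the function name check_links and the message format are hypothetical, the step that files the GitHub issue is left out, and any command can stand in for the checker.

```shell
# Hypothetical helper: run a link checker (passed as "$@"), keep its console
# output, and print a report only when the checker exits non-zero.
check_links() {
  # Capture stdout and stderr together; the assignment keeps the exit status
  # of the command substitution.
  report="$("$@" 2>&1)"
  status=$?
  if [ "$status" -ne 0 ]; then
    # This text is what would be inserted into the issue body.
    printf 'Link check failed (exit %s):\n%s\n' "$status" "$report"
    return 1
  fi
  return 0
}
```

For example, check_links markdown-link-check README.md (without -q, so the per-link output is kept) would print a report only if the checker returned a non-zero status.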
Also, from a quick sitrep with Dave, we need to fix the wildcarding from git ls-files "*/*.md" to git ls-files "*.md" to properly catch the top-level README, which we are missing. We also need to add the "pattern": "^#.*" entry to ensure we do not hit bare anchor links in that README, or it will bomb. We cannot really count on those validating at PR time, because the anchors GitHub dynamically generates are not always reflected 1:1 in the local copies, which induces false positives. :-)
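To see why the wildcard change matters, the sketch below mimics the two pathspecs with shell case patterns, where, as in a plain git pathspec, * can also match /. It also mimics the ^#.* pattern with grep. Both helpers (matches_pathspec and drop_bare_anchors) are hypothetical stand-ins for illustration, not part of git or markdown-link-check.

```shell
# Hypothetical stand-in for a git pathspec match: prints "yes" or "no".
# In case patterns, * also matches "/", so "*/*.md" needs at least one slash.
matches_pathspec() {
  # $1 = pattern, $2 = path. $1 is left unquoted on purpose so it acts as a glob.
  case "$2" in
    $1) echo yes ;;
    *)  echo no ;;
  esac
}

# Hypothetical stand-in for the "^#.*" pattern: drop links that are bare anchors.
drop_bare_anchors() {
  grep -Ev '^#.*' || true
}

matches_pathspec '*/*.md' 'README.md'        # no: the top-level file is missed
matches_pathspec '*.md'   'README.md'        # yes
matches_pathspec '*.md'   'build/README.md'  # yes
printf '%s\n' '#usage' 'https://example.com/#top' | drop_bare_anchors
```

Only links that start with # are dropped; a full URL that merely contains a fragment is still checked.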
User Story:
As an OSCAL tool developer, in order to know all internal and external hyperlinks are valid over time and not only when specific developers make modifications that are sometimes not related to modified links, I want OSCAL's CI/CD automation to periodically examine the OSCAL website and repo Markdown documentation for broken links on a schedule. If a broken link is found, I would like a new issue to be automatically opened indicating which link should be subsequently handled by an available developer.
Goals:
Improve the OSCAL CI/CD system so that broken links can be detected outside of a developer code/test/push cycle, which often might not be related to doc and website improvement.
Dependencies:
Complete after usnistgov/OSCAL#1208 is complete.
Acceptance Criteria:
markdown-link-check action for Markdown docs outside the website content.
lychee-action action for OSCAL website content link checks.