usnistgov / OSCAL

Open Security Controls Assessment Language (OSCAL)
https://pages.nist.gov/OSCAL/

Scheduled Link Checks of OSCAL Repo's Main Branch and Automatically Open Issues for Broken Links #1230

Closed aj-stein-nist closed 2 years ago

aj-stein-nist commented 2 years ago

User Story:

As an OSCAL tool developer, in order to know that all internal and external hyperlinks stay valid over time, and not only when a developer happens to push changes (which are often unrelated to the links themselves), I want OSCAL's CI/CD automation to check the OSCAL website and the repo's Markdown documentation for broken links on a schedule. If a broken link is found, I would like a new issue to be opened automatically, indicating which link an available developer should then fix.

Goals:

Improve the OSCAL CI/CD system so that broken links can be detected outside of a developer's code/test/push cycle, which is often unrelated to documentation and website improvements.

Dependencies:

Complete after usnistgov/OSCAL#1208 is complete.

Acceptance Criteria

aj-stein-nist commented 2 years ago

Ok, @david-waltermire-nist here is the status report as I am at a juncture where I need your critical feedback and input. 😅

Wasn't there a reason we didn't want to use lychee for Markdown content? I spent some time noodling on that but I cannot remember the specific issue.

aj-stein-nist commented 2 years ago

Now I remember: it is the local relative path rewriting feature we use for GitHub links to a repo's issue board, currently in the mlc config. I guess I will sync up with Dave later about which of the two we use to move forward and resolve this soon (or maybe skip checking Markdown links on a cron schedule altogether).
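For readers unfamiliar with the rewriting feature mentioned here: markdown-link-check accepts "replacementPatterns" entries, each with a "pattern" and a "replacement", which are applied to a link before it is checked. The following is only a rough sketch of that idea in Python. The specific pattern, the base URL, and the helper name `rewrite_link` are assumptions for illustration, not the contents of the OSCAL mlc config.

```python
import re

# Hypothetical rule in the spirit of markdown-link-check's "replacementPatterns".
# The pattern and base URL are illustrative assumptions, not the real OSCAL config.
REPLACEMENT_PATTERNS = [
    {"pattern": r"^/", "replacement": "https://github.com/usnistgov/OSCAL/"},
]


def rewrite_link(link: str) -> str:
    """Apply each replacement rule, in order, before the link is checked."""
    for rule in REPLACEMENT_PATTERNS:
        link = re.sub(rule["pattern"], rule["replacement"], link)
    return link


if __name__ == "__main__":
    # A root-relative link is rewritten to an absolute URL.
    print(rewrite_link("/issues/1230"))
    # An absolute link does not match the rule and passes through unchanged.
    print(rewrite_link("https://pages.nist.gov/OSCAL/"))
```

A checker without an equivalent rewriting step would see the root-relative link as-is, which is the gap described in this comment.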

aj-stein-nist commented 2 years ago

I also figured out a workaround with a faster turnaround: remove the -q argument from markdown-link-check, capture the piped output of the command-line run in a labelled output variable (StackOverflow reference), and then insert that named output into a GitHub issue when the returned status code is not 0.
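The capture-then-report flow described in this comment could look roughly like the sketch below. It is written in Python rather than as a workflow step, and the function names (`run_and_capture`, `issue_body_if_failed`) and the issue text are hypothetical. The demo uses a stand-in command in place of the real link checker, so no particular markdown-link-check output format is assumed.

```python
import subprocess
import sys
from typing import List, Optional, Tuple


def run_and_capture(cmd: List[str]) -> Tuple[int, str]:
    """Run cmd and return its exit status plus combined stdout/stderr text."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into the same captured stream
        text=True,
    )
    return result.returncode, result.stdout


def issue_body_if_failed(cmd: List[str]) -> Optional[str]:
    """Return text for a new issue when cmd exits non-zero, otherwise None."""
    status, output = run_and_capture(cmd)
    if status == 0:
        return None
    return f"Scheduled link check failed (exit status {status}).\n\n{output}"


if __name__ == "__main__":
    # Stand-in for the link checker: prints one line and exits with status 1.
    fake_checker = [
        sys.executable,
        "-c",
        "print('dead link: http://example.invalid'); raise SystemExit(1)",
    ]
    body = issue_body_if_failed(fake_checker)
    if body is not None:
        print(body)  # in CI, this text would become the body of the opened issue
```

Keeping the captured output and the exit status together means the issue is only opened on failure, and it carries the checker's own report of which links broke.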

Also, from a quick sitrep with Dave: we need to fix the wildcard from git ls-files "*/*.md" to git ls-files "*.md" so that it properly catches the top-level README, which we are currently missing. We also need to add "pattern": "^#.*" so that we do not hit bare anchor links in that README, or the check will bomb. We cannot really count on those validating at PR time, because the anchors GitHub generates dynamically are not always reflected 1:1 in the local copies, which induces false positives. :-)
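To illustrate both fixes, here is a small Python approximation. Git's default pathspec matching lets `*` match directory separators, which `fnmatch.fnmatchcase` also does, so it serves as a reasonable stand-in here. The file paths and links are made-up examples, and the regex is the "^#.*" pattern quoted above applied the way an ignore rule would be.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical tracked files; only the shape of the paths matters.
tracked = ["README.md", "docs/guide.md", "docs/api/notes.md"]


def select(pattern, paths):
    """Approximate `git ls-files <pattern>`: '*' may match '/' as well."""
    return [p for p in paths if fnmatchcase(p, pattern)]


# "*/*.md" requires at least one '/', so the top-level README is skipped.
nested_only = select("*/*.md", tracked)
# "*.md" matches at any depth, including the top level.
all_markdown = select("*.md", tracked)

# The ignore pattern from the comment: skip links that are bare anchors.
IGNORE = re.compile(r"^#.*")
links = ["#usage", "docs/guide.md#usage", "https://pages.nist.gov/OSCAL/"]
to_check = [link for link in links if not IGNORE.search(link)]

if __name__ == "__main__":
    print(nested_only)
    print(all_markdown)
    print(to_check)
```

Note that only links starting with `#` are skipped; a link such as `docs/guide.md#usage` still gets checked because the pattern is anchored at the start of the link.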