Closed qrkourier closed 5 months ago
Do you think this is duplicate of #144 or #254?
Thank you for helping me find those closed issues.
In summary, this is expected behavior: `muffet` can currently only check link destinations that appear in the served HTML, not destinations that exist only after client-side JavaScript has rendered them.
One solution is to tell `muffet` to stop checking URL "fragments", a.k.a. "anchors", e.g., the `#intro` part of the URL `https://www.example.com/welcome#intro` (reference). Setting `muffet --ignore-fragments` results in checking that the fragment's parent page exists, but not, for example, that an element with the matching "id" attribute exists (e.g., `<a href id="intro">`). This applies to all the crawled links.
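To illustrate the difference, here is a minimal sketch of what a fragment check amounts to. It is not muffet's actual implementation; `IdCollector` and `fragment_ok` are hypothetical names, and the `ignore_fragments` argument only mimics the effect of `--ignore-fragments` as described above.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class IdCollector(HTMLParser):
    """Collect every id (and legacy <a name>) attribute found in static HTML."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        for key, value in attrs:
            if value and (key == "id" or (tag == "a" and key == "name")):
                self.ids.add(value)


def fragment_ok(url, html, ignore_fragments=False):
    """Return True if the URL's fragment has a matching target in the HTML.

    Hypothetical helper: only the served HTML is inspected, so targets
    that a script would add at runtime are never seen.
    """
    _, fragment = urldefrag(url)
    if ignore_fragments or not fragment:
        # Only the existence of the parent page would be checked.
        return True
    collector = IdCollector()
    collector.feed(html)
    return fragment in collector.ids
```

With a page whose headings are injected by a script, `fragment_ok` reports the link as broken unless fragments are ignored, which matches the behavior described in this thread.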
Another solution is to set an exclude pattern that causes `muffet` to ignore the entire URL when it belongs to a particular domain, e.g. `--exclude='(https?://github\.com/.*#)'`. With this pattern, fragments are still checked for all other sites, and github.com links are not checked at all if they contain the fragment prefix `#`.
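A quick way to sanity-check which URLs such a pattern would skip is to try it against a few samples. This sketch uses Python's `re` module and assumes the pattern is applied as an unanchored search; muffet itself is written in Go, so treat this as an approximation (this particular pattern uses no syntax that differs between the two regex engines). `is_excluded` is a hypothetical helper name.

```python
import re

# Same pattern as in the --exclude example, without the optional group.
EXCLUDE = re.compile(r"https?://github\.com/.*#")


def is_excluded(url):
    """Return True if the URL matches the exclude pattern anywhere."""
    return EXCLUDE.search(url) is not None


samples = [
    "https://github.com/openziti/edge-api?tab=readme-ov-file#user-content-versioning",
    "https://github.com/openziti/edge-api",
    "https://www.example.com/welcome#intro",
]
for url in samples:
    print(is_excluded(url), url)
```

Only the first sample is excluded: it is on github.com and contains a `#`. The plain github.com link and the fragment link on another site would still be checked.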
The github.com UI now has tabs associated with URL query params like `?tab=readme-ov-file`, and links like this one, `https://github.com/openziti/edge-api?tab=readme-ov-file#user-content-versioning`, now fail because github.com moved the content from HTML to client-side JavaScript, so muffet can't "see" the target fragment in the HTML and thinks it's broken. I'll certainly have to stop checking GitHub fragments for now. I'm unsure how to check such links moving forward.