w3c / link-checker

Check links and anchors in Web pages or full Web sites.
https://validator.w3.org/checklink

robots.txt #48

Open dr-norton opened 3 years ago

dr-norton commented 3 years ago

The validator's own robots.txt rules cause errors in the link checker itself. Here's the file: https://validator.w3.org/robots.txt

For example, a webpage with a link like: https://validator.w3.org/checklink?uri=xyz.com

Produces this error: Status: (N/A) Forbidden by robots.txt. The link was not checked due to robots exclusion rules. Check the link manually.
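
For reference, here is a minimal sketch of what that robots exclusion amounts to. It's in Python rather than the checker's actual Perl code, and the `Disallow` rule below is an assumed excerpt; the real rules are at https://validator.w3.org/robots.txt:

```python
# Sketch only: shows how a rule like the assumed one below blocks a /checklink URL.
from urllib.robotparser import RobotFileParser

# Hypothetical excerpt of the validator's robots.txt (assumption, not the real file).
ROBOTS_TXT = """\
User-agent: *
Disallow: /checklink
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://validator.w3.org/checklink?uri=xyz.com"
# can_fetch() returns False here, which the link checker reports as
# "Status: (N/A) Forbidden by robots.txt".
print(parser.can_fetch("W3C-checklink", url))  # -> False
```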

And, to address this, the link checker points you to its own advice here: https://validator.w3.org/checklink/docs/checklink.html#bot

dontcallmedom commented 3 years ago

this is on purpose, to avoid creating infinite loops. does it cause particular issues?

dr-norton commented 3 years ago

no, not really. it's just extra errors on the results page, not a big deal for me personally. but the whole situation feels like a contraption, and it sounds like it could be solved easily. for example, the checker could treat validator pages as if they had no parameters (roughly what the sketch below illustrates), or it could check a special page, or something else. getting errors and just ignoring them should be the last option on the list.
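
A rough sketch of what that first suggestion could look like; the helper name and the host list are hypothetical and not part of the link checker, which would have to do this in its own Perl code:

```python
# Illustration of the proposal: for validator-hosted links, verify a
# parameter-free version of the URL so the checker never triggers the
# recursive ?uri=... invocation that the robots rule guards against.
from urllib.parse import urlsplit, urlunsplit

VALIDATOR_HOSTS = {"validator.w3.org"}  # assumed allow-list


def url_for_check(url: str) -> str:
    """Drop the query string for validator-hosted URLs; leave other URLs untouched."""
    parts = urlsplit(url)
    if parts.netloc in VALIDATOR_HOSTS:
        return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return url


print(url_for_check("https://validator.w3.org/checklink?uri=xyz.com"))
# -> https://validator.w3.org/checklink
print(url_for_check("https://example.com/page?x=1"))
# -> https://example.com/page?x=1 (unchanged)
```

Whether the robots.txt would still block the bare path is a separate question, so this only captures the "ignore the parameters" part of the idea.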