Closed wummel closed 11 years ago
Submitted by calvin
Logged In: YES user_id=9205
LinkChecker is a web robot and thus follows the robots.txt access control standard (see [1]). If a site denies access to such robots, as pointed out in the warning you got, then LinkChecker does not access it.
[1] http://www.robotstxt.org/wc/exclusion.html
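For anyone curious what that check looks like in practice, here is a minimal sketch of the test a robots.txt-aware robot performs before each request, using Python's standard `urllib.robotparser` module (the "LinkChecker" user-agent string and the example URLs are illustrative assumptions, not LinkChecker's actual internals):

```python
import urllib.robotparser

# Parse an example robots.txt that denies all robots access to /private/.
rp = urllib.robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# can_fetch() is the check a polite robot performs before each request.
print(rp.can_fetch("LinkChecker", "http://example.com/private/page.html"))  # False
print(rp.can_fetch("LinkChecker", "http://example.com/public/page.html"))   # True
```

When `can_fetch()` returns False for a URL, a well-behaved robot skips it, which is exactly the warning you saw.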
It is possible to ignore the robots.txt standard, but I will not do that since it would get LinkChecker added to some blacklists for bad behaviour :)
So to your problem: you cannot check sites with LinkChecker that deny access in the robots.txt file. All you can do is ask the site administrator to add LinkChecker to the allowed web robots for the site.
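For reference, the site administrator could grant access with an entry like the following in the site's robots.txt (the user-agent token shown is an assumption; it should match the one your LinkChecker version actually sends):

```
# Allow LinkChecker everywhere; an empty Disallow means nothing is blocked.
User-agent: LinkChecker
Disallow:

# Keep other robots out of the whole site.
User-agent: *
Disallow: /
```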
Converted from SourceForge issue 1323649, submitted by javahollic
I'd like to use LinkChecker against a SharePoint-based website. It currently gives me a 'Warning, access denied by robots.txt', and setting the same user/password in LinkChecker that is required to access the site over HTTP doesn't change things (in fact, it doesn't matter if the user/password is wrong). I've tried enabling cookies etc. to no effect. I enabled debug cmdline output and see the correct user and password listed...
I can list through one sharepoint server I found: http://sharepoint.bilsimser.com/pages/templates.aspx
But the one I want to test has authentication enabled for the index page. SSL is not enabled.
Does LinkChecker need to masquerade as a browser? Can this error be ignored somehow?
Is this a bug, or am I using it incorrectly?