wummel / linkchecker

check links in web documents or full websites
http://wummel.github.io/linkchecker/
GNU General Public License v2.0

Use configured timeout value when requesting robots.txt files. #634

Status: Open. chuckbjones opened this pull request 8 years ago.

chuckbjones commented 8 years ago

When a remote server is not responding, we have two issues:

1) The request for robots.txt takes a long time to fail.
2) When it does fail, requests.exceptions.Timeout is not raised. Instead we get a requests.exceptions.RequestException, which does not abort the link check, so we still have to wait for the link check itself to time out before moving on.
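The distinction matters because requests.exceptions.Timeout is a subclass of requests.exceptions.RequestException, but not every RequestException is a Timeout. A minimal sketch, assuming a hypothetical helper `should_abort()` that mirrors the abort-on-timeout behaviour described above (this is not linkchecker's actual code):

```python
import requests


def should_abort(exc: Exception) -> bool:
    """Hypothetical helper: abort the current link check only on timeouts.

    requests.exceptions.Timeout covers ReadTimeout and ConnectTimeout;
    other RequestException subclasses (e.g. a plain ConnectionError)
    do not match, so the check would carry on.
    """
    return isinstance(exc, requests.exceptions.Timeout)


# Timeout is itself a RequestException...
assert issubclass(requests.exceptions.Timeout, requests.exceptions.RequestException)

# ...but a generic RequestException is not a Timeout.
for exc in (
    requests.exceptions.ReadTimeout(),
    requests.exceptions.ConnectTimeout(),
    requests.exceptions.ConnectionError(),
    requests.exceptions.RequestException(),
):
    print(type(exc).__name__, should_abort(exc))
# prints:
# ReadTimeout True
# ConnectTimeout True
# ConnectionError False
# RequestException False
```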

This patch uses the configured timeout value for robots.txt requests, so if the remote host does not respond within the timeout, requests.exceptions.Timeout is raised and the current link check is aborted.
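A rough sketch of the idea, not the patch itself: the names `fetch_robots_txt()` and `silent_server()` are hypothetical, and the configured timeout is passed in as a plain `timeout` argument. The demo points the fetch at a local socket that accepts connections but never answers, so the request fails with a Timeout instead of hanging:

```python
import socket

import requests


def fetch_robots_txt(session: requests.Session, base_url: str, timeout: float) -> str:
    """Hypothetical helper: fetch <base_url>/robots.txt with a bounded wait.

    Passing timeout= to session.get() makes requests raise
    requests.exceptions.Timeout (a ReadTimeout or ConnectTimeout)
    instead of waiting indefinitely on a host that never answers.
    """
    url = base_url.rstrip("/") + "/robots.txt"
    response = session.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


def silent_server() -> tuple[socket.socket, str]:
    """Hypothetical test fixture: a listening socket that never replies.

    The kernel completes the TCP handshake from the listen backlog,
    but nothing ever calls accept(), so no HTTP response is sent.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    host, port = server.getsockname()
    return server, f"http://{host}:{port}"


if __name__ == "__main__":
    server, url = silent_server()
    try:
        with requests.Session() as session:
            # Ignore proxy environment variables so the demo connects directly.
            session.trust_env = False
            try:
                fetch_robots_txt(session, url, timeout=0.5)
            except requests.exceptions.Timeout as exc:
                # Raised after roughly 0.5 s instead of hanging.
                print("aborted:", type(exc).__name__)
    finally:
        server.close()
```

Note that requests applies `timeout` to the connection attempt and to each socket read, not to the whole download, so it bounds the wait on a silent host rather than the total transfer time.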

anarcat commented 7 years ago

Please reroll this PR in the new organisation here: https://github.com/linkcheck/linkchecker/pulls (see #686 for details).