wummel / linkchecker

check links in web documents or full websites
http://wummel.github.io/linkchecker/
GNU General Public License v2.0

Opening bug report as requested #620

Open TheDromundKaas opened 9 years ago

TheDromundKaas commented 9 years ago

root@j185599:~# linkchecker https://www.haneu.de -Dall
DEBUG 2015-10-22 12:19:46,672 MainThread Python 2.7.9 (default, Mar 1 2015, 12:57:24) [GCC 4.9.2] on linux2
DEBUG 2015-10-22 12:19:46,672 MainThread reading configuration from ['/root/.linkchecker/linkcheckerrc']
WARNING 2015-10-22 12:19:46,684 MainThread Running as root user; dropping privileges by changing user to nobody.
INFO 2015-10-22 12:19:46,685 MainThread Checking intern URLs only; use --check-extern to check extern URLs.
DEBUG 2015-10-22 12:19:46,691 MainThread configuration: [('aborttimeout', 300), ('allowedschemes', []), ('authentication', []), ('blacklist', {}), ('checkextern', False), ('cookiefile', None), ('csv', {}), ('debugmemory', False), ('dot', {}), ('enabledplugins', []), ('externlinks', []), ('fileoutput', []), ('gml', {}), ('gxml', {}), ('html', {}), ('ignorewarnings', []), ('internlinks', []), ('localwebroot', None), ('logger', 'TextLogger'), ('loginextrafields', {}), ('loginpasswordfield', 'password'), ('loginurl', None), ('loginuserfield', 'login'), ('maxfilesizedownload', 5242880), ('maxfilesizeparse', 1048576), ('maxhttpredirects', 10), ('maxnumurls', None), ('maxrequestspersecond', 10), ('maxrunseconds', None), ('nntpserver', None), ('none', {}), ('output', 'text'), ('pluginfolders', []), ('proxy', {}), ('quiet', False), ('recursionlevel', -1), ('sitemap', {}), ('sql', {}), ('sslverify', True), ('status', True), ('status_wait_seconds', 5), ('text', {}), ('threads', 10), ('timeout', 60), ('trace', False), ('useragent', u'Mozilla/5.0 (compatible; LinkChecker/9.3; +http://wummel.github.io/linkchecker/)'), ('verbose', False), ('warnings', True), ('xml', {})]
DEBUG 2015-10-22 12:19:46,693 MainThread HttpUrl handles url https://www.haneu.de
DEBUG 2015-10-22 12:19:46,693 MainThread checking syntax
DEBUG 2015-10-22 12:19:46,694 MainThread Add intern pattern u'^https?://(www.|)haneu.de'
DEBUG 2015-10-22 12:19:46,695 MainThread Link pattern u'^https?://(www.|)haneu.de' strict=False
DEBUG 2015-10-22 12:19:46,695 MainThread queueing https://www.haneu.de
LinkChecker 9.3 Copyright (C) 2000-2014 Bastian Kleineidam
LinkChecker comes with ABSOLUTELY NO WARRANTY! This is free software, and you are welcome to redistribute it under certain conditions. Look at the file `LICENSE' within this distribution.
Get the newest version at http://wummel.github.io/linkchecker/
Write comments and bugs to https://github.com/wummel/linkchecker/issues
Support this project at http://wummel.github.io/linkchecker/donations.html

Start checking at 2015-10-22 12:19:46+002
DEBUG 2015-10-22 12:19:46,724 CheckThread-https://www.haneu.de Checking https link base_url=u'https://www.haneu.de' parent_url=None base_ref=None recursion_level=0 url_connection=None line=0 column=0 page=0 name=u'' anchor=u'' cache_url=https://www.haneu.de
DEBUG 2015-10-22 12:19:46,728 CheckThread-https://www.haneu.de checking connection
DEBUG 2015-10-22 12:19:46,952 CheckThread-https://www.haneu.de u'https://www.haneu.de/robots.txt' parse lines
DEBUG 2015-10-22 12:19:46,953 CheckThread-https://www.haneu.de u'https://www.haneu.de/robots.txt' line 7: allow or disallow directives without any user-agent line
DEBUG 2015-10-22 12:19:46,953 CheckThread-https://www.haneu.de u'https://www.haneu.de/robots.txt' line 9: missing user-agent directive before this line
DEBUG 2015-10-22 12:19:46,953 CheckThread-https://www.haneu.de u'https://www.haneu.de/robots.txt' line 10: missing user-agent directive before this line
DEBUG 2015-10-22 12:19:46,953 CheckThread-https://www.haneu.de u'https://www.haneu.de/robots.txt' line 11: missing user-agent directive before this line
DEBUG 2015-10-22 12:19:46,954 CheckThread-https://www.haneu.de u'https://www.haneu.de/robots.txt' line 12: missing user-agent directive before this line
DEBUG 2015-10-22 12:19:46,954 CheckThread-https://www.haneu.de u'https://www.haneu.de/robots.txt' line 13: missing user-agent directive before this line
DEBUG 2015-10-22 12:19:46,954 CheckThread-https://www.haneu.de u'https://www.haneu.de/robots.txt' line 14: missing user-agent directive before this line
DEBUG 2015-10-22 12:19:46,954 CheckThread-https://www.haneu.de u'https://www.haneu.de/robots.txt' line 15: missing user-agent directive before this line
DEBUG 2015-10-22 12:19:46,954 CheckThread-https://www.haneu.de u'https://www.haneu.de/robots.txt' line 18: missing user-agent directive before this line
DEBUG 2015-10-22 12:19:46,955 CheckThread-https://www.haneu.de u'https://www.haneu.de/robots.txt' line 19: missing user-agent directive before this line
DEBUG 2015-10-22 12:19:46,955 CheckThread-https://www.haneu.de u'https://www.haneu.de/robots.txt' line 20: missing user-agent directive before this line
DEBUG 2015-10-22 12:19:46,955 CheckThread-https://www.haneu.de u'https://www.haneu.de/robots.txt' line 21: missing user-agent directive before this line
DEBUG 2015-10-22 12:19:46,955 CheckThread-https://www.haneu.de u'https://www.haneu.de/robots.txt' line 22: missing user-agent directive before this line
DEBUG 2015-10-22 12:19:46,955 CheckThread-https://www.haneu.de u'https://www.haneu.de/robots.txt' line 24: missing user-agent directive before this line
DEBUG 2015-10-22 12:19:46,956 CheckThread-https://www.haneu.de u'https://www.haneu.de/robots.txt' line 25: missing user-agent directive before this line
DEBUG 2015-10-22 12:19:46,956 CheckThread-https://www.haneu.de u'https://www.haneu.de/robots.txt' line 26: missing user-agent directive before this line
DEBUG 2015-10-22 12:19:46,956 CheckThread-https://www.haneu.de u'https://www.haneu.de/robots.txt' line 27: missing user-agent directive before this line
DEBUG 2015-10-22 12:19:46,956 CheckThread-https://www.haneu.de u'https://www.haneu.de/robots.txt' line 28: missing user-agent directive before this line
DEBUG 2015-10-22 12:19:46,956 CheckThread-https://www.haneu.de u'https://www.haneu.de/robots.txt' line 29: missing user-agent directive before this line
DEBUG 2015-10-22 12:19:46,956 CheckThread-https://www.haneu.de u'https://www.haneu.de/robots.txt' line 30: missing user-agent directive before this line
DEBUG 2015-10-22 12:19:46,957 CheckThread-https://www.haneu.de u'https://www.haneu.de/robots.txt' line 31: missing user-agent directive before this line
DEBUG 2015-10-22 12:19:46,957 CheckThread-https://www.haneu.de u'https://www.haneu.de/robots.txt' line 32: missing user-agent directive before this line
DEBUG 2015-10-22 12:19:46,957 CheckThread-https://www.haneu.de u'https://www.haneu.de/robots.txt' line 33: missing user-agent directive before this line
DEBUG 2015-10-22 12:19:46,957 CheckThread-https://www.haneu.de u'https://www.haneu.de/robots.txt' line 34: missing user-agent directive before this line
DEBUG 2015-10-22 12:19:46,957 CheckThread-https://www.haneu.de u'https://www.haneu.de/robots.txt' line 37: missing user-agent directive before this line
DEBUG 2015-10-22 12:19:46,957 CheckThread-https://www.haneu.de u'https://www.haneu.de/robots.txt' line 38: missing user-agent directive before this line
DEBUG 2015-10-22 12:19:46,958 CheckThread-https://www.haneu.de u'https://www.haneu.de/robots.txt' line 39: missing user-agent directive before this line
DEBUG 2015-10-22 12:19:46,958 CheckThread-https://www.haneu.de u'https://www.haneu.de/robots.txt' line 40: missing user-agent directive before this line
DEBUG 2015-10-22 12:19:46,958 CheckThread-https://www.haneu.de u'https://www.haneu.de/robots.txt' line 41: missing user-agent directive before this line
DEBUG 2015-10-22 12:19:46,958 CheckThread-https://www.haneu.de u'https://www.haneu.de/robots.txt' line 42: missing user-agent directive before this line
DEBUG 2015-10-22 12:19:46,958 CheckThread-https://www.haneu.de u'https://www.haneu.de/robots.txt' line 43: missing user-agent directive before this line
DEBUG 2015-10-22 12:19:46,958 CheckThread-https://www.haneu.de u'https://www.haneu.de/robots.txt' line 44: missing user-agent directive before this line
DEBUG 2015-10-22 12:19:46,958 CheckThread-https://www.haneu.de u'https://www.haneu.de/robots.txt' line 50: missing user-agent directive before this line
DEBUG 2015-10-22 12:19:46,959 CheckThread-https://www.haneu.de Parsed rules:
User-agent: Googlebot-Image
Allow: /
DEBUG 2015-10-22 12:19:46,959 CheckThread-https://www.haneu.de u'https://www.haneu.de/robots.txt' check allowance for: user agent: u'Mozilla/5.0 (compatible; LinkChecker/9.3; +http://wummel.github.io/linkchecker/)' url: u'https://www.haneu.de' ...
DEBUG 2015-10-22 12:19:46,959 CheckThread-https://www.haneu.de ... agent not found, allow.
DEBUG 2015-10-22 12:19:46,959 CheckThread-https://www.haneu.de Prepare request with {'method': 'GET', 'url': u'https://www.haneu.de', 'headers': {}}
DEBUG 2015-10-22 12:19:46,960 CheckThread-https://www.haneu.de Send request with {'allow_redirects': False, 'timeout': 60, 'verify': True, 'stream': True}
DEBUG 2015-10-22 12:19:47,007 CheckThread-https://www.haneu.de task_done https://www.haneu.de
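
As an aside on the robots.txt messages in the debug output: rules that appear before any User-agent line are skipped, and an agent that matches no entry is allowed. The sketch below reproduces that behaviour with Python's standard-library `urllib.robotparser`, used here only as a stand-in; LinkChecker has its own parser, which may differ in detail. The sample rules, the `LinkChecker` agent string, and the example.com URL are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def parse_rules(lines):
    """Build a parser from in-memory robots.txt lines (no network access)."""
    parser = RobotFileParser()
    parser.parse(lines)
    return parser


# Hypothetical file: the Disallow line precedes any User-agent line,
# so the parser has no entry to attach it to and drops it.
malformed = parse_rules([
    "Disallow: /private",
    "User-agent: Googlebot-Image",
    "Allow: /",
])

# Same rule, but attached to a wildcard entry.
well_formed = parse_rules([
    "User-agent: *",
    "Disallow: /private",
])

AGENT = "LinkChecker"  # illustrative agent name, not the full header
URL = "https://www.example.com/private"

# No entry matches the agent in the malformed file, so access is allowed.
allowed_by_malformed = malformed.can_fetch(AGENT, URL)
# The wildcard entry applies, so the same URL is refused.
allowed_by_well_formed = well_formed.can_fetch(AGENT, URL)

print(allowed_by_malformed, allowed_by_well_formed)
```

The robots.txt handling is therefore unrelated to the crash; the request goes ahead and the failure happens later, while the SSL certificate is being inspected.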

****** Oops, I did it again. *****

You have found an internal error in LinkChecker. Please write a bug report at https://github.com/wummel/linkchecker/issues and include the following information:

When using the commandline client:

Not disclosing some of the information above due to privacy reasons is ok. I will try to help you nonetheless, but you have to give me something I can work with ;) .

Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/linkcheck/director/checker.py", line 104, in check_url
    line: self.check_url_data(url_data)
    locals:
      self = <Checker(CheckThread-https://www.haneu.de, started 140294285195008)>
      self.check_url_data = <bound method Checker.check_url_data of <Checker(CheckThread-https://www.haneu.de, started 140294285195008)>>
      url_data = <https link, base_url=u'https://www.haneu.de', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=https://www.haneu.de>
  File "/usr/lib/python2.7/dist-packages/linkcheck/director/checker.py", line 120, in check_url_data
    line: check_url(url_data, self.logger)
    locals:
      check_url = <function check_url at 0x7f98d0fc7b90>
      url_data = <https link, base_url=u'https://www.haneu.de', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=https://www.haneu.de>
      self = <Checker(CheckThread-https://www.haneu.de, started 140294285195008)>
      self.logger = <linkcheck.director.logger.Logger object at 0x7f98d08dce90>
  File "/usr/lib/python2.7/dist-packages/linkcheck/director/checker.py", line 52, in check_url
    line: url_data.check()
    locals:
      url_data = <https link, base_url=u'https://www.haneu.de', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=https://www.haneu.de>
      url_data.check = <bound method HttpUrl.check of <https link, base_url=u'https://www.haneu.de', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=https://www.haneu.de>>
  File "/usr/lib/python2.7/dist-packages/linkcheck/checker/urlbase.py", line 424, in check
    line: self.local_check()
    locals:
      self = <https link, base_url=u'https://www.haneu.de', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=https://www.haneu.de>
      self.local_check = <bound method HttpUrl.local_check of <https link, base_url=u'https://www.haneu.de', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=https://www.haneu.de>>
  File "/usr/lib/python2.7/dist-packages/linkcheck/checker/urlbase.py", line 442, in local_check
    line: self.check_connection()
    locals:
      self = <https link, base_url=u'https://www.haneu.de', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=https://www.haneu.de>
      self.check_connection = <bound method HttpUrl.check_connection of <https link, base_url=u'https://www.haneu.de', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=https://www.haneu.de>>
  File "/usr/lib/python2.7/dist-packages/linkcheck/checker/httpurl.py", line 135, in check_connection
    line: self.send_request(request)
    locals:
      self = <https link, base_url=u'https://www.haneu.de', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=https://www.haneu.de>
      self.send_request = <bound method HttpUrl.send_request of <https link, base_url=u'https://www.haneu.de', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=https://www.haneu.de>>
      request = <PreparedRequest [GET]>
  File "/usr/lib/python2.7/dist-packages/linkcheck/checker/httpurl.py", line 165, in send_request
    line: self._send_request(request, **kwargs)
    locals:
      self = <https link, base_url=u'https://www.haneu.de', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=https://www.haneu.de>
      self._send_request = <bound method HttpUrl._send_request of <https link, base_url=u'https://www.haneu.de', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=https://www.haneu.de>>
      request = <PreparedRequest [GET]>
      kwargs = {'allow_redirects': False, 'timeout': 60, 'verify': True, 'stream': True}
  File "/usr/lib/python2.7/dist-packages/linkcheck/checker/httpurl.py", line 172, in _send_request
    line: self._add_ssl_info()
    locals:
      self = <https link, base_url=u'https://www.haneu.de', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=https://www.haneu.de>
      self._add_ssl_info = <bound method HttpUrl._add_ssl_info of <https link, base_url=u'https://www.haneu.de', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=https://www.haneu.de>>
  File "/usr/lib/python2.7/dist-packages/linkcheck/checker/httpurl.py", line 199, in _add_ssl_info
    line: self.ssl_cert = httputil.x509_to_dict(cert)
    locals:
      self = <https link, base_url=u'https://www.haneu.de', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=https://www.haneu.de>
      self.ssl_cert = None
      httputil = <module 'linkcheck.httputil' from '/usr/lib/python2.7/dist-packages/linkcheck/httputil.pyc'>
      httputil.x509_to_dict = <function x509_to_dict at 0x7f98d183ee60>
      cert = <OpenSSL.crypto.X509 object at 0x7f98ce869950>
  File "/usr/lib/python2.7/dist-packages/linkcheck/httputil.py", line 35, in x509_to_dict
    line: from requests.packages.urllib3.contrib.pyopenssl import get_subj_alt_name
    locals:
      requests =
      requests.packages =
      requests.packages.urllib3 =
      requests.packages.urllib3.contrib =
      requests.packages.urllib3.contrib.pyopenssl =
      get_subj_alt_name =
ImportError: cannot import name get_subj_alt_name

System info:
LinkChecker 9.3
Released on: 16.7.2014
Python 2.7.9 (default, Mar 1 2015, 12:57:24) [GCC 4.9.2] on linux2
Requests: 2.4.3
Modules: Sqlite
Local time: 2015-10-22 12:19:47+002
sys.argv: ['/usr/bin/linkchecker', 'https://www.haneu.de', '-Dall']
LANG = 'en_US.UTF-8'
Default locale: ('en', 'UTF-8')

******** LinkChecker internal error, over and out ********
WARNING 2015-10-22 12:19:47,017 CheckThread-https://www.haneu.de internal error occurred

Statistics:
Downloaded: 0B.
No statistics available since no URLs were checked.

That's it. 0 links in 0 URLs checked. 0 warnings found. 0 errors found.
There was 1 internal error.
Stopped checking at 2015-10-22 12:19:47+002 (0.35 seconds)
root@j185599:~#
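
The traceback ends in an `ImportError` for a name that the installed Requests 2.4.3 / urllib3 bundle evidently does not export. A common way to tolerate a helper that moves or disappears between library versions is to look it up defensively. The following is a minimal sketch of that pattern, not the change made in LinkChecker; `optional_import` and `load_san_helper` are hypothetical names introduced here, and the module path is simply the one from the traceback.

```python
import importlib


def optional_import(module_name, attr_name):
    """Return getattr(module, attr_name), or None if the module cannot be
    imported or does not define the attribute."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Module (or one of its own dependencies) is not installed.
        return None
    # A missing attribute yields None instead of raising.
    return getattr(module, attr_name, None)


def load_san_helper():
    """Hypothetical loader for the helper named in the traceback.

    Returns None on installations where it is unavailable, so the caller
    can skip subjectAltName extraction instead of crashing.
    """
    return optional_import(
        "requests.packages.urllib3.contrib.pyopenssl",
        "get_subj_alt_name",
    )


# Demonstration with standard-library modules only:
present = optional_import("json", "dumps")            # attribute exists
missing_attr = optional_import("json", "no_such_fn")  # attribute missing
missing_mod = optional_import("no_such_module_xyz", "anything")

print(present is not None, missing_attr, missing_mod)
```

A caller would check the result of `load_san_helper()` for `None` before using it; how the certificate would then be passed to the helper depends on the installed library version and is not shown here.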

anarcat commented 8 years ago

This is fixed in #656.

dpalic commented 7 years ago

Thank you for the issue report. Sadly, this project is dead, and a new team now maintains it at https://github.com/linkcheck/linkchecker. For more details, please see #708. Please also close this issue and, if the problem still persists, report it afresh on the new repo at https://github.com/linkcheck/linkchecker/issues.