wummel / linkchecker

Check links in web documents or full websites.
http://wummel.github.io/linkchecker/
GNU General Public License v2.0

Opening bug report as requested #614

Closed. nigelhorne closed this issue 6 years ago.

nigelhorne commented 8 years ago

```
$ linkchecker http://www.concert-bands.co.uk
INFO 2015-09-22 08:54:39,917 MainThread Checking intern URLs only; use --check-extern to check extern URLs.
LinkChecker 9.3              Copyright (C) 2000-2014 Bastian Kleineidam
LinkChecker comes with ABSOLUTELY NO WARRANTY!
This is free software, and you are welcome to redistribute it under
certain conditions. Look at the file `LICENSE' within this distribution.
Get the newest version at http://wummel.github.io/linkchecker/
Write comments and bugs to https://github.com/wummel/linkchecker/issues
Support this project at http://wummel.github.io/linkchecker/donations.html
```

Start checking at 2015-09-22 08:54:39-004

****** Oops, I did it again. *****

You have found an internal error in LinkChecker. Please write a bug report at https://github.com/wummel/linkchecker/issues and include the following information:

When using the commandline client:

Not disclosing some of the information above due to privacy reasons is ok. I will try to help you nonetheless, but you have to give me something I can work with ;) .

```
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/linkcheck/director/checker.py", line 104, in check_url
    self.check_url_data(url_data)
  File "/usr/lib/python2.7/dist-packages/linkcheck/director/checker.py", line 120, in check_url_data
    check_url(url_data, self.logger)
  File "/usr/lib/python2.7/dist-packages/linkcheck/director/checker.py", line 52, in check_url
    url_data.check()
  File "/usr/lib/python2.7/dist-packages/linkcheck/checker/urlbase.py", line 424, in check
    self.local_check()
  File "/usr/lib/python2.7/dist-packages/linkcheck/checker/urlbase.py", line 442, in local_check
    self.check_connection()
  File "/usr/lib/python2.7/dist-packages/linkcheck/checker/httpurl.py", line 128, in check_connection
    if not self.allows_robots(self.url):
  File "/usr/lib/python2.7/dist-packages/linkcheck/checker/httpurl.py", line 66, in allows_robots
    return self.aggregate.robots_txt.allows_url(self)
  File "/usr/lib/python2.7/dist-packages/linkcheck/cache/robots_txt.py", line 49, in allows_url
    return self._allows_url(url_data, roboturl)
  File "/usr/lib/python2.7/dist-packages/linkcheck/cache/robots_txt.py", line 62, in _allows_url
    kwargs["proxies"] = {url_data.proxy_type, url_data.proxy}
AttributeError: 'HttpUrl' object has no attribute 'proxy_type'
```

Key locals at the failing frame (the full per-frame locals repeat the same `url_data` repr throughout):

```
url_data  = <http link, base_url=u'http://www.concert-bands.co.uk', parent_url=None, base_ref=None,
             recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'',
             cache_url=http://www.concert-bands.co.uk>
roboturl  = u'http://www.concert-bands.co.uk/robots.txt'
kwargs    = {'session': <requests.sessions.Session object at 0x7f0c30dc0150>, 'auth': None}
url_data.proxy      = '192.168.1.2:3128'
url_data.proxy_type = !AttributeError: 'HttpUrl' object has no attribute 'proxy_type'
```

System info:

```
LinkChecker 9.3
Released on: 16.7.2014
Python 2.7.10 (default, Sep 13 2015, 20:30:50) [GCC 5.2.1 20150911] on linux2
Requests: 2.7.0
Qt: 4.8.7 / PyQt: 4.11.4
Modules: QScintilla, Argcomplete, GeoIP, Sqlite, Gconf, Meliae

Statistics:
Downloaded: 0B.
No statistics available since no URLs were checked.

That's it. 0 links in 0 URLs checked. 0 warnings found. 0 errors found.
Stopped checking at 2015-09-22 08:54:40-004 (0.37 seconds)

Local time: 2015-09-22 08:54:40-004
sys.argv: ['/usr/bin/linkchecker', 'http://www.concert-bands.co.uk']
http_proxy = 'http://192.168.1.2:3128'
ftp_proxy = 'http://192.168.1.2:3128'
no_proxy = 'localhost,127.0.0.0/8,192.168.1.0/24,utilite'
LANG = 'en_US.UTF-8'
Default locale: ('en', 'UTF-8')

***** LinkChecker internal error, over and out *****
WARNING 2015-09-22 08:54:40,295 CheckThread-http://www.concert-bands.co.uk internal error occurred
```
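The final frame of the traceback shows the root cause. In `linkcheck/cache/robots_txt.py` line 62, `kwargs["proxies"] = {url_data.proxy_type, url_data.proxy}` has two visible bugs: `HttpUrl` has no `proxy_type` attribute, and even if it did, `{a, b}` in Python is a set literal, while the requests library expects `proxies` to be a dict mapping a URL scheme to a proxy URL. A minimal sketch of the distinction follows; the `build_proxies` helper and the `"http"` scheme key are illustrative assumptions, not the project's actual fix:

```python
# requests expects proxies as a mapping of scheme -> proxy URL:
#     {"http": "http://192.168.1.2:3128"}
# but {a, b} in Python builds a *set*, not a dict:
wrong = {"http", "http://192.168.1.2:3128"}   # set literal: two strings, no mapping
right = {"http": "http://192.168.1.2:3128"}   # dict literal: scheme -> proxy

assert isinstance(wrong, set)
assert isinstance(right, dict)

# A hypothetical corrected helper (names are assumptions, not the
# upstream fix): build the mapping from the URL's scheme and proxy.
def build_proxies(scheme, proxy):
    return {scheme: proxy}

assert build_proxies("http", "http://192.168.1.2:3128") == right
```

Whatever the real attribute is called, the fix has to replace the set literal with a `{key: value}` dict before handing it to requests.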

nigelhorne commented 8 years ago

```
$ linkchecker -Dall http://www.concert-bands.co.uk
DEBUG 2015-09-22 08:58:44,613 MainThread Python 2.7.10 (default, Sep 13 2015, 20:30:50) [GCC 5.2.1 20150911] on linux2
DEBUG 2015-09-22 08:58:44,613 MainThread reading configuration from ['/home/njh/.linkchecker/linkcheckerrc']
INFO 2015-09-22 08:58:44,617 MainThread Checking intern URLs only; use --check-extern to check extern URLs.
DEBUG 2015-09-22 08:58:44,619 MainThread configuration: [('aborttimeout', 300), ('allowedschemes', []), ('authentication', []), ('blacklist', {}), ('checkextern', False), ('cookiefile', None), ('csv', {}), ('debugmemory', False), ('dot', {}), ('enabledplugins', []), ('externlinks', []), ('fileoutput', []), ('gml', {}), ('gxml', {}), ('html', {}), ('ignorewarnings', []), ('internlinks', []), ('localwebroot', None), ('logger', 'TextLogger'), ('loginextrafields', {}), ('loginpasswordfield', 'password'), ('loginurl', None), ('loginuserfield', 'login'), ('maxfilesizedownload', 5242880), ('maxfilesizeparse', 1048576), ('maxhttpredirects', 10), ('maxnumurls', None), ('maxrequestspersecond', 10), ('maxrunseconds', None), ('nntpserver', None), ('none', {}), ('output', 'text'), ('pluginfolders', []), ('proxy', {'ftp': 'http://192.168.1.2:3128', 'http': 'http://192.168.1.2:3128', 'https': 'http://192.168.1.2:3128', 'no': 'localhost,127.0.0.0/8,192.168.1.0/8'}), ('quiet', False), ('recursionlevel', -1), ('sitemap', {}), ('sql', {}), ('sslverify', True), ('status', True), ('status_wait_seconds', 5), ('text', {}), ('threads', 10), ('timeout', 60), ('trace', False), ('useragent', u'Mozilla/5.0 (compatible; LinkChecker/9.3; +http://wummel.github.io/linkchecker/)'), ('verbose', False), ('warnings', True), ('xml', {})]
DEBUG 2015-09-22 08:58:44,620 MainThread HttpUrl handles url http://www.concert-bands.co.uk
DEBUG 2015-09-22 08:58:44,620 MainThread checking syntax
DEBUG 2015-09-22 08:58:44,620 MainThread Add intern pattern u'^https?://(www.|)concert-bands.co.uk'
DEBUG 2015-09-22 08:58:44,620 MainThread Link pattern u'^https?://(www.|)concert-bands.co.uk' strict=False
DEBUG 2015-09-22 08:58:44,621 MainThread queueing http://www.concert-bands.co.uk

LinkChecker 9.3              Copyright (C) 2000-2014 Bastian Kleineidam
LinkChecker comes with ABSOLUTELY NO WARRANTY!
This is free software, and you are welcome to redistribute it under
certain conditions. Look at the file `LICENSE' within this distribution.
Get the newest version at http://wummel.github.io/linkchecker/
Write comments and bugs to https://github.com/wummel/linkchecker/issues
Support this project at http://wummel.github.io/linkchecker/donations.html

Start checking at 2015-09-22 08:58:44-004
DEBUG 2015-09-22 08:58:44,624 CheckThread-http://www.concert-bands.co.uk Checking http link base_url=u'http://www.concert-bands.co.uk' parent_url=None base_ref=None recursion_level=0 url_connection=None line=0 column=0 page=0 name=u'' anchor=u'' cache_url=http://www.concert-bands.co.uk
DEBUG 2015-09-22 08:58:44,625 CheckThread-http://www.concert-bands.co.uk checking connection
DEBUG 2015-09-22 08:58:44,627 CheckThread-http://www.concert-bands.co.uk using proxy '192.168.1.2:3128'
DEBUG 2015-09-22 08:58:44,627 CheckThread-http://www.concert-bands.co.uk task_done http://www.concert-bands.co.uk
```

****** Oops, I did it again. *****

You have found an internal error in LinkChecker. Please write a bug report at https://github.com/wummel/linkchecker/issues and include the following information:

When using the commandline client:

Not disclosing some of the information above due to privacy reasons is ok. I will try to help you nonetheless, but you have to give me something I can work with ;) .

```
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/linkcheck/director/checker.py", line 104, in check_url
    self.check_url_data(url_data)
  File "/usr/lib/python2.7/dist-packages/linkcheck/director/checker.py", line 120, in check_url_data
    check_url(url_data, self.logger)
  File "/usr/lib/python2.7/dist-packages/linkcheck/director/checker.py", line 52, in check_url
    url_data.check()
  File "/usr/lib/python2.7/dist-packages/linkcheck/checker/urlbase.py", line 424, in check
    self.local_check()
  File "/usr/lib/python2.7/dist-packages/linkcheck/checker/urlbase.py", line 442, in local_check
    self.check_connection()
  File "/usr/lib/python2.7/dist-packages/linkcheck/checker/httpurl.py", line 128, in check_connection
    if not self.allows_robots(self.url):
  File "/usr/lib/python2.7/dist-packages/linkcheck/checker/httpurl.py", line 66, in allows_robots
    return self.aggregate.robots_txt.allows_url(self)
  File "/usr/lib/python2.7/dist-packages/linkcheck/cache/robots_txt.py", line 49, in allows_url
    return self._allows_url(url_data, roboturl)
  File "/usr/lib/python2.7/dist-packages/linkcheck/cache/robots_txt.py", line 62, in _allows_url
    kwargs["proxies"] = {url_data.proxy_type, url_data.proxy}
AttributeError: 'HttpUrl' object has no attribute 'proxy_type'
```

Key locals at the failing frame (identical to the first run, apart from object addresses):

```
roboturl  = u'http://www.concert-bands.co.uk/robots.txt'
kwargs    = {'session': <requests.sessions.Session object at 0x7f66a9561ad0>, 'auth': None}
url_data.proxy      = '192.168.1.2:3128'
url_data.proxy_type = !AttributeError: 'HttpUrl' object has no attribute 'proxy_type'
```

System info:

```
LinkChecker 9.3
Released on: 16.7.2014
Python 2.7.10 (default, Sep 13 2015, 20:30:50) [GCC 5.2.1 20150911] on linux2
Requests: 2.7.0
Qt: 4.8.7 / PyQt: 4.11.4
Modules: QScintilla, Argcomplete, GeoIP, Sqlite, Gconf, Meliae

Statistics:
Downloaded: 0B.
No statistics available since no URLs were checked.

That's it. 0 links in 0 URLs checked. 0 warnings found. 0 errors found.
Stopped checking at 2015-09-22 08:58:44-004 (0.04 seconds)

Local time: 2015-09-22 08:58:44-004
sys.argv: ['/usr/bin/linkchecker', '-Dall', 'http://www.concert-bands.co.uk']
http_proxy = 'http://192.168.1.2:3128'
ftp_proxy = 'http://192.168.1.2:3128'
no_proxy = 'localhost,127.0.0.0/8,192.168.1.0/24,utilite'
LANG = 'en_US.UTF-8'
Default locale: ('en', 'UTF-8')

***** LinkChecker internal error, over and out *****
WARNING 2015-09-22 08:58:44,663 CheckThread-http://www.concert-bands.co.uk internal error occurred
```
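The debug run logs "using proxy '192.168.1.2:3128'" immediately before the crash, and the failing line sits in the proxy branch of `_allows_url`. Until the bug is fixed, invoking LinkChecker with the proxy environment variables removed may avoid that code path entirely. This is a workaround inferred from the traceback, not a verified fix, and it obviously loses proxy access; a sketch:

```python
import os
import subprocess

# Build a copy of the environment with the proxy variables removed,
# so the proxy branch in robots_txt.py is never entered (assumption
# based on the traceback, not a verified fix).
env = dict(os.environ)
for var in ("http_proxy", "https_proxy", "ftp_proxy"):
    env.pop(var, None)

# The real invocation would then be (not executed here, since it
# needs network and proxy-less access to the site):
#     subprocess.call(["linkchecker", "http://www.concert-bands.co.uk"], env=env)
```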

dpalic commented 6 years ago

Thank you for the issue report. Sadly this project is dead, and a new team has taken over at https://github.com/linkcheck/linkchecker; for more details please see #708. Please close this issue and, if it still persists, report it freshly on the new repo: https://github.com/linkcheck/linkchecker/issues