wummel / linkchecker

check links in web documents or full websites
http://wummel.github.io/linkchecker/
GNU General Public License v2.0

LinkChecker internal error #663

Closed martindholmes closed 6 years ago

martindholmes commented 8 years ago

LinkChecker fell over while checking a specific URL today and asked me to raise an issue. It was running a large-scale URL check against a transient build of a website on a Jenkins CI server. We do this by compiling all the relevant URLs into a single file (attached below), which is then checked.

linkcheck.htm.zip
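
For readers unfamiliar with this setup, here is a rough sketch of how such a single file of links might be generated. It is not the build script used here; the function name `build_link_page` and the example URLs are hypothetical, and the page layout (one anchor per URL in a list) is an assumption.

```python
from html import escape
from typing import Iterable


def build_link_page(urls: Iterable[str]) -> str:
    """Return a minimal HTML page with one anchor per unique URL.

    Hypothetical helper: the real linkcheck.htm may be laid out differently.
    """
    seen = set()
    items = []
    for url in urls:
        if url in seen:
            continue  # skip duplicates so each URL is checked once
        seen.add(url)
        href = escape(url, quote=True)  # escape &, <, > and quotes
        items.append('<li><a href="%s">%s</a></li>' % (href, href))
    body = "\n".join(items)
    return "<!DOCTYPE html>\n<html><body><ul>\n%s\n</ul></body></html>\n" % body


if __name__ == "__main__":
    page = build_link_page(["http://a.example/?x=1&y=2", "http://b.example/"])
    with open("linkcheck.htm", "w", encoding="utf-8") as fh:
        fh.write(page)
```

A file produced this way could then be passed to the linkchecker command with the arguments listed under `sys.argv` in the report.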

This is the relevant output from the check process, which contains the URL it failed on:

********** Oops, I did it again. *************
     [exec] 
     [exec] You have found an internal error in LinkChecker. Please write a bug report
     [exec] at https://github.com/wummel/linkchecker/issues
     [exec] and include the following information:
     [exec] - the URL or file you are testing
     [exec] - the system information below
     [exec] 
     [exec] When using the commandline client:
     [exec] - your commandline arguments and any custom configuration files.
     [exec] - the output of a debug run with option "-Dall"
     [exec] 
     [exec] Not disclosing some of the information above due to privacy reasons is ok.
     [exec] I will try to help you nonetheless, but you have to give me something
     [exec] I can work with ;) .
     [exec] 
     [exec] Traceback (most recent call last):
     [exec]   File "/usr/lib/python2.7/dist-packages/linkcheck/director/task.py", line 29, in run
     [exec]     line: self.run_checked()
     [exec]     locals:
     [exec]       self = <local> <Checker(CheckThread-http://www.plannedobsolescence.net/blog/if-you-cant-say-anything-nice/, started 140049194403584)>
     [exec]       self.run_checked = <local> <bound method Checker.run_checked of <Checker(CheckThread-http://www.plannedobsolescence.net/blog/if-you-cant-say-anything-nice/, started 140049194403584)>>
     [exec]   File "/usr/lib/python2.7/dist-packages/linkcheck/director/checker.py", line 48, in run_checked
     [exec]     line: self.check_url()
     [exec]     locals:
     [exec]       self = <local> <Checker(CheckThread-http://www.plannedobsolescence.net/blog/if-you-cant-say-anything-nice/, started 140049194403584)>
     [exec]       self.check_url = <local> <bound method Checker.check_url of <Checker(CheckThread-http://www.plannedobsolescence.net/blog/if-you-cant-say-anything-nice/, started 140049194403584)>>
     [exec]   File "/usr/lib/python2.7/dist-packages/linkcheck/director/checker.py", line 56, in check_url
     [exec]     line: self.check_url_data(url_data)
     [exec]     locals:
     [exec]       self = <local> <Checker(CheckThread-http://www.plannedobsolescence.net/blog/if-you-cant-say-anything-nice/, started 140049194403584)>
     [exec]       self.check_url_data = <local> <bound method Checker.check_url_data of <Checker(CheckThread-http://www.plannedobsolescence.net/blog/if-you-cant-say-anything-nice/, started 140049194403584)>>
     [exec]       url_data = <local> <http link, base_url=u'http://www.plannedobsolescence.net/blog/if-you-cant-say-anything-nice/', parent_url=u'file:///var/lib/jenkins/jobs/MoEML/workspace/utilities/tempLinkchecker/linkcheck.htm', base_ref=None, recursion_level=1, url_connection=None, line=866, column=4, name=u'social_media', anch...
     [exec]   File "/usr/lib/python2.7/dist-packages/linkcheck/director/checker.py", line 71, in check_url_data
     [exec]     line: url_data.check()
     [exec]     locals:
     [exec]       url_data = <local> <http link, base_url=u'http://www.plannedobsolescence.net/blog/if-you-cant-say-anything-nice/', parent_url=u'file:///var/lib/jenkins/jobs/MoEML/workspace/utilities/tempLinkchecker/linkcheck.htm', base_ref=None, recursion_level=1, url_connection=None, line=866, column=4, name=u'social_media', anch...
     [exec]       url_data.check = <local> <bound method HttpUrl.check of <http link, base_url=u'http://www.plannedobsolescence.net/blog/if-you-cant-say-anything-nice/', parent_url=u'file:///var/lib/jenkins/jobs/MoEML/workspace/utilities/tempLinkchecker/linkcheck.htm', base_ref=None, recursion_level=1, url_connection=None, line=866, colum...
     [exec]   File "/usr/lib/python2.7/dist-packages/linkcheck/checker/urlbase.py", line 489, in check
     [exec]     line: self.local_check()
     [exec]     locals:
     [exec]       self = <local> <http link, base_url=u'http://www.plannedobsolescence.net/blog/if-you-cant-say-anything-nice/', parent_url=u'file:///var/lib/jenkins/jobs/MoEML/workspace/utilities/tempLinkchecker/linkcheck.htm', base_ref=None, recursion_level=1, url_connection=None, line=866, column=4, name=u'social_media', anch...
     [exec]       self.local_check = <local> <bound method HttpUrl.local_check of <http link, base_url=u'http://www.plannedobsolescence.net/blog/if-you-cant-say-anything-nice/', parent_url=u'file:///var/lib/jenkins/jobs/MoEML/workspace/utilities/tempLinkchecker/linkcheck.htm', base_ref=None, recursion_level=1, url_connection=None, line=866,...
     [exec]   File "/usr/lib/python2.7/dist-packages/linkcheck/checker/urlbase.py", line 544, in local_check
     [exec]     line: self.check_content()
     [exec]     locals:
     [exec]       self = <local> <http link, base_url=u'http://www.plannedobsolescence.net/blog/if-you-cant-say-anything-nice/', parent_url=u'file:///var/lib/jenkins/jobs/MoEML/workspace/utilities/tempLinkchecker/linkcheck.htm', base_ref=None, recursion_level=1, url_connection=None, line=866, column=4, name=u'social_media', anch...
     [exec]       self.check_content = <local> <bound method HttpUrl.check_content of <http link, base_url=u'http://www.plannedobsolescence.net/blog/if-you-cant-say-anything-nice/', parent_url=u'file:///var/lib/jenkins/jobs/MoEML/workspace/utilities/tempLinkchecker/linkcheck.htm', base_ref=None, recursion_level=1, url_connection=None, line=86...
     [exec]   File "/usr/lib/python2.7/dist-packages/linkcheck/checker/urlbase.py", line 772, in check_content
     [exec]     line: self.set_title_from_content()
     [exec]     locals:
     [exec]       self = <local> <http link, base_url=u'http://www.plannedobsolescence.net/blog/if-you-cant-say-anything-nice/', parent_url=u'file:///var/lib/jenkins/jobs/MoEML/workspace/utilities/tempLinkchecker/linkcheck.htm', base_ref=None, recursion_level=1, url_connection=None, line=866, column=4, name=u'social_media', anch...
     [exec]       self.set_title_from_content = <local> <bound method HttpUrl.set_title_from_content of <http link, base_url=u'http://www.plannedobsolescence.net/blog/if-you-cant-say-anything-nice/', parent_url=u'file:///var/lib/jenkins/jobs/MoEML/workspace/utilities/tempLinkchecker/linkcheck.htm', base_ref=None, recursion_level=1, url_connection=None...
     [exec]   File "/usr/lib/python2.7/dist-packages/linkcheck/checker/urlbase.py", line 263, in set_title_from_content
     [exec]     line: parser.feed(self.get_content())
     [exec]     locals:
     [exec]       parser = <local> <linkcheck.HtmlParser.htmlsax.parser object at 0x7f5fbcde7418>
     [exec]       parser.feed = <local> <built-in method feed of linkcheck.HtmlParser.htmlsax.parser object at 0x7f5fbcde7418>
     [exec]       self = <local> <http link, base_url=u'http://www.plannedobsolescence.net/blog/if-you-cant-say-anything-nice/', parent_url=u'file:///var/lib/jenkins/jobs/MoEML/workspace/utilities/tempLinkchecker/linkcheck.htm', base_ref=None, recursion_level=1, url_connection=None, line=866, column=4, name=u'social_media', anch...
     [exec]       self.get_content = <local> <bound method HttpUrl.get_content of <http link, base_url=u'http://www.plannedobsolescence.net/blog/if-you-cant-say-anything-nice/', parent_url=u'file:///var/lib/jenkins/jobs/MoEML/workspace/utilities/tempLinkchecker/linkcheck.htm', base_ref=None, recursion_level=1, url_connection=None, line=866,...
     [exec]   File "/usr/lib/python2.7/dist-packages/linkcheck/checker/urlbase.py", line 751, in get_content
     [exec]     line: self.data, self.dlsize = self.read_content()
     [exec]     locals:
     [exec]       self = <local> <http link, base_url=u'http://www.plannedobsolescence.net/blog/if-you-cant-say-anything-nice/', parent_url=u'file:///var/lib/jenkins/jobs/MoEML/workspace/utilities/tempLinkchecker/linkcheck.htm', base_ref=None, recursion_level=1, url_connection=None, line=866, column=4, name=u'social_media', anch...
     [exec]       self.data = <local> None
     [exec]       self.dlsize = <local> 20
     [exec]       self.read_content = <local> <bound method HttpUrl.read_content of <http link, base_url=u'http://www.plannedobsolescence.net/blog/if-you-cant-say-anything-nice/', parent_url=u'file:///var/lib/jenkins/jobs/MoEML/workspace/utilities/tempLinkchecker/linkcheck.htm', base_ref=None, recursion_level=1, url_connection=None, line=866...
     [exec]   File "/usr/lib/python2.7/dist-packages/linkcheck/checker/httpurl.py", line 686, in read_content
     [exec]     line: return self._read_content()
     [exec]     locals:
     [exec]       self = <local> <http link, base_url=u'http://www.plannedobsolescence.net/blog/if-you-cant-say-anything-nice/', parent_url=u'file:///var/lib/jenkins/jobs/MoEML/workspace/utilities/tempLinkchecker/linkcheck.htm', base_ref=None, recursion_level=1, url_connection=None, line=866, column=4, name=u'social_media', anch...
     [exec]       self._read_content = <local> <bound method HttpUrl._read_content of <http link, base_url=u'http://www.plannedobsolescence.net/blog/if-you-cant-say-anything-nice/', parent_url=u'file:///var/lib/jenkins/jobs/MoEML/workspace/utilities/tempLinkchecker/linkcheck.htm', base_ref=None, recursion_level=1, url_connection=None, line=86...
     [exec]   File "/usr/lib/python2.7/dist-packages/linkcheck/checker/httpurl.py", line 709, in _read_content
     [exec]     line: data = f.read()
     [exec]     locals:
     [exec]       data = <local> '\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03\xcc\x19kO\xe3\xb8\xf63\x95\xf6?\xb8Y-I\x96<\x9a\x16(\xd3\x12\x10S\x18itggF\x03h\xb5\x02\x84\xdc\xc4I\xcc&q&v)\x15\xf4\xbf\xdf\xe3$m\xd3\xd2\xe1\xb1#\xee^T\xb5~\x1c\x9f\xb7\xcf\xc3\xec7\x8f\xbf\x0c\xce\xfe\xfaz\x82"\x91\xc4\x07\x8d}\xf9\x83\xa8 \t\xf7XF\\e..., len = 26680
     [exec]       f = <local> <gzip on 0x7f5fbbe85b00>
     [exec]       f.read = <local> <bound method GzipFile.read of <gzip on 0x7f5fbbe85b00>>
     [exec]   File "/usr/lib/python2.7/dist-packages/linkcheck/gzip2.py", line 247, in read
     [exec]     line: self._read(readsize)
     [exec]     locals:
     [exec]       self = <local> <gzip on 0x7f5fbbe85b00>
     [exec]       self._read = <local> <bound method GzipFile._read of <gzip on 0x7f5fbbe85b00>>
     [exec]       readsize = <local> 16384
     [exec]   File "/usr/lib/python2.7/dist-packages/linkcheck/gzip2.py", line 306, in _read
     [exec]     line: uncompress = self.decompress.decompress(buf)
     [exec]     locals:
     [exec]       uncompress = <not found>
     [exec]       self = <local> <gzip on 0x7f5fbbe85b00>
     [exec]       self.decompress = <local> <zlib.Decompress object at 0x7f60060a1f08>
     [exec]       self.decompress.decompress = <local> <built-in method decompress of zlib.Decompress object at 0x7f60060a1f08>
     [exec]       buf = <local> '\x7f\x1d\xd8\xe9\xef\x1f\xeb\xb5\xfbL\xfa\xa0\xffqw.M\t\xc3@\x1c\xff*\xf5\xe6\t\xd2d\x934=\xf9b\xc6\xf1\x05*\x1e<9\xa0\x88\x0f\x04\xa50\x8a\x1f\xd7Ob\xd36\x90\x82d\x9a"\xd52\\(d\xf6\x90\xe9\xee/\xbb\xf9g\x13\xd5\x99\xd4*\xad\x12}\xd42)\xd2<Kt\xcaEVz\x13>\x19\x99\xac\xfe\xca\xc4\x8cU\x1a\xf0\x81j..., len = 11310
     [exec] error: Error -3 while decompressing: invalid block type
     [exec] System info:
     [exec] LinkChecker 8.6
     [exec] Python 2.7.6 (default, Jun 22 2015, 17:58:13) 
     [exec] [GCC 4.8.2] on linux2
     [exec] Modules: Sqlite
     [exec] Local time: 2016-06-20 16:26:35-007
     [exec] sys.argv: ['/usr/bin/linkchecker', 'utilities/tempLinkchecker/linkcheck.htm', '--quiet', '--no-follow-url=.*', '--file-output=xml/products/linkchecker-out.xml']
     [exec] LANGUAGE = 'en_CA:en'
     [exec] LANG = 'en_CA.UTF-8'
     [exec] Default locale: ('en', 'UTF-8')
     [exec] 
     [exec]  ******** LinkChecker internal error, over and out ********
     [exec] WARNING CheckThread-http://www.plannedobsolescence.net/blog/if-you-cant-say-anything-nice/ internal error occurred
     [exec] WARNING CheckThread-http://www.plannedobsolescence.net/blog/if-you-cant-say-anything-nice/ internal error occurred
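
The traceback ends in a `zlib` error raised while reading a gzip-encoded response body. As a minimal sketch of how such a failure can be caught and reported instead of escaping as an internal error, the following uses only the Python 3 standard library. It is not LinkChecker's code (the report above is from Python 2.7 and LinkChecker's own `gzip2.py`), and the name `read_gzip_body` is hypothetical.

```python
import gzip
import io
import zlib
from typing import Optional, Tuple


def read_gzip_body(raw: bytes) -> Tuple[Optional[bytes], Optional[str]]:
    """Decompress a gzip-encoded body; return (data, error_message).

    Hypothetical helper: on corrupt input it returns an error string
    rather than letting the exception propagate to the caller.
    """
    try:
        with gzip.GzipFile(fileobj=io.BytesIO(raw)) as f:
            return f.read(), None
    except (OSError, EOFError, zlib.error) as exc:
        # OSError: bad gzip header; EOFError: truncated stream;
        # zlib.error: corrupt deflate data such as an invalid block type.
        return None, "could not decompress body: %s" % exc


if __name__ == "__main__":
    good = gzip.compress(b"hello world")
    # Overwrite the first deflate byte (right after the 10-byte gzip header)
    # so that the block-type bits are set to the reserved value.
    bad = good[:10] + b"\x07" + good[11:]
    print(read_gzip_body(good))
    print(read_gzip_body(bad))
```

With handling along these lines, a link whose body cannot be decompressed could be reported as a warning for that one URL while the rest of the check continues.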

dpalic commented 6 years ago

Thank you for the issue report. Sadly, this project is dead, and a new team is now maintaining it at https://github.com/linkcheck/linkchecker. For more details, please see #708. Please also close this issue and, if the problem still persists, report it afresh on the new repo at https://github.com/linkcheck/linkchecker/issues.

martindholmes commented 6 years ago

I'll close it then. I hope the new project will continue to maintain Deb packages! Thanks for taking over.