wummel / linkchecker

check links in web documents or full websites
http://wummel.github.io/linkchecker/
GNU General Public License v2.0
1.42k stars 234 forks

Internal Error found in Linkchecker #763

Closed zounp closed 5 years ago

zounp commented 5 years ago

** Oops, I did it again. *****

You have found an internal error in LinkChecker. Please write a bug report at https://github.com/wummel/linkchecker/issues and include the following information:

When using the commandline client:

Not disclosing some of the information above due to privacy reasons is ok. I will try to help you nonetheless, but you have to give me something I can work with ;) .

Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/linkcheck/director/checker.py", line 104, in check_url
    line: self.check_url_data(url_data)
    locals:
      self = <Checker(CheckThread-https://wiki.webevaluation.nl/doku.php?id=fusionpbx, started 140176537171712)>
      self.check_url_data = <bound method Checker.check_url_data of <Checker(CheckThread-https://wiki.webevaluation.nl/doku.php?id=fusionpbx, started 140176537171712)>>
      url_data = <https link, base_url=u'/doku.php?id=fusionpbx', parent_url=u'https://wiki.webevaluation.nl/doku.php?id=installing_software', base_ref=None, recursion_level=4, url_connection=None, line=138, column=1, page=0, name=u'Debian, FreeSwitch and FusionPBX', anchor=u'', cache_url=https://wiki.webevaluati...
  File "/usr/lib/python2.7/dist-packages/linkcheck/director/checker.py", line 120, in check_url_data
    line: check_url(url_data, self.logger)
    locals:
      check_url = <function check_url at 0x7f7d654b55f0>
      url_data = <https link, base_url=u'/doku.php?id=fusionpbx', parent_url=u'https://wiki.webevaluation.nl/doku.php?id=installing_software', base_ref=None, recursion_level=4, url_connection=None, line=138, column=1, page=0, name=u'Debian, FreeSwitch and FusionPBX', anchor=u'', cache_url=https://wiki.webevaluati...
      self = <Checker(CheckThread-https://wiki.webevaluation.nl/doku.php?id=fusionpbx, started 140176537171712)>
      self.logger = <linkcheck.director.logger.Logger object at 0x7f7d65397a10>
  File "/usr/lib/python2.7/dist-packages/linkcheck/director/checker.py", line 64, in check_url
    line: parser.parse_url(url_data)
    locals:
      parser = <module 'linkcheck.parser' from '/usr/lib/python2.7/dist-packages/linkcheck/parser/__init__.pyc'>
      parser.parse_url = <function parse_url at 0x7f7d654b1b90>
      url_data = <https link, base_url=u'/doku.php?id=fusionpbx', parent_url=u'https://wiki.webevaluation.nl/doku.php?id=installing_software', base_ref=None, recursion_level=4, url_connection=None, line=138, column=1, page=0, name=u'Debian, FreeSwitch and FusionPBX', anchor=u'', cache_url=https://wiki.webevaluati...
  File "/usr/lib/python2.7/dist-packages/linkcheck/parser/__init__.py", line 41, in parse_url
    line: globals()funcname
    locals:
      globals =
      funcname = 'parse_html', len = 10
      url_data = <https link, base_url=u'/doku.php?id=fusionpbx', parent_url=u'https://wiki.webevaluation.nl/doku.php?id=installing_software', base_ref=None, recursion_level=4, url_connection=None, line=138, column=1, page=0, name=u'Debian, FreeSwitch and FusionPBX', anchor=u'', cache_url=https://wiki.webevaluati...
  File "/usr/lib/python2.7/dist-packages/linkcheck/parser/__init__.py", line 50, in parse_html
    line: find_links(url_data, url_data.add_url, linkparse.LinkTags)
    locals:
      find_links = <function find_links at 0x7f7d654b5050>
      url_data = <https link, base_url=u'/doku.php?id=fusionpbx', parent_url=u'https://wiki.webevaluation.nl/doku.php?id=installing_software', base_ref=None, recursion_level=4, url_connection=None, line=138, column=1, page=0, name=u'Debian, FreeSwitch and FusionPBX', anchor=u'', cache_url=https://wiki.webevaluati...
      url_data.add_url = <bound method HttpUrl.add_url of <https link, base_url=u'/doku.php?id=fusionpbx', parent_url=u'https://wiki.webevaluation.nl/doku.php?id=installing_software', base_ref=None, recursion_level=4, url_connection=None, line=138, column=1, page=0, name=u'Debian, FreeSwitch and FusionPBX', anchor=u'', c...
      linkparse = <module 'linkcheck.htmlutil.linkparse' from '/usr/lib/python2.7/dist-packages/linkcheck/htmlutil/linkparse.pyc'>
      linkparse.LinkTags = {'body': [u'background'], 'bgsound': [u'src'], 'head': [u'profile'], 'blockquote': [u'cite'], 'th': [u'background'], 'form': [u'action'], 'track': [u'src'], 'frame': [u'src', u'longdesc'], 'object': [u'classid', u'data', u'archive', u'usemap', u'codebase'], 'layer': [u'background', u'src'], 'ins'..., len = 35
  File "/usr/lib/python2.7/dist-packages/linkcheck/parser/__init__.py", line 128, in find_links
    line: parser.feed(url_data.get_content())
    locals:
      parser = <linkcheck.HtmlParser.htmlsax.parser object at 0x7f7d3aa51f70>
      parser.feed = <built-in method feed of linkcheck.HtmlParser.htmlsax.parser object at 0x7f7d3aa51f70>
      url_data = <https link, base_url=u'/doku.php?id=fusionpbx', parent_url=u'https://wiki.webevaluation.nl/doku.php?id=installing_software', base_ref=None, recursion_level=4, url_connection=None, line=138, column=1, page=0, name=u'Debian, FreeSwitch and FusionPBX', anchor=u'', cache_url=https://wiki.webevaluati...
      url_data.get_content = <bound method HttpUrl.get_content of <https link, base_url=u'/doku.php?id=fusionpbx', parent_url=u'https://wiki.webevaluation.nl/doku.php?id=installing_software', base_ref=None, recursion_level=4, url_connection=None, line=138, column=1, page=0, name=u'Debian, FreeSwitch and FusionPBX', anchor=u'...
  File "/usr/lib/python2.7/dist-packages/linkcheck/htmlutil/linkparse.py", line 231, in start_element
    line: self.parse_tag(tag, attr, value, name, base)
    locals:
      self = <linkcheck.htmlutil.linkparse.LinkFinder object at 0x7f7d3aa4aa90>
      self.parse_tag = <bound method LinkFinder.parse_tag of <linkcheck.htmlutil.linkparse.LinkFinder object at 0x7f7d3aa4aa90>>
      tag = u'a'
      attr = u'href'
      value = u'https://[IP', len = 11
      name = u'https://[IP', len = 11
      base = u''
  File "/usr/lib/python2.7/dist-packages/linkcheck/htmlutil/linkparse.py", line 277, in parse_tag
    line: self.found_url(value, name, base)
    locals:
      self = <linkcheck.htmlutil.linkparse.LinkFinder object at 0x7f7d3aa4aa90>
      self.found_url = <bound method LinkFinder.found_url of <linkcheck.htmlutil.linkparse.LinkFinder object at 0x7f7d3aa4aa90>>
      value = u'https://[IP', len = 11
      name = u'https://[IP', len = 11
      base = u''
  File "/usr/lib/python2.7/dist-packages/linkcheck/htmlutil/linkparse.py", line 283, in found_url
    line: column=self.parser.last_column(), name=name, base=base)
    locals:
      column =
      self = <linkcheck.htmlutil.linkparse.LinkFinder object at 0x7f7d3aa4aa90>
      self.parser = <linkcheck.HtmlParser.htmlsax.parser object at 0x7f7d3aa51f70>
      self.parser.last_column = <built-in method last_column of linkcheck.HtmlParser.htmlsax.parser object at 0x7f7d3aa51f70>
      name = u'https://[IP', len = 11
      base = u''
  File "/usr/lib/python2.7/dist-packages/linkcheck/checker/urlbase.py", line 668, in add_url
    line: page=page, name=name, parent_content_type=self.content_type)
    locals:
      page = 0
      name = u'https://[IP', len = 11
      parent_content_type =
      self = <https link, base_url=u'/doku.php?id=fusionpbx', parent_url=u'https://wiki.webevaluation.nl/doku.php?id=installing_software', base_ref=None, recursion_level=4, url_connection=None, line=138, column=1, page=0, name=u'Debian, FreeSwitch and FusionPBX', anchor=u'', cache_url=https://wiki.webevaluati...
      self.content_type = 'text/html', len = 9
  File "/usr/lib/python2.7/dist-packages/linkcheck/checker/__init__.py", line 125, in get_url_from
    line: line=line, column=column, page=page, name=name, extern=extern)
    locals:
      line = 189
      column = 49
      page = 0
      name = u'https://[IP', len = 11
      extern = None
  File "/usr/lib/python2.7/dist-packages/linkcheck/checker/urlbase.py", line 131, in __init__
    line: self.check_syntax()
    locals:
      self = <https link, base_url=u'https://[IP', parent_url=u'https://wiki.webevaluation.nl/doku.php?id=fusionpbx', base_ref=None, recursion_level=5, url_connection=None, line=189, column=49, page=0, name=u'https://[IP', anchor=None, cache_url=None>
      self.check_syntax = <bound method HttpUrl.check_syntax of <https link, base_url=u'https://[IP', parent_url=u'https://wiki.webevaluation.nl/doku.php?id=fusionpbx', base_ref=None, recursion_level=5, url_connection=None, line=189, column=49, page=0, name=u'https://[IP', anchor=None, cache_url=None>>
  File "/usr/lib/python2.7/dist-packages/linkcheck/checker/urlbase.py", line 336, in check_syntax
    line: self.build_url()
    locals:
      self = <https link, base_url=u'https://[IP', parent_url=u'https://wiki.webevaluation.nl/doku.php?id=fusionpbx', base_ref=None, recursion_level=5, url_connection=None, line=189, column=49, page=0, name=u'https://[IP', anchor=None, cache_url=None>
      self.build_url = <bound method HttpUrl.build_url of <https link, base_url=u'https://[IP', parent_url=u'https://wiki.webevaluation.nl/doku.php?id=fusionpbx', base_ref=None, recursion_level=5, url_connection=None, line=189, column=49, page=0, name=u'https://[IP', anchor=None, cache_url=None>>
  File "/usr/lib/python2.7/dist-packages/linkcheck/checker/urlbase.py", line 361, in build_url
    line: base_url, is_idn = url_norm(self.base_url, self.encoding)
    locals:
      base_url =
      is_idn =
      url_norm = <function url_norm at 0x7f7d6593ded8>
      self = <https link, base_url=u'https://[IP', parent_url=u'https://wiki.webevaluation.nl/doku.php?id=fusionpbx', base_ref=None, recursion_level=5, url_connection=None, line=189, column=49, page=0, name=u'https://[IP', anchor=None, cache_url=None>
      self.base_url = u'https://[IP', len = 11
      self.encoding = None
  File "/usr/lib/python2.7/dist-packages/linkcheck/checker/urlbase.py", line 75, in url_norm
    line: return urlutil.url_norm(url, encoding=encoding)
    locals:
      urlutil = <module 'linkcheck.url' from '/usr/lib/python2.7/dist-packages/linkcheck/url.pyc'>
      urlutil.url_norm = <function url_norm at 0x7f7d65b77938>
      url = u'https://[IP', len = 11
      encoding = None
  File "/usr/lib/python2.7/dist-packages/linkcheck/url.py", line 314, in url_norm
    line: urlparts = list(urlparse.urlsplit(url))
    locals:
      urlparts =
      list = <type 'list'>
      urlparse = <module 'urlparse' from '/usr/lib/python2.7/urlparse.pyc'>
      urlparse.urlsplit = <function urlsplit at 0x7f7d6645c488>
      url = 'https://[IP', len = 11
  File "/usr/lib/python2.7/urlparse.py", line 214, in urlsplit
    line: raise ValueError("Invalid IPv6 URL")
    locals:
      ValueError = <type 'exceptions.ValueError'>
ValueError: Invalid IPv6 URL

System info:
LinkChecker 9.4.0
Released on: xx.xx.xxxx
Python 2.7.15+ (default, Feb 3 2019, 13:13:16) [GCC 8.2.0] on linux2
Modules: Requests, Pysqlite, Sqlite
Local time: 2019-02-27 12:14:21+000
sys.argv: ['/usr/bin/linkchecker', 'https://wiki.webevaluation.nl']
LC_CTYPE = 'en_US.UTF-8'
LANG = 'en_US.UTF-8'
Default locale: ('en', 'UTF-8')

**** LinkChecker internal error, over and out ****

WARNING linkcheck.check 2019-02-27 12:14:21,893 CheckThread-https://wiki.webevaluation.nl/doku.php?id=fusionpbx internal error occurred
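For readers following the traceback: the last frame is the standard library's urlsplit rejecting the href value 'https://[IP', which has an opening bracket but no closing one, so it is treated as a malformed IPv6 literal. The sketch below reproduces that behaviour and shows one possible way a caller might guard against it. It is not LinkChecker's code and not the fix applied in any repository; safe_urlsplit is a hypothetical helper name, and the sketch uses Python 3's urllib.parse rather than the Python 2 urlparse module shown in the report.

```python
from urllib.parse import urlsplit


def safe_urlsplit(url):
    """Split a URL, returning None instead of raising on malformed input.

    Hypothetical helper for illustration only; not part of LinkChecker.
    """
    try:
        return urlsplit(url)
    except ValueError:
        # urlsplit raises ValueError when the host part has an
        # unmatched '[' or ']' (a broken IPv6 literal).
        return None


# The link from the report: opening bracket, no closing bracket.
bad = safe_urlsplit("https://[IP")

# A well-formed URL from the same report still parses normally.
good = safe_urlsplit("https://wiki.webevaluation.nl/doku.php?id=fusionpbx")

if bad is None:
    print("malformed URL skipped")
print(good.netloc, good.query)
```

With a guard like this, a checker could report the link as syntactically invalid instead of aborting the thread with an internal error.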

dpalic commented 5 years ago

Thank you for the issue report. Sadly, this project is dead, and a new team has taken over at https://github.com/linkcheck/linkchecker. For more details, please see #708. Also, please close this issue and report it afresh on the new repo: https://github.com/linkcheck/linkchecker/issues