wummel / linkchecker

check links in web documents or full websites
http://wummel.github.io/linkchecker/
GNU General Public License v2.0
1.42k stars 234 forks

linkchecker breaks when checking this URL: https://entropia.de/Benutzer%3A%24dude #645

Open. hreese opened this issue 8 years ago

hreese commented 8 years ago

Version (installed via pip into a virtualenv):

$ linkchecker -V
INFO 2016-04-03 06:26:48,202 MainThread Checking intern URLs only; use --check-extern to check extern URLs.
LinkChecker 9.3 released 16.7.2014
Copyright (C) 2000-2014 Bastian Kleineidam

Output:

INFO 2016-04-03 06:25:35,398 MainThread Checking intern URLs only; use --check-extern to check extern URLs.
 1 thread active,     0 links queued,    0 links in   0 URLs checked, runtime 1 seconds

********** Oops, I did it again. *************

You have found an internal error in LinkChecker. Please write a bug report
at https://github.com/wummel/linkchecker/issues
and include the following information:
- the URL or file you are testing
- the system information below

When using the commandline client:
- your commandline arguments and any custom configuration files.
- the output of a debug run with option "-Dall"

Not disclosing some of the information above due to privacy reasons is ok.
I will try to help you nonetheless, but you have to give me something
I can work with ;) .

Traceback (most recent call last):
  File "/home/heiko/src/linkchecker/ENV/local/lib/python2.7/site-packages/linkcheck/director/checker.py", line 104, in check_url
    line: self.check_url_data(url_data)
    locals:
      self = <local> <Checker(CheckThread-https://entropia.de/Benutzer%3A%24dude, started 139672758650624)>
      self.check_url_data = <local> <bound method Checker.check_url_data of <Checker(CheckThread-https://entropia.de/Benutzer%3A%24dude, started 139672758650624)>>
      url_data = <local> <https link, base_url=u'https://entropia.de/Benutzer:$dude', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=https://entropia.de/Benutzer%3A%24dude>
  File "/home/heiko/src/linkchecker/ENV/local/lib/python2.7/site-packages/linkcheck/director/checker.py", line 120, in check_url_data
    line: check_url(url_data, self.logger)
    locals:
      check_url = <global> <function check_url at 0x7f081a222aa0>
      url_data = <local> <https link, base_url=u'https://entropia.de/Benutzer:$dude', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=https://entropia.de/Benutzer%3A%24dude>
      self = <local> <Checker(CheckThread-https://entropia.de/Benutzer%3A%24dude, started 139672758650624)>
      self.logger = <local> <linkcheck.director.logger.Logger object at 0x7f0819ac3310>
  File "/home/heiko/src/linkchecker/ENV/local/lib/python2.7/site-packages/linkcheck/director/checker.py", line 64, in check_url
    line: parser.parse_url(url_data)
    locals:
      parser = <global> <module 'linkcheck.parser' from '/home/heiko/src/linkchecker/ENV/local/lib/python2.7/site-packages/linkcheck/parser/__init__.pyc'>
      parser.parse_url = <global> <function parse_url at 0x7f081a222140>
      url_data = <local> <https link, base_url=u'https://entropia.de/Benutzer:$dude', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=https://entropia.de/Benutzer%3A%24dude>
  File "/home/heiko/src/linkchecker/ENV/local/lib/python2.7/site-packages/linkcheck/parser/__init__.py", line 39, in parse_url
    line: globals()[funcname](url_data)
    locals:
      globals = <builtin> <built-in function globals>
      funcname = <local> 'parse_html', len = 10
      url_data = <local> <https link, base_url=u'https://entropia.de/Benutzer:$dude', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=https://entropia.de/Benutzer%3A%24dude>
  File "/home/heiko/src/linkchecker/ENV/local/lib/python2.7/site-packages/linkcheck/parser/__init__.py", line 48, in parse_html
    line: find_links(url_data, url_data.add_url, linkparse.LinkTags)
    locals:
      find_links = <global> <function find_links at 0x7f081a222578>
      url_data = <local> <https link, base_url=u'https://entropia.de/Benutzer:$dude', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=https://entropia.de/Benutzer%3A%24dude>
      url_data.add_url = <local> <bound method HttpUrl.add_url of <https link, base_url=u'https://entropia.de/Benutzer:$dude', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=https://entropia.de/Benutzer%3A%24dude>>
      linkparse = <global> <module 'linkcheck.htmlutil.linkparse' from '/home/heiko/src/linkchecker/ENV/local/lib/python2.7/site-packages/linkcheck/htmlutil/linkparse.pyc'>
      linkparse.LinkTags = <global> {'source': [u'src'], 'q': [u'cite'], 'form': [u'action'], 'html': [u'manifest'], 'body': [u'background'], 'xmp': [u'href'], 'video': [u'src'], 'a': [u'href'], 'meta': [u'content', u'href'], 'input': [u'src', u'usemap', u'formaction'], 'button': [u'formaction'], 'area': [u'href'], 'ilayer': [u'bac..., len = 35
  File "/home/heiko/src/linkchecker/ENV/local/lib/python2.7/site-packages/linkcheck/parser/__init__.py", line 126, in find_links
    line: parser.feed(url_data.get_content())
    locals:
      parser = <local> <linkcheck.HtmlParser.htmlsax.parser object at 0x7f08182741b0>
      parser.feed = <local> <built-in method feed of linkcheck.HtmlParser.htmlsax.parser object at 0x7f08182741b0>
      url_data = <local> <https link, base_url=u'https://entropia.de/Benutzer:$dude', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=https://entropia.de/Benutzer%3A%24dude>
      url_data.get_content = <local> <bound method HttpUrl.get_content of <https link, base_url=u'https://entropia.de/Benutzer:$dude', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=https://entropia.de/Benutzer%3A%24dude>>
  File "/home/heiko/src/linkchecker/ENV/local/lib/python2.7/site-packages/linkcheck/htmlutil/linkparse.py", line 231, in start_element
    line: self.parse_tag(tag, attr, value, name, base)
    locals:
      self = <local> <linkcheck.htmlutil.linkparse.LinkFinder object at 0x7f0818264d50>
      self.parse_tag = <local> <bound method LinkFinder.parse_tag of <linkcheck.htmlutil.linkparse.LinkFinder object at 0x7f0818264d50>>
      tag = <local> u'link'
      attr = <local> u'href'
      value = <local> u'/load.php?debug=false&lang=de&modules=mediawiki.legacy.commonPrint%2Cshared%7Cmediawiki.sectionAnchor%7Cmediawiki.skinning.interface%7Cskins.monobook.styles&only=styles&skin=monobook', len = 182
      name = <local> u''
      base = <local> u''
  File "/home/heiko/src/linkchecker/ENV/local/lib/python2.7/site-packages/linkcheck/htmlutil/linkparse.py", line 277, in parse_tag
    line: self.found_url(value, name, base)
    locals:
      self = <local> <linkcheck.htmlutil.linkparse.LinkFinder object at 0x7f0818264d50>
      self.found_url = <local> <bound method LinkFinder.found_url of <linkcheck.htmlutil.linkparse.LinkFinder object at 0x7f0818264d50>>
      value = <local> u'/load.php?debug=false&lang=de&modules=mediawiki.legacy.commonPrint%2Cshared%7Cmediawiki.sectionAnchor%7Cmediawiki.skinning.interface%7Cskins.monobook.styles&only=styles&skin=monobook', len = 182
      name = <local> u''
      base = <local> u''
  File "/home/heiko/src/linkchecker/ENV/local/lib/python2.7/site-packages/linkcheck/htmlutil/linkparse.py", line 283, in found_url
    line: column=self.parser.last_column(), name=name, base=base)
    locals:
      column = <not found>
      self = <local> <linkcheck.htmlutil.linkparse.LinkFinder object at 0x7f0818264d50>
      self.parser = <local> <linkcheck.HtmlParser.htmlsax.parser object at 0x7f08182741b0>
      self.parser.last_column = <local> <built-in method last_column of linkcheck.HtmlParser.htmlsax.parser object at 0x7f08182741b0>
      name = <local> u''
      base = <local> u''
  File "/home/heiko/src/linkchecker/ENV/local/lib/python2.7/site-packages/linkcheck/checker/urlbase.py", line 653, in add_url
    line: page=page, name=name, parent_content_type=self.content_type)
    locals:
      page = <local> 0
      name = <local> u''
      parent_content_type = <not found>
      self = <local> <https link, base_url=u'https://entropia.de/Benutzer:$dude', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=https://entropia.de/Benutzer%3A%24dude>
      self.content_type = <local> 'text/html', len = 9
  File "/home/heiko/src/linkchecker/ENV/local/lib/python2.7/site-packages/linkcheck/checker/__init__.py", line 125, in get_url_from
    line: line=line, column=column, page=page, name=name, extern=extern)
    locals:
      line = <local> 10
      column = <local> 1
      page = <local> 0
      name = <local> u''
      extern = <local> None
  File "/home/heiko/src/linkchecker/ENV/local/lib/python2.7/site-packages/linkcheck/checker/urlbase.py", line 117, in __init__
    line: aggregate, line, column, page, name, url_encoding, extern)
    locals:
      aggregate = <local> <linkcheck.director.aggregator.Aggregate object at 0x7f0819ac3290>
      line = <local> 10
      column = <local> 1
      page = <local> 0
      name = <local> u''
      url_encoding = <local> None
      extern = <local> None
  File "/home/heiko/src/linkchecker/ENV/local/lib/python2.7/site-packages/linkcheck/checker/urlbase.py", line 157, in init
    line: "unquoted parent URL %r" % self.parent_url
    locals:
      self = <local> <None link, base_url=u'/load.php?debug=false&lang=de&modules=mediawiki.legacy.commonPrint%2Cshared%7Cmediawiki.sectionAnchor%7Cmediawiki.skinning.interface%7Cskins.monobook.styles&only=styles&skin=monobook', parent_url=u'https://entropia.de/Benutzer:$dude', base_ref=None, recursion_level=1, url_c...
      self.parent_url = <local> u'https://entropia.de/Benutzer:$dude', len = 34
AssertionError: unquoted parent URL u'https://entropia.de/Benutzer:$dude'
System info:
LinkChecker 9.3
Released on: 16.7.2014
Python 2.7.9 (default, Mar  1 2015, 12:57:24) 
[GCC 4.9.2] on linux2
Requests: 2.9.1
Modules: Sqlite
Local time: 2016-04-03 06:25:38+002
sys.argv: ['/home/heiko/src/linkchecker/ENV/bin/linkchecker', 'https://entropia.de/Benutzer:$dude']
LANG = 'en_US.UTF-8'
Default locale: ('en', 'UTF-8')

 ******** LinkChecker internal error, over and out ********
WARNING 2016-04-03 06:25:38,909 CheckThread-https://entropia.de/Benutzer%3A%24dude internal error occurred
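The traceback shows two spellings of the same page: `base_url=u'https://entropia.de/Benutzer:$dude'` (as typed on the command line, see `sys.argv`) and `cache_url=https://entropia.de/Benutzer%3A%24dude`. The crash comes from the assertion "unquoted parent URL", raised when the unquoted form is used as the parent of the first link found on the page. The sketch below is not LinkChecker's code; `quote_path` and `needs_quoting` are hypothetical helpers that only illustrate, with Python 3's `urllib.parse`, why the two spellings differ and how a "does this URL still need quoting?" check would flag the first one.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def quote_path(url: str) -> str:
    """Percent-encode the path component of url (hypothetical helper).

    '/' and '%' are kept literal so existing escapes are not re-encoded;
    ':' and '$' are not in the safe set and become %3A and %24.
    """
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=quote(parts.path, safe="/%")))


def needs_quoting(url: str) -> bool:
    """Hypothetical stand-in for the check behind the assertion:
    a URL "needs quoting" if quoting its path would change it."""
    return quote_path(url) != url


unquoted = "https://entropia.de/Benutzer:$dude"
quoted = quote_path(unquoted)
print(quoted)                   # https://entropia.de/Benutzer%3A%24dude
print(needs_quoting(unquoted))  # True
print(needs_quoting(quoted))    # False
```

Note that RFC 3986 allows both ':' and '$' unencoded in a path segment, so the URL as typed is valid; it only trips a check that is stricter than the RFC.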
JazzMaster commented 8 years ago

Similar issue here: it's not your site. It looks like an internal Python module-loading breakage; someone forgot to include a module somewhere. Site(s): southernhedgehogs.org

JazzMaster commented 8 years ago

Also: how does this bot identify itself?

dpalic commented 7 years ago

Thank you for the issue report. Sadly, this project is dead, and a new team now maintains it at https://github.com/linkcheck/linkchecker; for more details, please see #708. Please also close this issue and, if it still persists, report it afresh on the new repo: https://github.com/linkcheck/linkchecker/issues