Open chrishanretty opened 7 years ago
Thank you for the issue report. Sadly, this project is dead, and a new team has taken it over at https://github.com/linkcheck/linkchecker. For more details, please see #708. Please also close this issue and report it afresh on the new repo: https://github.com/linkcheck/linkchecker/issues
I'm reporting an internal error as requested. Full output is below. The error was repeated several times for pages on this site; the common factor seems to be the presence of an apostrophe in the URL.
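To illustrate the suspected cause: in the traceback below, the thread name shows the parent URL with the apostrophe percent-encoded as `%27`, while the failing assertion reports the same URL with a literal `'`. The following is a minimal sketch in modern Python (not the Python 2.7 from the report) of how such a decoded URL could fail a "is this URL quoted?" check. The helper `is_quoted` is hypothetical and is not LinkChecker's actual implementation; only the path string is taken from the report.

```python
from urllib.parse import quote, unquote

# Path as it appears, percent-encoded, in the thread name of the traceback.
QUOTED_PATH = (
    "/politicsandir/research/dec/blogs/articles/"
    "why-theresa-may%27s-gamble-at-the-polls-failed.aspx"
)


def is_quoted(path):
    """Hypothetical check (an assumption, not LinkChecker's code):
    a path counts as quoted if re-quoting its decoded form reproduces it."""
    return quote(unquote(path), safe="/") == path


# Decoding turns %27 back into a literal apostrophe.
unquoted_path = unquote(QUOTED_PATH)

print(unquoted_path)
print(is_quoted(QUOTED_PATH))
print(is_quoted(unquoted_path))
```

Under this assumed check, the encoded form passes and the decoded form does not, which matches the pattern in the report: somewhere between fetching the page and creating the child link, the parent URL appears to have been decoded.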
** Oops, I did it again. *****
You have found an internal error in LinkChecker. Please write a bug report at https://github.com/wummel/linkchecker/issues and include the following information:
When using the commandline client:
Not disclosing some of the information above due to privacy reasons is ok. I will try to help you nonetheless, but you have to give me something I can work with ;) .
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/linkcheck/director/checker.py", line 104, in check_url
line: self.check_url_data(url_data)
locals:
self = <Checker(CheckThread-https://www.royalholloway.ac.uk/politicsandir/research/dec/blogs/articles/why-theresa-may%27s-gamble-at-the-polls-failed.aspx?nomobile=0, started 140621508146944)>
self.check_url_data = <bound method Checker.check_url_data of <Checker(CheckThread-https://www.royalholloway.ac.uk/politicsandir/research/dec/blogs/articles/why-theresa-may%27s-gamble-at-the-polls-failed.aspx?nomobile=0, started 140621508146944)>>
url_data = <https link, base_url=u'?nomobile=0', parent_url=u'https://www.royalholloway.ac.uk/politicsandir/research/dec/blogs/articles/why-theresa-may%27s-gamble-at-the-polls-failed.aspx', base_ref=None, recursion_level=3, url_connection=None, line=613, column=49, page=0, name=u'Mobile site view', anchor=u...
File "/usr/lib/python2.7/dist-packages/linkcheck/director/checker.py", line 120, in check_url_data
line: check_url(url_data, self.logger)
locals:
check_url = <function check_url at 0x7fe5007c7230>
url_data = <https link, base_url=u'?nomobile=0', parent_url=u'https://www.royalholloway.ac.uk/politicsandir/research/dec/blogs/articles/why-theresa-may%27s-gamble-at-the-polls-failed.aspx', base_ref=None, recursion_level=3, url_connection=None, line=613, column=49, page=0, name=u'Mobile site view', anchor=u...
self = <Checker(CheckThread-https://www.royalholloway.ac.uk/politicsandir/research/dec/blogs/articles/why-theresa-may%27s-gamble-at-the-polls-failed.aspx?nomobile=0, started 140621508146944)>
self.logger = <linkcheck.director.logger.Logger object at 0x7fe501af9e50>
File "/usr/lib/python2.7/dist-packages/linkcheck/director/checker.py", line 64, in check_url
line: parser.parse_url(url_data)
locals:
parser = <module 'linkcheck.parser' from '/usr/lib/python2.7/dist-packages/linkcheck/parser/__init__.pyc'>
parser.parse_url = <function parse_url at 0x7fe5007bf848>
url_data = <https link, base_url=u'?nomobile=0', parent_url=u'https://www.royalholloway.ac.uk/politicsandir/research/dec/blogs/articles/why-theresa-may%27s-gamble-at-the-polls-failed.aspx', base_ref=None, recursion_level=3, url_connection=None, line=613, column=49, page=0, name=u'Mobile site view', anchor=u...
File "/usr/lib/python2.7/dist-packages/linkcheck/parser/__init__.py", line 39, in parse_url
line: globals()[funcname](url_data)
locals:
globals =
funcname = 'parse_html', len = 10
url_data = <https link, base_url=u'?nomobile=0', parent_url=u'https://www.royalholloway.ac.uk/politicsandir/research/dec/blogs/articles/why-theresa-may%27s-gamble-at-the-polls-failed.aspx', base_ref=None, recursion_level=3, url_connection=None, line=613, column=49, page=0, name=u'Mobile site view', anchor=u...
File "/usr/lib/python2.7/dist-packages/linkcheck/parser/__init__.py", line 48, in parse_html
line: find_links(url_data, url_data.add_url, linkparse.LinkTags)
locals:
find_links = <function find_links at 0x7fe5007bfc80>
url_data = <https link, base_url=u'?nomobile=0', parent_url=u'https://www.royalholloway.ac.uk/politicsandir/research/dec/blogs/articles/why-theresa-may%27s-gamble-at-the-polls-failed.aspx', base_ref=None, recursion_level=3, url_connection=None, line=613, column=49, page=0, name=u'Mobile site view', anchor=u...
url_data.add_url = <bound method HttpUrl.add_url of <https link, base_url=u'?nomobile=0', parent_url=u'https://www.royalholloway.ac.uk/politicsandir/research/dec/blogs/articles/why-theresa-may%27s-gamble-at-the-polls-failed.aspx', base_ref=None, recursion_level=3, url_connection=None, line=613, column=49, page=0, n...
linkparse = <module 'linkcheck.htmlutil.linkparse' from '/usr/lib/python2.7/dist-packages/linkcheck/htmlutil/linkparse.pyc'>
linkparse.LinkTags = {'tr': [u'background'], 'q': [u'cite'], 'meta': [u'content', u'href'], 'isindex': [u'action'], 'track': [u'src'], 'applet': [u'archive', u'src'], 'object': [u'classid', u'data', u'archive', u'usemap', u'codebase'], None: [u'style', u'itemtype'], 'layer': [u'background', u'src'], 'html': [u'manife..., len = 35
File "/usr/lib/python2.7/dist-packages/linkcheck/parser/__init__.py", line 126, in find_links
line: parser.feed(url_data.get_content())
locals:
parser = <linkcheck.HtmlParser.htmlsax.parser object at 0x7fe4cb190418>
parser.feed = <built-in method feed of linkcheck.HtmlParser.htmlsax.parser object at 0x7fe4cb190418>
url_data = <https link, base_url=u'?nomobile=0', parent_url=u'https://www.royalholloway.ac.uk/politicsandir/research/dec/blogs/articles/why-theresa-may%27s-gamble-at-the-polls-failed.aspx', base_ref=None, recursion_level=3, url_connection=None, line=613, column=49, page=0, name=u'Mobile site view', anchor=u...
url_data.get_content = <bound method HttpUrl.get_content of <https link, base_url=u'?nomobile=0', parent_url=u'https://www.royalholloway.ac.uk/politicsandir/research/dec/blogs/articles/why-theresa-may%27s-gamble-at-the-polls-failed.aspx', base_ref=None, recursion_level=3, url_connection=None, line=613, column=49, page=...
File "/usr/lib/python2.7/dist-packages/linkcheck/htmlutil/linkparse.py", line 231, in start_element
line: self.parse_tag(tag, attr, value, name, base)
locals:
self = <linkcheck.htmlutil.linkparse.LinkFinder object at 0x7fe4dcec1910>
self.parse_tag = <bound method LinkFinder.parse_tag of <linkcheck.htmlutil.linkparse.LinkFinder object at 0x7fe4dcec1910>>
tag = u'link'
attr = u'href'
value = u'/siteelements/styles/100-system.css?version=2692258?version=2692258', len = 67
name = u''
base = u''
File "/usr/lib/python2.7/dist-packages/linkcheck/htmlutil/linkparse.py", line 277, in parse_tag
line: self.found_url(value, name, base)
locals:
self = <linkcheck.htmlutil.linkparse.LinkFinder object at 0x7fe4dcec1910>
self.found_url = <bound method LinkFinder.found_url of <linkcheck.htmlutil.linkparse.LinkFinder object at 0x7fe4dcec1910>>
value = u'/siteelements/styles/100-system.css?version=2692258?version=2692258', len = 67
name = u''
base = u''
File "/usr/lib/python2.7/dist-packages/linkcheck/htmlutil/linkparse.py", line 283, in found_url
line: column=self.parser.last_column(), name=name, base=base)
locals:
column =
self = <linkcheck.htmlutil.linkparse.LinkFinder object at 0x7fe4dcec1910>
self.parser = <linkcheck.HtmlParser.htmlsax.parser object at 0x7fe4cb190418>
self.parser.last_column = <built-in method last_column of linkcheck.HtmlParser.htmlsax.parser object at 0x7fe4cb190418>
name = u''
base = u''
File "/usr/lib/python2.7/dist-packages/linkcheck/checker/urlbase.py", line 653, in add_url
line: page=page, name=name, parent_content_type=self.content_type)
locals:
page = 0
name = u''
parent_content_type =
self = <https link, base_url=u'?nomobile=0', parent_url=u'https://www.royalholloway.ac.uk/politicsandir/research/dec/blogs/articles/why-theresa-may%27s-gamble-at-the-polls-failed.aspx', base_ref=None, recursion_level=3, url_connection=None, line=613, column=49, page=0, name=u'Mobile site view', anchor=u...
self.content_type = 'text/html', len = 9
File "/usr/lib/python2.7/dist-packages/linkcheck/checker/__init__.py", line 125, in get_url_from
line: line=line, column=column, page=page, name=name, extern=extern)
locals:
line = 8
column = 422
page = 0
name = u''
extern = None
File "/usr/lib/python2.7/dist-packages/linkcheck/checker/urlbase.py", line 117, in __init__
line: aggregate, line, column, page, name, url_encoding, extern)
locals:
aggregate = <linkcheck.director.aggregator.Aggregate object at 0x7fe501af9610>
line = 8
column = 422
page = 0
name = u''
url_encoding = None
extern = None
File "/usr/lib/python2.7/dist-packages/linkcheck/checker/urlbase.py", line 157, in __init__
line: "unquoted parent URL %r" % self.parent_url
locals:
self = <None link, base_url=u'/siteelements/styles/100-system.css?version=2692258?version=2692258', parent_url=u"https://www.royalholloway.ac.uk/politicsandir/research/dec/blogs/articles/why-theresa-may's-gamble-at-the-polls-failed.aspx?478490430", base_ref=None, recursion_level=4, url_connection=None, ...
self.parent_url = u"https://www.royalholloway.ac.uk/politicsandir/research/dec/blogs/articles/why-theresa-may's-gamble-at-the-polls-failed.aspx?478490430", len = 133
AssertionError: unquoted parent URL u"https://www.royalholloway.ac.uk/politicsandir/research/dec/blogs/articles/why-theresa-may's-gamble-at-the-polls-failed.aspx?478490430"
System info:
LinkChecker 9.3
Released on: 16.7.2014
Python 2.7.13 (default, Jan 19 2017, 14:48:08)
[GCC 6.3.0 20170118] on linux2
Requests: 2.10.0
Modules: Sqlite
Local time: 2017-08-29 12:35:20+001
sys.argv: ['/usr/bin/linkchecker', 'https://www.royalholloway.ac.uk/politicsandir/home.aspx']
LANGUAGE = 'en_GB:en'
LANG = 'en_GB.UTF-8'
Default locale: ('en', 'UTF-8')