wummel / linkchecker

check links in web documents or full websites
http://wummel.github.io/linkchecker/
GNU General Public License v2.0

"Opps" while parsing bad mailto syntax #120

Closed: wummel closed this issue 11 years ago

wummel commented 11 years ago

Converted from SourceForge issue 1290563, submitted by gazbo

Reproduce by running: linkchecker --no-warnings --recursion-level=1 --ignore-url='^mailto:' http://sca.berkeley.edu/breaks_linkchecker.html

Offending HTML: Felix MacAvady

0 ablazej@satyr:~/image/publish/kates$ linkchecker --no-warnings --recursion-level=1 --ignore-url='^mailto:' http://sca.berkeley.edu/breaks_linkchecker.html
LinkChecker 3.2 Copyright (C) 2000-2005 Bastian Kleineidam
LinkChecker comes with ABSOLUTELY NO WARRANTY!
This is free software, and you are welcome to redistribute it under certain conditions.
Look at the file `LICENSE' within this distribution.
Get the newest version at http://linkchecker.sourceforge.net/
Write comments and bugs to calvin@users.sourceforge.net

Start checking at 2005-09-13 17:54:57-007

****** Oops, I did it again. *****

You have found an internal error in LinkChecker. Please write a bug report at http://sourceforge.net/tracker/?func=add&group_id=1913&atid=101913 or send mail to calvin@users.sourceforge.net and include the following information:

Disclosing some of the information above due to privacy reasons is ok. I will try to help you nonetheless, but you have to give me something I can work with ;) .

exceptions.ValueError bad query field: u'Kates Heraldry Question.'
Traceback (most recent call last):
  File "/usr/lib/python2.4/site-packages/linkcheck/checker/urlbase.py", line 339, in check
    self.local_check()
  File "/usr/lib/python2.4/site-packages/linkcheck/checker/urlbase.py", line 415, in local_check
    self.parse_url()
  File "/usr/lib/python2.4/site-packages/linkcheck/checker/httpurl.py", line 588, in parse_url
    self.parse_html()
  File "/usr/lib/python2.4/site-packages/linkcheck/checker/urlbase.py", line 635, in parse_html
    cmdline=False)
  File "/usr/lib/python2.4/site-packages/linkcheck/checker/__init__.py", line 366, in get_url_from
    line=line, column=column, name=name)
  File "/usr/lib/python2.4/site-packages/linkcheck/checker/urlbase.py", line 76, in __init__
    self.check_syntax()
  File "/usr/lib/python2.4/site-packages/linkcheck/checker/urlbase.py", line 262, in check_syntax
    self.build_url()
  File "/usr/lib/python2.4/site-packages/linkcheck/checker/mailtourl.py", line 65, in build_url
    self.addresses = email.Utils.getaddresses([self.cutout_addresses()])
  File "/usr/lib/python2.4/site-packages/linkcheck/checker/mailtourl.py", line 110, in cutout_addresses
    headers = cgi.parse_qs(url[(i+1):], strict_parsing=True)
  File "/usr/lib/python2.4/cgi.py", line 183, in parse_qs
    for name, value in parse_qsl(qs, keep_blank_values, strict_parsing):
  File "/usr/lib/python2.4/cgi.py", line 217, in parse_qsl
    raise ValueError, "bad query field: %r" % (name_value,)
ValueError: bad query field: u'Kates Heraldry Question.'

System info:
LinkChecker 3.2
Python 2.4.1 (#2, May 5 2005, 11:32:06) [GCC 3.3.5 (Debian 1:3.3.5-12)] on linux2
LC_ALL = 'C'

******** LinkChecker internal error, over and out ********

That's it. 27 links checked. 17 warnings found. 0 errors found.
Stopped checking at 2005-09-13 17:55:01-007 (3.749 seconds)
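
For anyone reading this later, the traceback points at strict query-string parsing of the part of the mailto URL after the `?`. A field with no `=` in it is rejected with a ValueError. Below is a minimal sketch of that behaviour, using `urllib.parse.parse_qs` from Python 3 as a stand-in for the `cgi.parse_qs` call in the traceback. The URL is hypothetical (the original href is not preserved in this report); only the query text is taken from the error message.

```python
from urllib.parse import parse_qs

# Hypothetical URL: the address is made up, the query text comes from
# the error message in the traceback above.
url = "mailto:someone@example.org?Kates Heraldry Question."

# Everything after the first "?" is treated as the header query string.
query = url[url.index("?") + 1:]

try:
    parse_qs(query, strict_parsing=True)
    error = None
except ValueError as exc:
    # Strict parsing rejects a field that has no "name=value" form.
    error = str(exc)

print(error)
```

A well-formed URL such as `mailto:someone@example.org?subject=Hello` parses without an error, which suggests the failing link was missing something like a `subject=` prefix, though that is an assumption.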

wummel commented 11 years ago

Submitted by calvin

Logged In: YES user_id=9205

This is fixed in CVS and will be in the next release. Thanks for the report. CVS files: linkcheck/checker/mailtourl.py, rev. 1.28
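
The change in rev. 1.28 is not shown in this thread. As a rough sketch only, and not the actual fix, one way to keep a malformed mailto query from becoming an internal error is to catch the ValueError and report it as a problem with the link. The helper name `parse_mailto_headers` and its return shape are assumptions made for illustration, and the code uses Python 3's `urllib.parse` rather than the `cgi` module from the traceback.

```python
from urllib.parse import parse_qs


def parse_mailto_headers(url):
    """Return (headers, problem) for a mailto: URL.

    Hypothetical helper, not LinkChecker code. `problem` is None when
    the query part parses cleanly, otherwise a short message.
    """
    _, sep, query = url.partition("?")
    if not sep:
        # No query part at all, so there are no headers to parse.
        return {}, None
    try:
        return parse_qs(query, strict_parsing=True), None
    except ValueError as exc:
        # Report the bad syntax instead of letting the exception escape.
        return {}, "invalid mailto query: %s" % exc


good_headers, good_problem = parse_mailto_headers(
    "mailto:someone@example.org?subject=Hello")
bad_headers, bad_problem = parse_mailto_headers(
    "mailto:someone@example.org?Kates Heraldry Question.")
```

With this shape the caller can log `bad_problem` as a syntax warning for that one link and carry on checking the rest of the page.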