wummel / linkchecker

check links in web documents or full websites
http://wummel.github.io/linkchecker/
GNU General Public License v2.0

LinkChecker internal error #653

Closed: cswarth closed this issue 7 years ago

cswarth commented 8 years ago

[Edited to add the -Dall trace.] No further information at this time; I will update if I can narrow this down further.

Doesn't anyone actually test software any more? This is the top search result for 'python link checker', and it took zero effort to hit a fatal internal error in the first few seconds of use.

$ linkchecker -t 1 --cookiefile cookies.txt -v -Dall http://stoat:8000/
DEBUG 2016-05-10 13:13:42,033 MainThread Python 2.7.6 (default, Mar 22 2014, 22:59:56) 
[GCC 4.8.2] on linux2
DEBUG 2016-05-10 13:13:42,034 MainThread reading configuration from ['/home/cwarth/.linkchecker/linkcheckerrc']
DEBUG 2016-05-10 13:13:42,035 MainThread Link pattern '.*/(logout|search)$' strict=1
INFO 2016-05-10 13:13:42,054 MainThread Checking intern URLs only; use --check-extern to check extern URLs.
DEBUG 2016-05-10 13:13:42,062 MainThread configuration: [('aborttimeout', 300),
 ('allowedschemes', []),
 ('authentication', []),
 ('blacklist', {}),
 ('checkextern', False),
 ('cookiefile', 'cookies.txt'),
 ('csv', {}),
 ('debugmemory', False),
 ('dot', {}),
 ('enabledplugins', []),
 ('externlinks',
  [{'negate': False,
    'pattern': <_sre.SRE_Pattern object at 0x7ff260d11e10>,
    'strict': 1}]),
 ('fileoutput', []),
 ('gml', {}),
 ('gxml', {}),
 ('html', {}),
 ('ignorewarnings', []),
 ('internlinks', []),
 ('localwebroot', None),
 ('logger', 'TextLogger'),
 ('loginextrafields', {}),
 ('loginpasswordfield', 'password'),
 ('loginurl', None),
 ('loginuserfield', 'login'),
 ('maxfilesizedownload', 5242880),
 ('maxfilesizeparse', 1048576),
 ('maxhttpredirects', 10),
 ('maxnumurls', None),
 ('maxrequestspersecond', 10),
 ('maxrunseconds', None),
 ('nntpserver', None),
 ('none', {}),
 ('output', 'text'),
 ('pluginfolders', []),
 ('proxy', {}),
 ('quiet', False),
 ('recursionlevel', -1),
 ('sitemap', {}),
 ('sql', {}),
 ('sslverify', True),
 ('status', True),
 ('status_wait_seconds', 5),
 ('text', {}),
 ('threads', 1),
 ('timeout', 60),
 ('trace', False),
 ('useragent',
  u'Mozilla/5.0 (compatible; LinkChecker/9.3; +http://wummel.github.io/linkchecker/)'),
 ('verbose', True),
 ('warnings', True),
 ('xml', {})]
DEBUG 2016-05-10 13:13:42,064 MainThread HttpUrl handles url http://stoat:8000/
DEBUG 2016-05-10 13:13:42,065 MainThread checking syntax
DEBUG 2016-05-10 13:13:42,066 MainThread Add intern pattern u'^https?://(www\\.|)stoat\\:8000\\/'
DEBUG 2016-05-10 13:13:42,066 MainThread Link pattern u'^https?://(www\\.|)stoat\\:8000\\/' strict=False
DEBUG 2016-05-10 13:13:42,067 MainThread queueing http://stoat:8000/
LinkChecker 9.3              Copyright (C) 2000-2014 Bastian Kleineidam
LinkChecker comes with ABSOLUTELY NO WARRANTY!
This is free software, and you are welcome to redistribute it
under certain conditions. Look at the file `LICENSE' within this
distribution.
Get the newest version at http://wummel.github.io/linkchecker/
Write comments and bugs to https://github.com/wummel/linkchecker/issues
Support this project at http://wummel.github.io/linkchecker/donations.html

Start checking at 2016-05-10 13:13:42-007

********** Oops, I did it again. *************

You have found an internal error in LinkChecker. Please write a bug report
at https://github.com/wummel/linkchecker/issues
and include the following information:
- the URL or file you are testing
- the system information below

When using the commandline client:
- your commandline arguments and any custom configuration files.
- the output of a debug run with option "-Dall"

Not disclosing some of the information above due to privacy reasons is ok.
I will try to help you nonetheless, but you have to give me something
I can work with ;) .

Traceback (most recent call last):
  File "/shared/silo_researcher/Matsen_F/MatsenGrp/working/cwarth/beast-si-scons/venv/local/lib/python2.7/site-packages/linkcheck/director/task.py", line 29, in run
    line: self.run_checked()
    locals:
      self = <local> <Checker(Thread-2, started 140678966736640)>
      self.run_checked = <local> <bound method Checker.run_checked of <Checker(Thread-2, started 140678966736640)>>
  File "/shared/silo_researcher/Matsen_F/MatsenGrp/working/cwarth/beast-si-scons/venv/local/lib/python2.7/site-packages/linkcheck/director/checker.py", line 94, in run_checked
    line: self.add_request_session()
    locals:
      self = <local> <Checker(Thread-2, started 140678966736640)>
      self.add_request_session = <local> <bound method Aggregate.add_request_session of <linkcheck.director.aggregator.Aggregate object at 0x7ff260578790>>
  File "/shared/silo_researcher/Matsen_F/MatsenGrp/working/cwarth/beast-si-scons/venv/local/lib/python2.7/site-packages/linkcheck/decorators.py", line 100, in newfunc
    line: return func(*args, **kwargs)
    locals:
      func = <local> <function add_request_session at 0x7ff260ceca28>
      args = <local> (<linkcheck.director.aggregator.Aggregate object at 0x7ff260578790>,)
      kwargs = <local> {}
  File "/shared/silo_researcher/Matsen_F/MatsenGrp/working/cwarth/beast-si-scons/venv/local/lib/python2.7/site-packages/linkcheck/director/aggregator.py", line 121, in add_request_session
    line: session = new_request_session(self.config, self.cookies)
    locals:
      session = <not found>
      new_request_session = <global> <function new_request_session at 0x7ff260cec410>
      self = <local> <linkcheck.director.aggregator.Aggregate object at 0x7ff260578790>
      self.config = <local> {'ignorewarnings': [], 'loginuserfield': 'login', 'verbose': True, 'gml': {}, 'enabledplugins': [], 'recursionlevel': -1, 'fileoutput': [], 'maxfilesizeparse': 1048576, 'maxrequestspersecond': 10, 'loginpasswordfield': 'password', 'localwebroot': None, 'maxhttpredirects': 10, 'loginurl': None, 's..., len = 49
      self.cookies = <local> None
  File "/shared/silo_researcher/Matsen_F/MatsenGrp/working/cwarth/beast-si-scons/venv/local/lib/python2.7/site-packages/linkcheck/director/aggregator.py", line 47, in new_request_session
    line: for cookie in cookies.from_file(config["cookiefile"]):
    locals:
      cookie = <not found>
      cookies = <local> None
      cookies.from_file = <local> !AttributeError: 'NoneType' object has no attribute 'from_file'
      config = <local> {'ignorewarnings': [], 'loginuserfield': 'login', 'verbose': True, 'gml': {}, 'enabledplugins': [], 'recursionlevel': -1, 'fileoutput': [], 'maxfilesizeparse': 1048576, 'maxrequestspersecond': 10, 'loginpasswordfield': 'password', 'localwebroot': None, 'maxhttpredirects': 10, 'loginurl': None, 's..., len = 49
AttributeError: 'NoneType' object has no attribute 'from_file'
System info:
LinkChecker 9.3
Released on: 16.7.2014
Python 2.7.6 (default, Mar 22 2014, 22:59:56) 
[GCC 4.8.2] on linux2
Requests: 2.10.0
Modules: Sqlite
Local time: 2016-05-10 13:13:42-007
sys.argv: ['/shared/silo_researcher/Matsen_F/MatsenGrp/working/cwarth/beast-si-scons/venv/bin/linkchecker', '-t', '1', '--cookiefile', 'cookies.txt', '-v', '-D', 'all', 'http://stoat:8000/']
LANGUAGE = 'en_US:'
LANG = 'en_US.UTF-8'
Default locale: ('en', 'UTF-8')

 ******** LinkChecker internal error, over and out ********
WARNING 2016-05-10 13:13:42,080 Thread-2 internal error occurred
 0 threads active,     1 link queued,    0 links in   0 URLs checked, runtime 1 seconds
WARNING 2016-05-10 13:13:45,532 MainThread interrupt; waiting for active threads to finish
WARNING 2016-05-10 13:13:45,532 MainThread another interrupt will exit immediately
INFO 2016-05-10 13:13:45,532 MainThread 0 URLs are still active. After a timeout of 5 minutes the active URLs will stop.

Statistics:
Downloaded: 0B.
No statistics available since no URLs were checked.

The check has been interrupted; results are not complete.
That's it. 0 links in 0 URLs checked. 0 warnings found. 0 errors found.
There was 1 internal error.
Stopped checking at 2016-05-10 13:13:45-007 (3 seconds)
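Judging from the traceback, `new_request_session` in `aggregator.py` receives `self.cookies` (which is `None` here) as its `cookies` argument, and then unconditionally calls `cookies.from_file(...)` whenever a cookiefile is configured. A minimal sketch of a defensive fix follows; it uses stdlib stand-ins rather than LinkChecker's actual internals (the `http.cookiejar.CookieJar` return value and the `cookie_source` parameter name are illustrative assumptions, not LinkChecker's real API):

```python
import http.cookiejar

def new_request_session(config, cookie_source):
    # Sketch of a guarded version of the failing function: only consult
    # the cookie source when one was actually supplied, so a configured
    # cookiefile with cookie_source=None no longer raises AttributeError.
    jar = http.cookiejar.CookieJar()
    if config.get("cookiefile") and cookie_source is not None:
        for cookie in cookie_source.from_file(config["cookiefile"]):
            jar.set_cookie(cookie)
    return jar

# With cookie_source=None (the state seen in the traceback) this now
# returns an empty jar instead of crashing.
jar = new_request_session({"cookiefile": "cookies.txt"}, None)
print(len(list(jar)))  # prints 0
```

Whether the proper fix is this guard or ensuring `self.cookies` is initialized before threads start is a question for the maintainers; the sketch only demonstrates that the crash is a missing None check, not a problem with the cookie file itself.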
dpalic commented 7 years ago

Thank you for the issue report. Sadly, this project is dead; a new team has taken over at https://github.com/linkcheck/linkchecker. For details, please see #708. Please close this issue and, if it still persists, report it freshly on the new repo: https://github.com/linkcheck/linkchecker/issues