wummel / linkchecker

check links in web documents or full websites
http://wummel.github.io/linkchecker/
GNU General Public License v2.0

"out of dynamic memory in yy_scan_bytes()" #677

Open uogbuji opened 7 years ago

uogbuji commented 7 years ago

Command line to repro:

linkchecker -F xml/report.xml http://halan.library.link/

No other config. Debug output:

$ linkchecker -Dall
DEBUG 2016-09-09 21:33:38,580 MainThread Python 2.7.12 (default, Sep  9 2016, 09:17:26) 
[GCC 4.7.2] on linux2
DEBUG 2016-09-09 21:33:38,582 MainThread reading configuration from ['/home/uche/.linkchecker/linkcheckerrc']
INFO 2016-09-09 21:33:38,625 MainThread Checking intern URLs only; use --check-extern to check extern URLs.
DEBUG 2016-09-09 21:33:38,636 MainThread configuration: [('aborttimeout', 300),
 ('allowedschemes', []),
 ('authentication', []),
 ('blacklist', {}),
 ('checkextern', False),
 ('cookiefile', None),
 ('csv', {}),
 ('debugmemory', False),
 ('dot', {}),
 ('enabledplugins', []),
 ('externlinks', []),
 ('fileoutput', []),
 ('gml', {}),
 ('gxml', {}),
 ('html', {}),
 ('ignorewarnings', []),
 ('internlinks', []),
 ('localwebroot', None),
 ('logger', 'TextLogger'),
 ('loginextrafields', {}),
 ('loginpasswordfield', 'password'),
 ('loginurl', None),
 ('loginuserfield', 'login'),
 ('maxfilesizedownload', 5242880),
 ('maxfilesizeparse', 1048576),
 ('maxhttpredirects', 10),
 ('maxnumurls', None),
 ('maxrequestspersecond', 10),
 ('maxrunseconds', None),
 ('nntpserver', None),
 ('none', {}),
 ('output', 'text'),
 ('pluginfolders', []),
 ('proxy', {}),
 ('quiet', False),
 ('recursionlevel', -1),
 ('sitemap', {}),
 ('sql', {}),
 ('sslverify', True),
 ('status', True),
 ('status_wait_seconds', 5),
 ('text', {}),
 ('threads', 10),
 ('timeout', 60),
 ('trace', False),
 ('useragent',
  u'Mozilla/5.0 (compatible; LinkChecker/9.3; +http://wummel.github.io/linkchecker/)'),
 ('verbose', False),
 ('warnings', True),
 ('xml', {})]
WARNING 2016-09-09 21:33:38,641 MainThread no files or URLs given
LinkChecker 9.3              Copyright (C) 2000-2014 Bastian Kleineidam
LinkChecker comes with ABSOLUTELY NO WARRANTY!
This is free software, and you are welcome to redistribute it
under certain conditions. Look at the file `LICENSE' within this
distribution.
Get the newest version at http://wummel.github.io/linkchecker/
Write comments and bugs to https://github.com/wummel/linkchecker/issues
Support this project at http://wummel.github.io/linkchecker/donations.html

Start checking at 2016-09-09 21:33:38-006

Statistics:
Downloaded: 0B.
No statistics available since no URLs were checked.

That's it. 0 links in 0 URLs checked. 0 warnings found. 0 errors found.
Stopped checking at 2016-09-09 21:33:38-006 (0.00 seconds)

Full error output:

WARNING 2016-09-09 13:28:56,976 CheckThread-http://halan.library.link/portal/Seducing-the-knight-Gerri-Russell/UF-h5ivfmUM/ internal error occurred
WARNING 2016-09-09 13:28:56,991 CheckThread-http://halan.library.link/portal/Seducing-the-knight-Gerri-Russell/UF-h5ivfmUM/ internal error occurred

********** Oops, I did it again. *************

You have found an internal error in LinkChecker. Please write a bug report
at https://github.com/wummel/linkchecker/issues
and include the following information:
- the URL or file you are testing
- the system information below

When using the commandline client:
- your commandline arguments and any custom configuration files.
- the output of a debug run with option "-Dall"

Not disclosing some of the information above due to privacy reasons is ok.
I will try to help you nonetheless, but you have to give me something
I can work with ;) .

Traceback (most recent call last):
  File "/home/uche/.local/pyenv/py2/lib/python2.7/site-packages/linkcheck/director/checker.py", line 104, in check_url
    line: self.check_url_data(url_data)
    locals:
      self = <local> <Checker(CheckThread-http://halan.library.link/portal/Rift.--Rift-Andrea-Cremer-electronic/fw-pZE4-ISg/, started -1282073744)>
      self.check_url_data = <local> <bound method Checker.check_url_data of <Checker(CheckThread-http://halan.library.link/portal/Rift.--Rift-Andrea-Cremer-electronic/fw-pZE4-ISg/, started -1282073744)>>
      url_data = <local> <http link, base_url=u'/portal/Rift.--Rift-Andrea-Cremer-electronic/fw-pZE4-ISg/', parent_url=u'http://halan.library.link/resource/gzeifx9LOVM/', base_ref=None, recursion_level=3, url_connection=None, line=1923, column=9, page=0, name=u'', anchor=u'', cache_url=http://halan.library.link/portal/Ri...
  File "/home/uche/.local/pyenv/py2/lib/python2.7/site-packages/linkcheck/director/checker.py", line 120, in check_url_data
    line: check_url(url_data, self.logger)
    locals:
      check_url = <global> <function check_url at 0xb6a1280c>
      url_data = <local> <http link, base_url=u'/portal/Rift.--Rift-Andrea-Cremer-electronic/fw-pZE4-ISg/', parent_url=u'http://halan.library.link/resource/gzeifx9LOVM/', base_ref=None, recursion_level=3, url_connection=None, line=1923, column=9, page=0, name=u'', anchor=u'', cache_url=http://halan.library.link/portal/Ri...
      self = <local> <Checker(CheckThread-http://halan.library.link/portal/Rift.--Rift-Andrea-Cremer-electronic/fw-pZE4-ISg/, started -1282073744)>
      self.logger = <local> <linkcheck.director.logger.Logger object at 0xb696ae4c>
  File "/home/uche/.local/pyenv/py2/lib/python2.7/site-packages/linkcheck/director/checker.py", line 64, in check_url
    line: parser.parse_url(url_data)
    locals:
      parser = <global> <module 'linkcheck.parser' from '/home/uche/.local/pyenv/py2/lib/python2.7/site-packages/linkcheck/parser/__init__.pyc'>
      parser.parse_url = <global> <function parse_url at 0xb6a87df4>
      url_data = <local> <http link, base_url=u'/portal/Rift.--Rift-Andrea-Cremer-electronic/fw-pZE4-ISg/', parent_url=u'http://halan.library.link/resource/gzeifx9LOVM/', base_ref=None, recursion_level=3, url_connection=None, line=1923, column=9, page=0, name=u'', anchor=u'', cache_url=http://halan.library.link/portal/Ri...
  File "/home/uche/.local/pyenv/py2/lib/python2.7/site-packages/linkcheck/parser/__init__.py", line 39, in parse_url
    line: globals()[funcname](url_data)
    locals:
      globals = <builtin> <built-in function globals>
      funcname = <local> 'parse_html', len = 10
      url_data = <local> <http link, base_url=u'/portal/Rift.--Rift-Andrea-Cremer-electronic/fw-pZE4-ISg/', parent_url=u'http://halan.library.link/resource/gzeifx9LOVM/', base_ref=None, recursion_level=3, url_connection=None, line=1923, column=9, page=0, name=u'', anchor=u'', cache_url=http://halan.library.link/portal/Ri...
  File "/home/uche/.local/pyenv/py2/lib/python2.7/site-packages/linkcheck/parser/__init__.py", line 48, in parse_html
    line: find_links(url_data, url_data.add_url, linkparse.LinkTags)
    locals:
      find_links = <global> <function find_links at 0xb6a1202c>
      url_data = <local> <http link, base_url=u'/portal/Rift.--Rift-Andrea-Cremer-electronic/fw-pZE4-ISg/', parent_url=u'http://halan.library.link/resource/gzeifx9LOVM/', base_ref=None, recursion_level=3, url_connection=None, line=1923, column=9, page=0, name=u'', anchor=u'', cache_url=http://halan.library.link/portal/Ri...
      url_data.add_url = <local> <bound method HttpUrl.add_url of <http link, base_url=u'/portal/Rift.--Rift-Andrea-Cremer-electronic/fw-pZE4-ISg/', parent_url=u'http://halan.library.link/resource/gzeifx9LOVM/', base_ref=None, recursion_level=3, url_connection=None, line=1923, column=9, page=0, name=u'', anchor=u'', cache_url=ht...
      linkparse = <global> <module 'linkcheck.htmlutil.linkparse' from '/home/uche/.local/pyenv/py2/lib/python2.7/site-packages/linkcheck/htmlutil/linkparse.pyc'>
      linkparse.LinkTags = <global> {'meta': [u'content', u'href'], 'iframe': [u'src', u'longdesc'], 'frame': [u'src', u'longdesc'], 'xmp': [u'href'], 'del': [u'cite'], 'isindex': [u'action'], 'script': [u'src'], 'video': [u'src'], 'td': [u'background'], 'link': [u'href'], 'blockquote': [u'cite'], 'ilayer': [u'background'], 'table'..., len = 35
  File "/home/uche/.local/pyenv/py2/lib/python2.7/site-packages/linkcheck/parser/__init__.py", line 126, in find_links
 9 threads active, 531384 links queued, out of dynamic memory in yy_scan_bytes()

dpalic commented 6 years ago

Thank you for the issue report. Sadly, this project is dead, and a new team now maintains it at https://github.com/linkcheck/linkchecker. For more details, please see #708. Also, please close this issue and report it again on the new repo: https://github.com/linkcheck/linkchecker/issues