wummel / linkchecker

check links in web documents or full websites
http://wummel.github.io/linkchecker/
GNU General Public License v2.0

Support for large amount of html documentation on Win64 systems. #606

Open jmichelberger opened 9 years ago

jmichelberger commented 9 years ago

I am using LinkChecker to check a fairly large documentation package (~6.5 GB) stored on my local disk.

LinkChecker works very well at first, but after a while it starts printing stack traces to the console.

The linkchecker.exe process grows in memory to about the WOW64 limit of 2 GB for a 32-bit application on 64-bit Windows. The I/O bytes shown for linkchecker.exe in Process Explorer suggest that only 400 MB of data had been transferred to linkchecker.exe at that point. That is less than 10% of my documentation package.

The last status line and the error were:

16 threads active, 480080 links queued, 1110443 links in 7892 URLs checked, runtime 32 minutes, 13 seconds
Get: MemoryError: out of memory

Several follow-up stack traces came after that.
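For anyone trying to reproduce this, a minimal sketch of two quick checks follows. It confirms whether the running Python interpreter is a 32-bit build, and it turns the figures above into a very rough memory-per-link estimate. The function names are made up for illustration, and the estimate assumes that the whole ~2 GB process size is attributable to queued and checked links, which is almost certainly an overestimate.

```python
import struct


def pointer_bits():
    """Return the pointer width of the running interpreter in bits (32 or 64)."""
    return struct.calcsize("P") * 8


def rough_bytes_per_link(process_bytes, queued, checked):
    """Crude upper bound: total process size divided by all links seen so far.

    Assumption: every byte of the process is attributed to link bookkeeping.
    """
    return process_bytes / float(queued + checked)


if __name__ == "__main__":
    print("interpreter is %d-bit" % pointer_bits())
    # Figures taken from the report above; 2 GiB approximates the process size.
    estimate = rough_bytes_per_link(2 * 1024 ** 3, 480080, 1110443)
    print("roughly %.0f bytes per link" % estimate)
```

If the first line reports a 32-bit interpreter, the process cannot address much more than 2 GB under WOW64 regardless of how much RAM the machine has.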

rob-at-work commented 9 years ago

Seeing the same issue with version 9.3 in both the GUI (crashes at about 2 GB) and the command-line version.

URL `/buy/property/2-bedroom-flat-in-worcester,wr1-ref-2931567/'
Name `Read More'
Parent URL http://www.propertywide.co.uk/buy/search/worcestershire/flat/, line 607, col 2680
Real URL http://www.propertywide.co.uk/buy/property/2-bedroom-flat-in-worcester,wr1-ref-2931567/
Check time 3.055 seconds
Result Error: ConnectionError: HTTPConnectionPool(host='www.propertywide.co.uk', port=80): Max retries exceeded with url: /buy/property/2-bedroom-flat-in-worcester,wr1-ref-2931567/ (Caused by <class 'socket.error'>: [Errno 10055] An operation on a socket...

****** Oops, I did it again. *****

You have found an internal error in LinkChecker. Please write a bug report at https://github.com/wummel/linkchecker/issues and include the following information:

When using the commandline client:

Not disclosing some of the information above due to privacy reasons is ok. I will try to help you nonetheless, but you have to give me something I can work with ;) .

Traceback (most recent call last):
  File "linkcheck\director\checker.pyo", line 104, in check_url
    -- couldn't find file, trying this instead: C:\Program Files (x86)\LinkChecker\library.zip\linkcheck\director\checker.py -- code not available --
  File "linkcheck\director\checker.pyo", line 120, in check_url_data
    -- couldn't find file, trying this instead: C:\Program Files (x86)\LinkChecker\library.zip\linkcheck\director\checker.py -- code not available --
  File "linkcheck\director\checker.pyo", line 57, in check_url
    -- couldn't find file, trying this instead: C:\Program Files (x86)\LinkChecker\library.zip\linkcheck\director\checker.py -- code not available --
  File "linkcheck\decorators.pyo", line 100, in newfunc
    -- couldn't find file, trying this instead: C:\Program Files (x86)\LinkChecker\library.zip\linkcheck\decorators.py -- code not available --
  File "linkcheck\cache\results.pyo", line 56, in add_result
    -- couldn't find file, trying this instead: C:\Program Files (x86)\LinkChecker\library.zip\linkcheck\cache\results.py -- code not available --
MemoryError

System info:
LinkChecker 9.3
Released on: 16.7.2014
Python 2.7.8 (default, Jun 30 2014, 16:03:49) [MSC v.1500 32 bit (Intel)] on win32
Requests: 2.2.1
Modules: Sqlite
Local time: 2015-09-03 11:12:53+001
sys.argv: ['C:\Program Files (x86)\LinkChecker\linkchecker.exe', 'http://www.propertywide.co.uk']
LANG = 'en_GB.UTF-8'
Default locale: ('en', 'cp1252')

****** LinkChecker internal error, over and out ******

A workaround (for me) is to exclude a chunk of addresses using the exclude regex option.
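To illustrate the idea behind that workaround, here is a minimal sketch of regex-based URL exclusion in plain Python. It does not use LinkChecker's own code or options. The helper name `filter_urls` and the pattern `/buy/property/` are hypothetical, chosen only because they match the URLs in the log above.

```python
import re


def filter_urls(urls, exclude_pattern):
    """Return only the URLs that do NOT match the exclusion regex."""
    exclude = re.compile(exclude_pattern)
    return [url for url in urls if not exclude.search(url)]


urls = [
    "http://www.propertywide.co.uk/buy/search/worcestershire/flat/",
    "http://www.propertywide.co.uk/buy/property/2-bedroom-flat-in-worcester,wr1-ref-2931567/",
]

# Hypothetical pattern: skip every individual property page.
kept = filter_urls(urls, r"/buy/property/")
```

Excluding a large family of URLs this way shrinks the set of links that has to be queued and remembered, which is presumably why it keeps the process under the memory limit.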

dpalic commented 6 years ago

Thank you for the issue report. Sadly, this project is dead, and a new team is now maintaining it at https://github.com/linkcheck/linkchecker. For more details, please see #708. Please also close this issue and report it afresh on the new repo, https://github.com/linkcheck/linkchecker/issues, if your issue still persists.