wummel / linkchecker

check links in web documents or full websites
http://wummel.github.io/linkchecker/
GNU General Public License v2.0

Error: (24, Too many open files) #169

Closed wummel closed 11 years ago

wummel commented 11 years ago

Converted from SourceForge issue 1631042, submitted by ut1200

Since migrating to LinkChecker 4.5 and now 4.6, LinkChecker runs to completion without crashing, but the log and HTML report files show "Error: (24, Too many open files)" for many of the URLs checked. This did not occur with LinkChecker version 4.1.

The larger HTML report files now contain hundreds of Error entries like this, but if you follow the links in the report file, the URLs are accessible. Smaller report files do not have these errors. It looks like a problem only when there are thousands of links to check.

I have searched sourceforge.net but not found other examples of this issue.

My research suggests this is about the number of open file handles in Solaris, but our hard limit is 65536, and after I increased our soft limit from 256 to 1024 the error still persists.
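
For reference, the soft and hard limits mentioned here can also be read and adjusted from inside a Python process. The following is a minimal sketch, assuming a Unix-like system where the standard `resource` module is available. The helper name `raise_nofile_soft_limit` is hypothetical and is not part of LinkChecker.

```python
import resource


def raise_nofile_soft_limit(target):
    """Raise this process's soft open-file limit toward `target`.

    The soft limit can never exceed the hard limit, so `target` is
    capped there. Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    # Nothing to do if the limit is already unlimited or high enough.
    if soft == resource.RLIM_INFINITY or target <= soft:
        return soft
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return target


if __name__ == "__main__":
    print("soft, hard before:", resource.getrlimit(resource.RLIMIT_NOFILE))
    print("soft limit now:", raise_nofile_soft_limit(4096))
```

A limit raised this way applies only to the current process and its children, so it would have to run in the same process that performs the link checks.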

Can you help, please?

Here is an example of the output:

"URL 'http://www.rfds.info/About.htm'
Name 'Western division'
Parent URL http://www.library.uwa.edu.au/education_training___and___support/education,_training__and__support_relevant_to_your_faculty/faculty_of_medicine__and__dentistry/rural_week_information_resources/royal_flying_doctor_service, line 170, col 46 (HTML) (CSS)
Real URL http://www.rfds.info/About.htm
Check Time 11.145 seconds
Result Error: (24, 'Too many open files')

That's it. 31194 links checked. 2139 warnings found. 132 errors found.
Stopped checking at 2007-01-09 00:41:36+009 (36 minutes, 28 seconds)"

Is there any other information you would like to have?

ut1200

wummel commented 11 years ago

Submitted by calvin

Logged In: YES user_id=9205 Originator: NO

This seems to be a platform-related problem, since no one, including me, has seen this error before. Did you also update Python together with LinkChecker? Perhaps this is a Python regression, not a LinkChecker one.

Anyway, it would help me if you could attach the output of a debug run (i.e. with option -Dall). You can also mail it to me privately if you don't want to attach the data here.

wummel commented 11 years ago

Submitted by ut1200

Logged In: YES user_id=1650352 Originator: YES

Upgraded to Python 2.4 with LinkChecker 4.6. Hope the attached file helps.

harvey

wummel commented 11 years ago

Submitted by calvin

Logged In: YES user_id=9205 Originator: NO

There is no file attached!

wummel commented 11 years ago

Submitted by calvin

Logged In: YES user_id=9205 Originator: NO

I just read that Python fixed an open file descriptor leak in urllib2.py. Could you test the patch[1] and tell me whether it solves the problem?
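
To illustrate what a descriptor leak looks like in general, here is a small sketch. It does not reproduce the urllib2 bug or the linked patch; it only contrasts holding sockets open with closing them promptly. It assumes a POSIX system, where a new descriptor always receives the lowest free number, and the two function names are hypothetical.

```python
import socket


def highest_fd_when_leaking(n):
    """Open n socket pairs, keep them all open, return the highest fd seen."""
    held = []
    highest = -1
    try:
        for _ in range(n):
            a, b = socket.socketpair()
            held.extend((a, b))
            highest = max(highest, a.fileno(), b.fileno())
    finally:
        for s in held:
            s.close()  # clean up so the demo itself does not leak
    return highest


def highest_fd_when_closing(n):
    """Open n socket pairs, closing each pair before opening the next."""
    highest = -1
    for _ in range(n):
        a, b = socket.socketpair()
        with a, b:  # sockets are context managers and close on exit
            highest = max(highest, a.fileno(), b.fileno())
    return highest


if __name__ == "__main__":
    print("highest fd while leaking:", highest_fd_when_leaking(20))
    print("highest fd while closing:", highest_fd_when_closing(20))
```

When descriptors are released promptly their numbers are reused, so the second figure stays small. In the leaking variant the numbers keep climbing until the per-process limit is hit, which is when errno 24 appears.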

[1] http://sourceforge.net/tracker/index.php?func=detail&aid=1627441&group_id=5470&atid=305470

wummel commented 11 years ago

Submitted by ut1200

Logged In: YES user_id=1650352 Originator: YES

Hi. I have downloaded the urllib2 patch and the version 2 patch, but I don't understand how to install them. The content of the patch is very different from the urllib2 file, so do I just place the version 2 patch in the directory with urllib2 and rerun the 'python setup.py install' command? I'm a Unix/Linux/Python novice.

wummel commented 11 years ago

Submitted by nobody

Logged In: NO

The two parts of the patch, one for socket.py and one for urllib2.py, should apply cleanly to your local version. Look in your /usr/lib/python2.4/ directory (or wherever you installed the Python library), locate the two files socket.py and urllib2.py, and apply the corresponding patch parts to them. Then rerun your linkchecker command.

If you have trouble with the patch format itself: it is in unified diff format, documented here: http://www.gnu.org/software/diffutils/manual/html_node/Unified-Format.html. Diff files can be applied with the patch(1) utility and generated with diff(1).
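
If it is unclear where the library files live, the interpreter itself can report the location. This is a small sketch for a current Python 3, where the old urllib2 module corresponds to `urllib.request`; on the Python 2.4 discussed here, `import socket; print socket.__file__` would serve the same purpose. The helper name `module_source_path` is hypothetical.

```python
import importlib.util


def module_source_path(name):
    """Return the file a module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    for name in ("socket", "urllib.request"):
        print(name, "->", module_source_path(name))
```

The printed paths are the files that a patch would need to be applied to.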

wummel commented 11 years ago

Submitted by grupp

Logged In: YES user_id=58058 Originator: NO

Hi,

This seems to be the same problem I am seeing on a Linux system with Python 2.4. I've applied the patches to the Python libs as suggested in this thread, but the problem remains :-(. Any further suggestions?

wummel commented 11 years ago

Submitted by calvin

Logged In: YES user_id=9205 Originator: NO

We believe that the issue you reported is fixed in the latest version of linkchecker which can be found in the project's Subversion repository under: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk

Thank you for reporting the issue. It is now marked as pending and will be closed automatically in two weeks. If you believe that the issue is not fixed appropriately you can reopen this tracker item by resetting the status from "pending" to "open".

If you have questions or further comments feel free to contact us under calvin@users.sourceforge.net.

(This message was generated automatically.)

wummel commented 11 years ago

Submitted by sf-robot

Logged In: YES user_id=1312539 Originator: NO

This Tracker item was closed automatically by the system. It was previously set to a Pending status, and the original submitter did not respond within 14 days (the time period specified by the administrator of this Tracker).

landor commented 11 years ago

I'm getting this error when checking a site.

I'm using linkchecker 8.4 with Python 2.7.5 on Arch Linux.

Here are the stats to give an idea of the size of site I'm checking:

Statistics:
Downloaded: 76.9MB
Robots.txt cache: 13835 hits, 2889 misses
Number of domains: 2034
Content types: 1978 image, 5809 text, 0 video, 0 audio, 1073 application, 42 mail and 6967 other.
URL lengths: min=14, max=793, avg=117.

That's it. 15869 links checked. 351 warnings found. 7234 errors found.

6519 of the errors are "Result Error: error: [Errno 24] Too many open files"

cat /proc/sys/fs/file-max
811676
cat /proc/sys/fs/file-nr
7072 0 811676
(I ran this repeatedly throughout and it never went up a whole lot)
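
Note that file-max and file-nr describe the system-wide handle table, whereas errno 24 is EMFILE, the per-process limit, so the figures above would not be expected to move much. A rough sketch for looking at the per-process side on Linux follows. It assumes /proc is mounted, and the helper name `count_open_fds` is hypothetical.

```python
import errno
import os
import resource


def count_open_fds(pid="self"):
    """Count the descriptors listed under /proc/<pid>/fd (Linux only).

    The listing itself briefly opens one descriptor, so the count for
    the current process may be one higher than the steady state.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("errno 24 is", errno.errorcode[24])
    print("open descriptors in this process:", count_open_fds())
    print("per-process soft limit:", soft, "hard limit:", hard)
```

Passing the process ID of a running linkchecker instead of "self" would show how close that process gets to its own limit, which is the number that matters for this error.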