suever / pydicom-experimental

pydicom test

Too many open files exception #68

Open suever opened 9 years ago

suever commented 9 years ago

From andreas....@gmail.com on December 09, 2009 10:31:18

Thanks for the great piece of software!

I have one issue. I use this library for an anonymization tool I wrote. It works great, but at some point I get a "too many open files" exception:

File "c:\Python25\lib\site-packages\dicom\filereader.py", line 410, in read_file File "c:\python25\lib\site-packages\dicom\filebase.py", line 152, in DicomFile IOError: [Errno 24] Too many open files

This occurs on Windows XP, Vista, and 7, where the number of simultaneously open files is apparently limited. I have reviewed my code: it only uses dicom.read_file() and dicom.write_file() and does not open any other files. The parts that read and write the files run sequentially, so there should never be too many files open at once.
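
A minimal sketch of the pattern in question (my_files, anonymize(), and output_path() are placeholders, not the actual code):

    import dicom

    for filename in my_files:
        dataset = dicom.read_file(filename)   # library opens and closes the input
        anonymize(dataset)                    # hypothetical in-place modification
        dicom.write_file(output_path(filename), dataset)  # opens and closes the output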

I also reviewed filereader.py and filewriter.py, and from what I can see, those functions open() and close() the files in the correct way. I added "del fp" in filereader.py and filewriter.py just after the close() there, to find out whether brutally deleting the objects closes them at the OS level, but without success. The exception persists.

The problem occurs when I read and write many files. I don't have an exact number, but from my tests it seems that if you read and write more than 2000 files, the exception is thrown. If I let my beast eat a smaller number of files, everything is fine.

It is of course possible that my own code using dicom is just brainless, but I am pretty sure that the methods are called correctly and in the right order.

Original issue: http://code.google.com/p/pydicom/issues/detail?id=68

suever commented 9 years ago

From DaniN...@gmail.com on December 10, 2009 07:18:43

This might have to do with a limited number of low-level open file handles. You could try something like (not tested):

In your application:

    import os
    import dicom
    from dicom.filebase import DicomFileLike

    for file_path in my_files:
        os_id = os.open(file_path, os.O_BINARY | os.O_RDONLY)  # low-level OS handle
        fd = os.fdopen(os_id, 'rb')                            # wrap it in a file object
        dataset = dicom.read_file(DicomFileLike(fd))
        ...
        os.close(os_id)                                        # release the OS handle explicitly

You might also have to close the OS-level file handles of your output files.

suever commented 9 years ago

From andreas....@gmail.com on December 17, 2009 03:22:31

Thanks for the suggestion! It did improve things a bit, but I still get the problem when writing files. I could not find a way to apply the same scheme to writing, because write_file() also calls close(), which then leads to a "bad file descriptor" OSError.
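
A minimal sketch of the double close that produces that error (the path is a placeholder; os.O_BINARY exists on Windows only):

    import os

    os_id = os.open("out.dcm", os.O_BINARY | os.O_WRONLY | os.O_CREAT)
    fd = os.fdopen(os_id, 'wb')   # fd now owns the OS-level handle
    fd.close()                    # closes the underlying descriptor as well
    os.close(os_id)               # raises OSError: [Errno 9] Bad file descriptor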

For reading I used the following code:

    for filename in my_files:
        os_id = os.open(str(filename), os.O_BINARY | os.O_RDONLY)
        fd = os.fdopen(os_id, 'rb')
        image = dicom.read_file(DicomFileLike(fd))
        os.close(os_id)

Another question: does the Python garbage collector help with this?

suever commented 9 years ago

From andreas....@gmail.com on December 17, 2009 05:43:06

A little clarification (the last comment was a little messed up):

My little program reads all DICOM files twice, first to do some analysis and then to actually do something. I cannot keep all the data in memory, because we're talking about gigabytes here. So first all DICOM files are loaded once (via read_file()), then some calculation is done, and then the files are opened again and a modified version is written to disk. I've noticed the following behaviour:

    1  for file in all_files:
    2      image = dicom.read_file(file)
    3
    4  do_some_stuff()
    5
    6  for file in all_files:
    7      image = dicom.read_file(file)
    8      do_something_with_the_image(image)
    9      dicom.write_file(new_file, image)

Observations:

  1. If I only use the suggested fix on line 2, the behaviour gets better, but I still get a "too many open files" error, only later than before.
  2. If I use the fix on both line 2 and line 7, I get "bad file descriptor" errors.

suever commented 9 years ago

From darcymason@gmail.com on December 17, 2009 21:58:45

I've pushed some code to the repository which might help (revision 0b5842e071) -- it should handle these kinds of issues better. With this new code, if you supply a filename (reading or writing), it should always close the file, even when exceptions occur. If you pass a file object (reading only so far), then you are responsible for closing it.
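
A minimal sketch of the two calling conventions as described (the filename is a placeholder):

    import dicom

    # Pass a filename: the library opens and closes the file itself,
    # even if an exception occurs while reading.
    dataset = dicom.read_file("image0001.dcm")

    # Pass an open file object (reading only at this revision): the
    # caller owns the handle and must close it.
    fp = open("image0001.dcm", 'rb')
    try:
        dataset = dicom.read_file(fp)
    finally:
        fp.close()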

In your code above, is all_files a list of file names, or a list of file objects?

Another thought: I'm not sure about os.open, os.fdopen, etc. When I have used a file object, I have simply used the object returned by the builtin open() (not os.open).

You asked about garbage collection -- I don't think that should matter. But maybe there is some kind of latency after Python closes the object before Windows actually frees the resource. Perhaps a flush command of some kind is needed?

I will modify write_file so it also accepts a file object, then maybe there will be more options to explore...
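
For the write side, a hypothetical sketch of what that might look like (write_file does not accept file objects at this revision; the file names are placeholders):

    import dicom

    dataset = dicom.read_file("original.dcm")  # placeholder input
    fp = open("anonymized.dcm", 'wb')
    try:
        dicom.write_file(fp, dataset)   # hypothetical: file-object support
    finally:
        fp.close()                      # caller keeps ownership of the handle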

suever commented 9 years ago

From andreas....@gmail.com on December 18, 2009 01:36:26

Thanks, I'll try the new code next week, when I'm back in the office.

To answer your question: all_files is a list of filenames, not file objects. I like the idea of not having to care about file handling. Calling read_file() and write_file() with a file name rather than a file object is exactly what I want. :-)

open(): I guess there is some kind of latency on the Windows platform that produces this error when many files are used. I would think that close() does what it should and that the OS is just slow to register it. But that is just a wild guess...

Garbage collection: I tried using gc.collect() after each file was close()d, but it did not change anything.

suever commented 9 years ago

From DaniN...@gmail.com on December 18, 2009 03:14:24

fd = open(); fd.close() indeed seems to do what one would expect, so there is no "obvious" need to access low-level file handles via the os module. I could not confirm my suspicion in testing, and I got matching feedback from comp.lang.python (c.l.p.).
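
A minimal sketch of that kind of check (test.dcm is a placeholder; if close() leaked the OS handle, Windows would run out of descriptors long before the loop finishes):

    for i in range(10000):
        fd = open("test.dcm", 'rb')
        header = fd.read(132)   # touch the file so the handle is really used
        fd.close()              # releases the OS handle on each iteration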