Open suever opened 9 years ago
From DaniN...@gmail.com on December 10, 2009 07:18:43
This might have to do with a limited number of low-level open file handles. You could try something like (not tested):
In your application:

```python
import os
import dicom
from dicom.filebase import DicomFileLike

for file_path in my_files:
    os_id = os.open(file_path, os.O_BINARY | os.O_RDONLY)
    fd = os.fdopen(os_id)
    dataset = dicom.read_file(DicomFileLike(fd))
    ...
    os.close(os_id)
```
You might also have to close the OS-level file handles of your output files.
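For illustration, here is a minimal self-contained sketch of the low-level pattern above (no pydicom involved), with one subtlety made explicit: closing the wrapping file object already releases the underlying OS descriptor, so a second `os.close()` on the same descriptor would raise a "bad file descriptor" error. `os.O_BINARY` exists only on Windows, hence the fallback.

```python
import os

# os.O_BINARY is Windows-only; fall back to 0 on other platforms.
O_BINARY = getattr(os, "O_BINARY", 0)

def read_bytes_lowlevel(path):
    # Open at the OS level, then wrap the raw descriptor in a file object.
    fd = os.fdopen(os.open(path, O_BINARY | os.O_RDONLY), "rb")
    try:
        return fd.read()
    finally:
        # fd.close() also releases the underlying OS handle; calling
        # os.close() on the descriptor afterwards would raise EBADF.
        fd.close()
```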
From andreas....@gmail.com on December 17, 2009 03:22:31
Thanks for the suggestion! It did improve things a bit, but I still get the problem when writing files. I could not find a way to apply the same scheme to writing, because write_file() also calls close(), which then raises a "bad file descriptor" OSError.
For reading I used the following code:
```python
for filename in my_files:
    os_id = os.open(str(filename), os.O_BINARY | os.O_RDONLY)
    fd = os.fdopen(os_id)
    image = dicom.read_file(DicomFileLike(fd))
    os.close(os_id)
```
Another question: Does the python garbage collector help in this?
From andreas....@gmail.com on December 17, 2009 05:43:06
A little clarification (the last comment was a little messed up):
My little program reads all DICOM files twice: first for doing some analysis, and then for actually doing something. I cannot keep all the data in memory, because we're talking about gigabytes here. So first all DICOM files are loaded once (via read_file()), then some calculation is done, and then the files are opened again and a modified version of each is written to disk. I've noticed the following behaviour:
```python
for file in all_files:
    image = dicom.read_file(file)

do_some_stuff()

for file in all_files:
    image = dicom.read_file(file)
    do_something_with_the_image(image)
    dicom.write_file(new_file, image)
```
Observations:
From darcymason@gmail.com on December 17, 2009 21:58:45
I've pushed some code to the repository which might help ( revision 0b5842e071 ) -- it should handle these kinds of issues better. With this new code, if you supply a filename (reading or writing), it should always close the file, even when exceptions occur. If you pass a file object (reading only, so far), then you are responsible for closing it.
In your code above, is all_files a list of file names, or a list of file objects?
Another thought: I'm not sure about os.open, os.fdopen etc. When I have used a file object, I have simply used the object returned by the builtin open() (not os.open).
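As a sketch of that approach -- using the builtin open() and closing the object ourselves even if parsing raises -- with a hypothetical `parse_dataset` standing in for `dicom.read_file`:

```python
def parse_dataset(fp):
    # Hypothetical stand-in for dicom.read_file(fp); here it just
    # reads the raw bytes so the sketch is self-contained.
    return fp.read()

def read_with_builtin_open(filename):
    fp = open(filename, "rb")   # builtin open(), not os.open()
    try:
        return parse_dataset(fp)
    finally:
        fp.close()              # caller-owned object, caller closes it
```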
You asked about garbage collection -- I don't think that should matter. But maybe there is some kind of latency after Python closes the object before Windows actually frees the resource. Perhaps a flush command of some kind is needed?
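In Python, "a flush command of some kind" could be sketched as draining the file object's buffer and asking the OS to commit the data before the handle is closed (illustrative helper, not part of pydicom):

```python
import os

def write_and_sync(path, payload):
    # Flush Python-level buffers and request an OS-level commit before
    # the handle is closed, so the resource is released promptly.
    with open(path, "wb") as fp:
        fp.write(payload)
        fp.flush()              # drain Python's internal buffer
        os.fsync(fp.fileno())   # ask the OS to write through to disk
```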
I will modify write_file so it also accepts a file object, then maybe there will be more options to explore...
From andreas....@gmail.com on December 18, 2009 01:36:26
Thanks, I'll try the new code next week, when I'm back in the office.
To answer your question: all_files is a list of filenames, not file objects. I like the idea of not having to care about file handling. Calling read_file() and write_file() with a file name rather than a file object is exactly what I want. :-)
Regarding open(): I guess there is some kind of latency on the Windows platform that produces this error when using many files. I would think that close() does what it should, and that the OS is slow to notice. But that is just a wild guess...

Garbage collection: I tried calling gc.collect() after each file was closed, but it did not change anything.
From DaniN...@gmail.com on December 18, 2009 03:14:24
fd = open(); fd.close() indeed seems to do what one would expect, so there is no "obvious" need to access low-level file handles via os methods. I could not verify my suspicion when testing, and got corresponding feedback from comp.lang.python.
From andreas....@gmail.com on December 09, 2009 10:31:18
Thanks for the great piece of software!
I have one issue. I use this library for an anonymization tool I wrote. It works great, but at some point I get a "too many open files" exception:
```
File "c:\Python25\lib\site-packages\dicom\filereader.py", line 410, in read_file
File "c:\python25\lib\site-packages\dicom\filebase.py", line 152, in DicomFile
IOError: [Errno 24] Too many open files
```
This occurs on Windows XP, Vista, and 7, where the number of simultaneously open files is apparently capped. I have reviewed my code: it only uses dicom.read_file() and dicom.write_file() and does not open any other files. The parts where the files are read and written are called sequentially, so there should not be many files open at once.
I also reviewed filereader.py and filewriter.py, and from what I can see, those functions open() and close() the files correctly. I added "del fp" in filereader.py and filewriter.py just after the close() there, to find out whether forcibly deleting the objects releases them at the OS level, but without success. The exception persists.
The problem occurs when I read and write many files. I have no real number, but from my tests it seems that if you read and write more than 2000 files, the exception is thrown. If I let my beast eat a smaller number of files, everything is fine.
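The symptom can be reproduced without pydicom: any loop that opens handles faster than it closes them will eventually hit Errno 24. A sketch contrasting the leaky and safe patterns (illustrative names; a small file count is enough to show the difference in open-handle behaviour):

```python
def read_leaky(paths):
    # Anti-pattern: every open() consumes an OS handle that is never
    # released, so a few thousand files exhaust the per-process limit
    # and open() raises "[Errno 24] Too many open files".
    handles = [open(p, "rb") for p in paths]
    return [h.read() for h in handles]  # handles leak here

def read_safe(paths):
    # Each handle is closed before the next file is opened, so the
    # number of simultaneously open files stays at one.
    data = []
    for p in paths:
        with open(p, "rb") as h:
            data.append(h.read())
    return data
```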
It is of course possible that my own code using dicom is just brainless, but I am pretty sure that the methods are called correctly and in the right order.
Original issue: http://code.google.com/p/pydicom/issues/detail?id=68