suever / pydicom-experimental

pydicom test
0 stars 1 forks source link

Reading non-DICOM file gives MemoryError, other errors #70

Closed suever closed 9 years ago

suever commented 9 years ago

From NikitaTh...@gmail.com on December 16, 2009 09:18:15

I wrote code to call dicom.ReadFile() on every file in a directory of mostly DICOM files. By mistake I didn't filter out the readme.txt (attached) that was in that directory, and pydicom failed rather spectacularly --

$ python Python 2.5.1 ( r251 :54863, Nov 17 2007, 21:19:53) [GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin Type "help", "copyright", "credits" or "license" for more information.

import dicom dicom.UID <module 'dicom.UID' from '/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/pydicom-0.9.3-py2.5.egg/dicom/UID.pyc'> dicom.ReadFile("readme.txt") Python(19140) malloc: * mmap(size=1679847424) failed (error code=12) * error: can't allocate region *\ set a breakpoint in malloc_error_break to debug Traceback (most recent call last): File "", line 1, in File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/pydicom-0.9.3-py2.5.egg/dicom/filereader.py", line 462, in read_file ds = read_dataset(fp) File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/pydicom-0.9.3-py2.5.egg/dicom/filereader.py", line 203, in read_dataset data_element = read_data_element(fp) File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/pydicom-0.9.3-py2.5.egg/dicom/filereader.py", line 184, in read_data_element value = reader(fp, length, num_format) File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/pydicom-0.9.3-py2.5.egg/dicom/filereader.py", line 67, in read_OBvalue data = fp.read(length) File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/pydicom-0.9.3-py2.5.egg/dicom/filebase.py", line 63, in read bytes += parent_read(length-len(bytes)) MemoryError

I think this is just bad luck as I've tried feeding it other non-DICOM files didn't see any malloc errors.

Obviously I don't expect pydicom to make sense of non-DICOM files, but the result above makes me a little nervous that my interpreter is left in an unstable state.

In general (and maybe this should be a separate issue), I'd like to see pydicom report a single error (e.g. dicom.ThisIsNotADicomFileYouKnuckleheadError) to make try/excepts around ReadFile() a little cleaner.

I'm a very new user to pydicom so maybe this latter point is resolved some other way.

Thanks

Attachment: readme.txt

Original issue: http://code.google.com/p/pydicom/issues/detail?id=70

suever commented 9 years ago

From darcymason@gmail.com on December 20, 2009 10:50:48

Thanks for entering this issue, and for providing a file -- I was able to replicate the Memory Error on my system. The "is d" near the start of the second line is taken as the length of a 'data element' to read, which translates (little endian) to almost 2 GB. The read() function doesn't seem to like anything that large.

The pydicom read_file() function used to take an optional argument 'has_header', and I removed that at some point (can't remember why). It's purpose was (if set to False) to force pydicom to read a file even though the 'DICM' marker wasn't in the file at the correct position. I think this idea should be brought back, and in addition to that, even when forced there should be some heuristics (as mentioned in issue 24 ). There is some discussion and code at [1] on how to approach this. But I'm wondering if something as simple as looking for the patient id tag (are all DICOM files guaranteed to have this?) might work.

[1] http://www.dclunie.com/medical-image- faq/html/part2.html#DICOMTransferSyntaxDetermination

Status: Accepted
Labels: -Priority-Medium Priority-High Milestone-NextRelease

suever commented 9 years ago

From NikitaTh...@gmail.com on December 21, 2009 07:36:59

I think that just adding a simple test for syntactic validity would go a long way. It was one of the first things I added to my code. Something like this --

def is_dicom(filename): """Returns True if the file in question is a DICOM file, else False. """

Per the DICOM specs, a DICOM file starts with 128 reserved bytes

# followed by "DICM".
# ref: http://dicom.nema.org/ f = open(filename, "rb")
s = f.read(132)
f.close()
return s.endswith("DICM")

Or is that too naive? Do DICOM files regularly violate that syntax?

If that code would cover most cases, it'd be useful to see pydicom integrate that and raise a custom error if it fails.

It doesn't preclude adding a 'force' param to read_file(); if reading a file raises an error the caller would be free to call read_file(force=True) and deal with the consequences.

suever commented 9 years ago

From darcymason@gmail.com on February 26, 2010 07:28:06

This issue was closed by revision a7c9dce062 .

Status: Fixed