suyashdb / pydicom

Automatically exported from code.google.com/p/pydicom
0 stars 0 forks source link

Reading non-DICOM file gives MemoryError, other errors #70

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
I wrote code to call dicom.ReadFile() on every file in a directory of
mostly DICOM files. By mistake I didn't filter out the readme.txt
(attached) that was in that directory, and pydicom failed rather
spectacularly --

$ python
Python 2.5.1 (r251:54863, Nov 17 2007, 21:19:53) 
[GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dicom
>>> dicom.UID
<module 'dicom.UID' from
'/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/p
ydicom-0.9.3-py2.5.egg/dicom/UID.pyc'>
>>> dicom.ReadFile("readme.txt")
Python(19140) malloc: *** mmap(size=1679847424) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File
"/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/p
ydicom-0.9.3-py2.5.egg/dicom/filereader.py",
line 462, in read_file
    ds = read_dataset(fp)
  File
"/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/p
ydicom-0.9.3-py2.5.egg/dicom/filereader.py",
line 203, in read_dataset
    data_element = read_data_element(fp)
  File
"/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/p
ydicom-0.9.3-py2.5.egg/dicom/filereader.py",
line 184, in read_data_element
    value = reader(fp, length, num_format)
  File
"/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/p
ydicom-0.9.3-py2.5.egg/dicom/filereader.py",
line 67, in read_OBvalue
    data = fp.read(length)
  File
"/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/p
ydicom-0.9.3-py2.5.egg/dicom/filebase.py",
line 63, in read
    bytes += parent_read(length-len(bytes))
MemoryError
>>> 

I think this is just bad luck as I've tried feeding it other non-DICOM
files didn't see any malloc errors. 

Obviously I don't expect pydicom to make sense of non-DICOM files, but the
result above makes me a little nervous that my interpreter is left in an
unstable state.

In general (and maybe this should be a separate issue), I'd like to see
pydicom report a single error (e.g.
dicom.ThisIsNotADicomFileYouKnuckleheadError) to make try/excepts around
ReadFile() a little cleaner.

I'm a very new user to pydicom so maybe this latter point is resolved some
other way.

Thanks

Original issue reported on code.google.com by NikitaTh...@gmail.com on 16 Dec 2009 at 2:18

Attachments:

GoogleCodeExporter commented 8 years ago
Thanks for entering this issue, and for providing a file -- I was able to 
replicate 
the Memory Error on my system. The "is d" near the start of the second line is 
taken 
as the length of a 'data element' to read, which translates (little endian) to 
almost 
2 GB. The read(<size>) function doesn't seem to like anything that large.

The pydicom read_file() function used to take an optional argument 
'has_header', and 
I removed that at some point (can't remember why). It's purpose was (if set to 
False) 
to _force_ pydicom to read a file even though the 'DICM' marker wasn't in the 
file at 
the correct position. I think this idea should be brought back, and in addition 
to 
that, even when forced there should be some heuristics (as mentioned in issue 
24). 
There is some discussion and code at [1] on how to approach this. But I'm 
wondering 
if something as simple as looking for the patient id tag (are all DICOM files 
guaranteed to have this?) might work.

[1] http://www.dclunie.com/medical-image-
faq/html/part2.html#DICOMTransferSyntaxDetermination

Original comment by darcymason@gmail.com on 20 Dec 2009 at 6:50

GoogleCodeExporter commented 8 years ago
I think that just adding a simple test for syntactic validity would go a long 
way. It
was one of the first things I added to my code. Something like this -- 

def is_dicom(filename):
    """Returns True if the file in question is a DICOM file, else False. """
    # Per the DICOM specs, a DICOM file starts with 128 reserved bytes
    # followed by "DICM".
    # ref: http://dicom.nema.org/
    f = open(filename, "rb")
    s = f.read(132)
    f.close()
    return s.endswith("DICM")

Or is that too naive? Do DICOM files regularly violate that syntax?

If that code would cover most cases, it'd be useful to see pydicom integrate 
that and
raise a custom error if it fails. 

It doesn't preclude adding a 'force' param to read_file(); if reading a file 
raises
an error the caller would be free to call read_file(force=True) and deal with 
the
consequences.

Original comment by NikitaTh...@gmail.com on 21 Dec 2009 at 3:36

GoogleCodeExporter commented 8 years ago
This issue was closed by revision a7c9dce062.

Original comment by darcymason@gmail.com on 26 Feb 2010 at 3:28