read_file() returns incomplete dataset for DICOM file with nested private sequences

GoogleCodeExporter commented 9 years ago

I have a DICOM file that contains a couple of private sequences of undefined 
length, which themselves contain undefined length sequences nested within 
them. The transfer syntax is implicit VR. When I attempt to read this file, 
many of the data elements (including the pixel data) are missing from the 
returned dataset.

Looking at the code, the problem appears to originate when 
data_element_generator() reaches a private sequence whose VR is unknown. Under 
this condition, the sequence is treated as binary data of undefined length and 
read_undefined_length_value() is called, which parses the file until a sequence 
delimiter tag is reached.  However, in the case of nested sequences, the next 
sequence delimiter to be reached corresponds to the end of the first nested 
sequence rather than that of the parent sequence.
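To illustrate the failure mode described above, here is a minimal sketch using hand-built implicit VR little endian bytes. The helper `elem()`, the private tags (0029,1010) and (0029,1011), and the function `naive_end()` are hypothetical names chosen for this example; `naive_end()` is only a simplified stand-in for a "scan until the first sequence delimiter" strategy, not pydicom's actual read_undefined_length_value().

```python
from struct import pack

UNDEFINED = 0xFFFFFFFF  # length field value meaning "undefined length"


def elem(group, element, length, value=b""):
    """Encode one implicit VR little endian element: tag, 4-byte length, value."""
    return pack("<HHL", group, element, length) + value


ITEM = elem(0xFFFE, 0xE000, UNDEFINED)   # Item tag, undefined length
ITEM_DELIM = elem(0xFFFE, 0xE00D, 0)     # Item Delimitation Item
SEQ_DELIM = elem(0xFFFE, 0xE0DD, 0)      # Sequence Delimitation Item

# A nested private sequence (hypothetical tag) holding one item with one element.
nested = (elem(0x0029, 0x1010, UNDEFINED) + ITEM
          + elem(0x0029, 0x1011, 2, b"AB") + ITEM_DELIM + SEQ_DELIM)

# Value of the parent sequence: one item that contains the nested sequence.
parent_value = ITEM + nested + ITEM_DELIM + SEQ_DELIM


def naive_end(value):
    """Offset of the FIRST sequence delimiter tag, ignoring nesting."""
    return value.find(SEQ_DELIM[:4])


first_delim = naive_end(parent_value)            # delimiter of the nested sequence
true_end = len(parent_value) - len(SEQ_DELIM)    # delimiter of the parent sequence
print(first_delim, true_end)
```

The first delimiter found belongs to the nested sequence, so a reader that stops there returns a truncated parent value and resumes parsing in the middle of the parent sequence.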

As such, the parent sequence is only partially read, and the rest of the 
sequence is parsed as if it is the top-level dataset. When the parent's 
actual sequence delimiter is reached it is detected by read_dataset(), 
ultimately causing read_file() to terminate early.

As a workaround, I've modified data_element_generator() to check all data 
elements of undefined length to see if they are sequences (based on the 
assumption that the first four bytes of an SQ data element value will 
always be an Item Tag):

--- a/src/oxmorf/dicom/filereader.py
+++ b/src/oxmorf/dicom/filereader.py
@@ -247,7 +247,13 @@
                     VR = dictionaryVR(tag)
                 except KeyError: 
                     pass
-            if VR == 'SQ':
+
+            bytes = fp_read(4)
+            fp.seek(fp_tell()-4)
+            possible_group, possible_elem = unpack(endian_chr+"HH", bytes)
+            possible_item_tag = TupleTag((possible_group, possible_elem))
+                
+            if (VR == 'SQ') or (possible_item_tag == ItemTag):
                 if debugging:
                     logger_debug("%04x: Reading and parsing undefined length sequence"
                                 % fp_tell())

The hope is that this should prevent any sequences being read as binary data 
(it seems to work ok so far, although I've not properly tested it).
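The peek-and-rewind check in the patch can be tried in isolation. The sketch below is a standalone approximation, not the pydicom code: the function name `starts_with_item_tag()` is hypothetical, and it compares plain tuples rather than using pydicom's TupleTag and ItemTag. It also guards against a short read, which the patch above does not.

```python
from io import BytesIO
from struct import unpack

ITEM_TAG = (0xFFFE, 0xE000)  # (group, element) of the DICOM Item tag


def starts_with_item_tag(fp, endian_chr="<"):
    """Peek at the next 4 bytes of fp and report whether they encode an Item tag.

    The file position is restored before returning, so the caller can go on
    to parse the value as if nothing had been read.
    """
    start = fp.tell()
    raw = fp.read(4)
    fp.seek(start)          # rewind regardless of how many bytes were read
    if len(raw) < 4:        # fewer than 4 bytes left: cannot be an Item tag
        return False
    return unpack(endian_chr + "HH", raw) == ITEM_TAG


sequence_value = BytesIO(b"\xfe\xff\x00\xe0\xff\xff\xff\xff")
binary_value = BytesIO(b"\x00\x01\x02\x03\x04\x05")
print(starts_with_item_tag(sequence_value), starts_with_item_tag(binary_value))
```

Two caveats on the assumption itself: an empty sequence of undefined length starts directly with the Sequence Delimitation Item rather than an Item tag, and nothing prevents non-sequence binary data from happening to begin with the same four bytes, so the check is a heuristic.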

If needed, I should shortly be able to provide the DICOM file in question.

Many thanks,

David

Original issue reported on code.google.com by d.j.hun...@gmail.com on 29 Feb 2012 at 6:49

GoogleCodeExporter commented 9 years ago

Thanks for the detailed report, and yes, a file would be helpful (as always, 
with no confidential information of any kind).

I think I will leave this until after the 0.9.7 release (after which pydicom 
will move towards Python 3), and backport the solution to the Python 2.x 
branch, with thorough testing in place.

Original comment by darcymason@gmail.com on 1 Mar 2012 at 3:31

Changed state: Accepted
Added labels: Milestone-Release1.0

GoogleCodeExporter commented 9 years ago

Here's the offending file.

David

Original comment by d.j.hun...@gmail.com on 5 Mar 2012 at 2:18

Attachments:

nestedPrivateSQ.dcm

GoogleCodeExporter commented 9 years ago

This issue was closed by revision 84af4b240add.

Original comment by Suever@gmail.com on 13 Jun 2012 at 7:13

Changed state: Fixed

GoogleCodeExporter commented 9 years ago

David,

I was able to come up with a patch based on your suggestion. The one 
modification I made was to set the VR to 'SQ' if an item tag was indeed found. 
This allows proper parsing of the file as well as proper formatting while 
printing the sequences.
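As a rough illustration of the idea (treat an undefined-length element as 'SQ' when its value starts with an Item tag, then parse it recursively), here is a minimal self-contained sketch. It is not the code from the revision mentioned in this thread: all function names are hypothetical, only implicit VR little endian is handled, there is no data dictionary (so every non-sequence element is labelled with the placeholder VR "UN"), and the private tags are made up.

```python
from io import BytesIO
from struct import pack, unpack

UNDEFINED = 0xFFFFFFFF
ITEM_TAG = (0xFFFE, 0xE000)
ITEM_DELIM = (0xFFFE, 0xE00D)
SEQ_DELIM = (0xFFFE, 0xE0DD)


def read_header(fp):
    """Read an implicit VR little endian header; return (tag, length) or None at EOF."""
    raw = fp.read(8)
    if len(raw) < 8:
        return None
    group, element, length = unpack("<HHL", raw)
    return (group, element), length


def peek_is_item(fp):
    """True if the next 4 bytes encode an Item tag; the file position is restored."""
    start = fp.tell()
    raw = fp.read(4)
    fp.seek(start)
    return len(raw) == 4 and unpack("<HH", raw) == ITEM_TAG


def read_elements(fp, stop_tag=None):
    """Read elements until stop_tag or EOF; return a list of (tag, vr, value)."""
    elements = []
    while True:
        header = read_header(fp)
        if header is None or header[0] == stop_tag:
            break
        tag, length = header
        if length == UNDEFINED and peek_is_item(fp):
            # Value begins with an Item tag: set the VR to 'SQ' and recurse.
            elements.append((tag, "SQ", read_items(fp)))
        else:
            # Simplification: other undefined-length values are not handled here.
            elements.append((tag, "UN", fp.read(length)))
    return elements


def read_items(fp):
    """Read the items of an undefined-length sequence up to its own delimiter."""
    items = []
    while True:
        header = read_header(fp)
        if header is None or header[0] == SEQ_DELIM:
            break
        _, length = header  # header[0] is assumed to be the Item tag
        if length == UNDEFINED:
            items.append(read_elements(fp, stop_tag=ITEM_DELIM))
        else:
            items.append(read_elements(BytesIO(fp.read(length))))
    return items


def elem(group, element, length, value=b""):
    """Encode one element for the example data below."""
    return pack("<HHL", group, element, length) + value


item = elem(0xFFFE, 0xE000, UNDEFINED)
item_delim = elem(0xFFFE, 0xE00D, 0)
seq_delim = elem(0xFFFE, 0xE0DD, 0)
nested = (elem(0x0029, 0x1010, UNDEFINED) + item
          + elem(0x0029, 0x1011, 2, b"AB") + item_delim + seq_delim)
data = (elem(0x0029, 0x1001, UNDEFINED) + item + nested + item_delim + seq_delim
        + elem(0x7FE0, 0x0010, 4, b"\x00\x01\x02\x03"))

dataset = read_elements(BytesIO(data))
for tag, vr, _ in dataset:
    print("(%04x,%04x) %s" % (tag[0], tag[1], vr))
```

Because each sequence consumes its own delimiter during recursion, the element that follows the parent sequence (here a stand-in for pixel data) ends up at the top level instead of being lost.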

Additionally, I created a very simple example file as well as a unittest.

-Suever

Original comment by Suever@gmail.com on 13 Jun 2012 at 7:15

thegooglecodearchive / pydicom #113