suever / pydicom-experimental

pydicom test

Need option for trusting (0002, 0000) Group Length or better heuristic than not_group2 #139

Open suever opened 9 years ago

suever commented 9 years ago

From maddin@gmail.com on February 20, 2014 09:47:56

I have some DICOM files that cannot be read by pydicom 0.9.8 because of the heuristic it introduced for handling improper group lengths. The header of the file looks like this:

Dicom-File-Format

Dicom-Meta-Information-Header

Used TransferSyntax: LittleEndianExplicit

(0002,0000) UL 202 # 4, 1 MetaElementGroupLength
(0002,0001) OB 00\01 # 2, 1 FileMetaInformationVersion
(0002,0002) UI =f # 30, 1 MediaStorageSOPClassUID
(0002,0003) UI [1.2.826.0.1.3680043.8.760.7.1553278726.1392203754637.31] # 56, 1 MediaStorageSOPInstanceUID
(0002,0010) UI =LittleEndianImplicit # 18, 1 TransferSyntaxUID
(0002,0012) UI [1.2.276.0.7230010.3.0.3.6.1] # 28, 1 ImplementationClassUID
(0002,0013) SH [OFFIS_DCMTK_361] # 16, 1 ImplementationVersionName

Dicom-Data-Set

Used TransferSyntax: LittleEndianImplicit

(0002,0003) UI [1.2.826.XXXX] # 56, 1 MediaStorageSOPInstanceUID
(0008,0005) CS [ISO_IR 100] # 10, 1 SpecificCharacterSet
(0008,0008) CS [ORIGINAL\PRIMARY\TOMO_PROJ\RIGHT] # 32, 4 ImageType
(0008,0016) UI =DigitalMammographyXRayImageStorageForProcessing # 30, 1 SOPClassUID

Using pydicom out of the box, reading the file returns a single element: (2e31, 2e32) Private tag data OB: Array of 20188336 bytes

That is, pydicom interprets the start of the MediaStorageSOPInstanceUID value as a (group, element) tag ("1.2." -> (2e31, 2e32)). When debugging, I also get this log message:

  logger.info("*** Group length for file meta dataset "
              "did not match end of group 2 data ***")

In filereader (line 463), fp_now is 354, but expected_ds_start is 346. If I manually reset the file pointer with fp.seek(346), the file is read correctly; otherwise I get the erroneous result shown above.
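As a cross-check, the offset 346 follows from the value lengths in the meta header dump. The sketch below only does that arithmetic; it assumes the standard 128-byte preamble plus the 4-byte "DICM" prefix (not shown in the dump) and the usual Explicit VR Little Endian header sizes (8 bytes for UL/UI/SH, 12 bytes for OB). The names are illustrative, not pydicom's. Reading the 8-byte overshoot (354 - 346) as one element header that pydicom consumed is an inference, not something checked against the pydicom source.

```python
# Explicit VR Little Endian header sizes: 8 bytes for short-form VRs (UL, UI, SH),
# 12 bytes for OB (2 reserved bytes plus a 4-byte length field).
SHORT_HEADER, LONG_HEADER = 8, 12

# (tag, header size, value length) for the elements covered by (0002,0000),
# with value lengths taken from the dump above.
meta_elements = [
    ("0002,0001", LONG_HEADER, 2),    # OB FileMetaInformationVersion
    ("0002,0002", SHORT_HEADER, 30),  # UI MediaStorageSOPClassUID
    ("0002,0003", SHORT_HEADER, 56),  # UI MediaStorageSOPInstanceUID
    ("0002,0010", SHORT_HEADER, 18),  # UI TransferSyntaxUID
    ("0002,0012", SHORT_HEADER, 28),  # UI ImplementationClassUID
    ("0002,0013", SHORT_HEADER, 16),  # SH ImplementationVersionName
]
group_length = sum(header + length for _, header, length in meta_elements)

preamble_and_prefix = 128 + 4            # assumed preamble + b"DICM"
group_length_element = SHORT_HEADER + 4  # (0002,0000) UL header + 4-byte value
expected_ds_start = preamble_and_prefix + group_length_element + group_length

print(group_length, expected_ds_start)  # 202 346
print(354 - expected_ds_start)          # 8
```

The computed group length matches the stored value of 202, so the (0002,0000) element in these files appears to be correct.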

If I understand this correctly, the issue arises because the DICOM meta information is encoded in explicit VR, whereas the actual data set here is implicit VR. Since the producer considers (0002,0003) part of the data set, that element is encoded in implicit VR. Because of the not_group2 heuristic, however, pydicom assumes it is part of the file meta header and reads it as explicit VR.
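The failure mode can be reproduced without pydicom. The following is a minimal sketch in plain Python (using only the struct module, not pydicom's reader) that builds a toy file laid out like the reported ones: an explicit-VR file meta group followed by an implicit-VR data set whose first element is (0002,0003). It then compares two hypothetical ways of finding the end of the file meta information: trusting (0002,0000), and reading explicit-VR elements until the next tag is not in group 2, in the spirit of the not_group2 heuristic. All function names are made up for illustration, and pydicom 0.9.8's actual logic differs in detail.

```python
import struct

# VRs that use the long explicit-VR header: 2 reserved bytes + 4-byte length.
LONG_VRS = {b"OB", b"OW", b"OF", b"SQ", b"UT", b"UN"}


def explicit_element(group, elem, vr, value):
    """Encode one Explicit VR Little Endian data element."""
    tag = struct.pack("<HH", group, elem)
    if vr in LONG_VRS:
        return tag + vr + b"\x00\x00" + struct.pack("<I", len(value)) + value
    return tag + vr + struct.pack("<H", len(value)) + value


def implicit_element(group, elem, value):
    """Encode one Implicit VR Little Endian data element (no VR on disk)."""
    return struct.pack("<HHI", group, elem, len(value)) + value


def read_explicit_header(buf, pos):
    """Parse an explicit-VR header; return (group, elem, length, value_pos)."""
    group, elem = struct.unpack_from("<HH", buf, pos)
    vr = buf[pos + 4:pos + 6]
    if vr in LONG_VRS:
        (length,) = struct.unpack_from("<I", buf, pos + 8)
        return group, elem, length, pos + 12
    (length,) = struct.unpack_from("<H", buf, pos + 6)
    return group, elem, length, pos + 8


def meta_end_by_group_length(buf, start=132):
    """Trust (0002,0000): the meta group ends group_length bytes after it."""
    group, elem, length, value_pos = read_explicit_header(buf, start)
    assert (group, elem) == (0x0002, 0x0000)
    (group_length,) = struct.unpack_from("<I", buf, value_pos)
    return value_pos + length + group_length


def meta_end_by_not_group2(buf, start=132):
    """Read explicit-VR elements until the next tag is not in group 2."""
    pos = start
    while struct.unpack_from("<H", buf, pos)[0] == 0x0002:
        _, _, length, value_pos = read_explicit_header(buf, pos)
        pos = value_pos + length
    return pos


# Toy file: preamble, "DICM", explicit-VR meta, then an implicit-VR data set
# that (like the reported files) starts with a repeated (0002,0003).
meta_body = (explicit_element(0x0002, 0x0001, b"OB", b"\x00\x01")
             + explicit_element(0x0002, 0x0010, b"UI", b"1.2.840.10008.1.2\x00"))
meta = explicit_element(0x0002, 0x0000, b"UL",
                        struct.pack("<I", len(meta_body))) + meta_body
dataset = (implicit_element(0x0002, 0x0003, b"1.2.826.0.1.3680043\x00")
           + implicit_element(0x0008, 0x0005, b"ISO_IR 100"))
data = b"\x00" * 128 + b"DICM" + meta + dataset

good = meta_end_by_group_length(data)
bad = meta_end_by_not_group2(data)
print(good, bad)  # 184 192
print([hex(x) for x in struct.unpack_from("<HH", data, bad)])  # ['0x2e31', '0x2e32']
```

Trusting the group length puts the data set start at offset 184. The not-group-2 loop treats the 8-byte implicit header of (0002,0003) as a zero-length explicit element, overshoots by 8 bytes, and then sees the UID text "1.2." as tag (2e31, 2e32), which mirrors the 354 vs. 346 offsets and the bogus tag in the report.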

Original issue: http://code.google.com/p/pydicom/issues/detail?id=139

suever commented 9 years ago

From maddin@gmail.com on February 20, 2014 07:31:00

I created a small patch that solves it for me with a new heuristic.

Attachment: not_group2.patch

suever commented 9 years ago

From darcymason@gmail.com on February 23, 2014 13:43:45

Thanks for the detailed investigation and the patch. I'm fairly sure these files are not compliant with the DICOM standard (for one thing, tags must be in numeric order, though perhaps that applies to the file meta and the main dataset separately). I'm inclined to solve this by offering an option to trust the group length, as mentioned in your issue's subject line. I'll give it some more thought and try to put something in place for an upcoming version.
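To make the ordering point concrete, here is a small sketch using only the tags listed in the two dumps in the report. Each part is in ascending order on its own; the order is only violated if the file meta and the data set are treated as one sequence, because (0002,0003) follows (0002,0013). The helper name is hypothetical.

```python
# Tags from the dumps above as (group, element) tuples; tuples compare numerically.
meta_tags = [(0x0002, 0x0000), (0x0002, 0x0001), (0x0002, 0x0002), (0x0002, 0x0003),
             (0x0002, 0x0010), (0x0002, 0x0012), (0x0002, 0x0013)]
dataset_tags = [(0x0002, 0x0003), (0x0008, 0x0005), (0x0008, 0x0008), (0x0008, 0x0016)]


def strictly_ascending(tags):
    """True if every tag is numerically greater than the one before it."""
    return all(a < b for a, b in zip(tags, tags[1:]))


print(strictly_ascending(meta_tags))                 # True
print(strictly_ascending(dataset_tags))              # True
print(strictly_ascending(meta_tags + dataset_tags))  # False: (0002,0013) then (0002,0003)
```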

Out of curiosity, the second MediaStorageSOPInstanceUID you show has '1.2.826.XXXX'. Is that really the value, or is it some kind of shorthand used in the display? If the former, it looks like the creating program didn't clean up a placeholder of some kind.