suever / pydicom-experimental


Heuristic for unknown transfer syntax #85

Open suever opened 9 years ago

suever commented 9 years ago

From sticktot...@googlemail.com on April 30, 2010 05:48:39

Hello,

First of all, thanks for this module. I really appreciate the effort and enjoy working with it.

I encountered a problem reading a DICOM RT Ion Plan Storage file (UID 1.2.840.10008.5.1.4.1.1.481.8). I have a set of DICOM files as output from a treatment planning system. The files provided are the RT Dose Storage, CT Image Storage, RT Structure Storage and the above mentioned RT Ion Plan Storage. When reading for example the CT and dose data, everything works fine:

Python 2.6.5 (r265:79063, Apr 16 2010, 13:09:56)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import dicom
>>> dicom.debug()
>>> ct = dicom.read_file("CT.dcm")
Reading file 'CT.dcm'
Reading preamble
File is not a standard DICOM file; 'DICM' header is missing. Assuming no header and continuing
0008: (0008, 0005) None 000a 'ISO_IR 100'
001a: (0008, 0008) None 0018 'ORIGINAL\SECONDARY\AXIAL'
003a: (0008, 0012) None 0008 '20100428'
...
0072: (0008, 0016) None 001a '1.2.840.10008.5.1.4.1.1.2\x00'
...
>>> dose = dicom.read_file("DOSE.dcm")
Reading file 'DOSE.dcm'
Reading preamble
File is not a standard DICOM file; 'DICM' header is missing. Assuming no header and continuing
0008: (0008, 0005) None 000a 'ISO_IR 100'
001a: (0008, 0016) None 001e '1.2.840.10008.5.1.4.1.1.481.2\x00'
...
0082: (0008, 0050) None 0000 ''
008a: (0008, 0060) None 0006 'RTDOSE'
...

I noticed, and confirmed with a hex editor, that the files have no header, but since the CT and dose files were read correctly I do not think that is the issue. I then tried to open the RTIPLAN file and got the following output:
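The 'DICM' header mentioned in the debug output is the four-byte prefix that a standard DICOM file carries after a 128-byte preamble. A minimal Python 3 sketch of that check follows. The function name `has_dicm_prefix` is made up for illustration and is not part of pydicom, and the sample bytes are the start of RTIPLAN.dcm as shown in the hex output in this report.

```python
def has_dicm_prefix(data: bytes) -> bool:
    """Return True if data starts with a 128-byte preamble followed by b'DICM'."""
    return len(data) >= 132 and data[128:132] == b"DICM"


# Start of RTIPLAN.dcm as reported in this issue: the data set begins at offset 0.
headerless = bytes.fromhex("0800050043530a0049534f5f49522031")

# The same bytes with a synthetic preamble and prefix, for comparison only.
with_header = b"\x00" * 128 + b"DICM" + headerless

print(has_dicm_prefix(headerless))
print(has_dicm_prefix(with_header))
```

Note that the prefix alone says nothing about the transfer syntax; that information lives in the file meta information that follows the prefix, which these files also lack.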

>>> plan = dicom.read_file("RTIPLAN.dcm")
Reading file 'RTIPLAN.dcm'
Reading preamble
File is not a standard DICOM file; 'DICM' header is missing. Assuming no header and continuing
0008: (0008, 0005) None a5343 'ISO_IR 100\x08\x00\x12\x00DA \x08\x0020100428\x08\x00\x13\x00'

data (0008, 0005) Specific Character Set CS: ['ISO_IR 100\x08\x00\x12\x00DA\x08\x0020100428\x08\x00\x13\x00TM \x06\x00142021\x08\x00\x14\x00UI\x12\x00 ...

It looks like the whole file is read into the first tag. Matlab was able to read the file correctly with the data I expected. A correct output with dcmdump (DCMTK, OFFIS) is:

Dicom-File-Format

Dicom-Meta-Information-Header

Used TransferSyntax: UnknownTransferSyntax

Dicom-Data-Set

Used TransferSyntax: LittleEndianExplicit

(0008,0005) CS [ISO_IR 100] # 10, 1 SpecificCharacterSet
(0008,0012) DA [20100428] # 8, 1 InstanceCreationDate
(0008,0013) TM [142021] # 6, 1 InstanceCreationTime
...
(0008,0016) UI [1.2.840.10008.5.1.4.1.1.481.8] # 30, 1 SOPClassUID
...
(0008,0020) DA (no value available) # 0, 0 StudyDate
(0008,0030) TM (no value available) # 0, 0 StudyTime
(0008,0050) SH [1] # 2, 1 AccessionNumber
(0008,0060) CS [RTPLAN] # 6, 1 Modality
...

The hex output looks as follows:

00000000  08 00 05 00 43 53 0a 00 49 53 4f 5f 49 52 20 31  |....CS..ISO_IR 1|
00000010  30 30 08 00 12 00 44 41 08 00 32 30 31 30 30 34  |00....DA..201004|
00000020  32 38 08 00 13 00 54 4d 06 00 31 34 32 30 32 31  |28....TM..142021|
00000030  08 00 14 00 55 49 12 00                          |....UI..

I do not know whether anything similar is already known or whether the file is simply corrupted, although the other files from the same export read fine. If necessary I can provide the DICOM files, but I need to anonymise them first.

Thanks in advance for looking into this problem.

Cheers, Andy

Original issue: http://code.google.com/p/pydicom/issues/detail?id=85

suever commented 9 years ago

From darcymason@gmail.com on April 30, 2010 06:35:09

I think I see the problem: with no file meta info to give the transfer syntax, pydicom assumes implicit VR little endian, but that file is explicit VR (your dcmdump output reports LittleEndianExplicit for the data set, and the VR characters are visible in the hex output).
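To make the mismatch concrete, here is a small Python 3 sketch that parses the first 18 bytes from the hex output above under both assumptions. It uses only the standard `struct` module; the function names are illustrative and are not pydicom's own reader code.

```python
import struct

# First element of RTIPLAN.dcm, copied from the hex output in this issue:
# tag (0008,0005), "CS", length 0x000a, then the 10-byte value "ISO_IR 100".
raw = bytes.fromhex("08000500" "4353" "0a00" "49534f5f495220313030")


def read_implicit(buf):
    """Implicit VR little endian: 4-byte tag, then a 4-byte length."""
    group, elem, length = struct.unpack_from("<HHL", buf, 0)
    return (group, elem), length


def read_explicit_short(buf):
    """Explicit VR little endian, short form: tag, 2-char VR, 2-byte length.

    Only valid for VRs with a 2-byte length field (CS, DA, TM, UI, ...).
    """
    group, elem, vr, length = struct.unpack_from("<HH2sH", buf, 0)
    return (group, elem), vr.decode("ascii"), length, buf[8:8 + length]


tag, bad_length = read_implicit(raw)
print(tag, hex(bad_length))       # the bytes "CS" + 0x000a are read as one length
print(read_explicit_short(raw))   # the element as dcmdump reports it
```

Read as implicit VR, the two VR characters and the 2-byte length are combined into a single 4-byte length of 0xa5343, which is the length shown for (0008, 0005) in the debug output and explains why the rest of the file ends up in the first element.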

So we don't need any example files, I can create a test case based on your hex output.

I've changed the title of this issue to reflect the more general problem.

As a temporary solution to read this RT Ion file, in filereader.py you could add an extra optional argument to read_file, passed along to read_partial, to tell the reader the file is explicit VR. I'll see if I can work up something like that to add to these functions. Even if there is a good heuristic, there may still be cases it doesn't handle properly, and the user should be able to force the correct one.
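One possible heuristic, sketched in Python 3 under stated assumptions: if bytes 4 and 5 of the data set spell a known VR code, guess explicit VR, otherwise fall back to implicit VR. The function name `looks_like_explicit_vr` is hypothetical and this is not necessarily what pydicom ends up doing.

```python
# Two-letter VR codes from the DICOM standard (PS3.5).
KNOWN_VRS = frozenset(
    b"AE AS AT CS DA DS DT FD FL IS LO LT OB OD OF OL OV OW "
    b"PN SH SL SQ SS ST SV TM UC UI UL UN UR US UT UV".split()
)


def looks_like_explicit_vr(first_bytes: bytes) -> bool:
    """Guess the encoding from the first element of a headerless data set.

    Bytes 0-3 are the tag. In explicit VR, bytes 4-5 hold the VR code;
    in implicit VR they are the low half of a 4-byte length.
    """
    if len(first_bytes) < 6:
        return False
    return bytes(first_bytes[4:6]) in KNOWN_VRS


explicit_start = bytes.fromhex("0800050043530a00")  # from the RT Ion Plan hex output
implicit_start = bytes.fromhex("080005000a000000")  # same element, implicit VR

print(looks_like_explicit_vr(explicit_start))
print(looks_like_explicit_vr(implicit_start))
```

The guess can be wrong: an implicit VR element whose length happens to have two uppercase letters as its low bytes (for example 0x5343) would be misclassified, which is why a user-supplied override is still worth having.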

Summary: Heuristic for unknown transfer syntax
Status: Accepted
Labels: -Priority-Medium Priority-High

suever commented 9 years ago

From sticktot...@googlemail.com on April 30, 2010 06:53:17

Hi,

thanks for the quick answer. I was actually just reading up on implicit and explicit VR and can confirm your findings. All of the files in my output directory contain the DICOM data encoded with implicit VR, except for the one file with the RT Ion Plan (don't ask me why!).

I just edited my RT Ion Plan with a hex editor: I deleted the first few VRs so that the encoding matches implicit VR, and I was then able to read the first few elements with pydicom. Thanks for that. But yes, I agree: it would be nice to let the user choose the encoding when the transfer syntax is unknown.
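For reference, the manual hex edit can be sketched in Python 3 for simple elements. This is an illustration only, not a general converter: it handles just the VRs that use a 2-byte length in explicit VR and refuses everything else (OB, OW, SQ, UN and so on use a longer header). The function name is made up for this example.

```python
import struct

# VRs whose explicit encoding is tag + VR + 2-byte length (DICOM PS3.5).
SHORT_FORM_VRS = frozenset(
    b"AE AS AT CS DA DS DT FL FD IS LO LT PN SH SL SS ST TM UI UL US".split()
)


def explicit_short_to_implicit(buf: bytes) -> bytes:
    """Re-encode explicit VR little endian elements as implicit VR."""
    out = bytearray()
    pos = 0
    while pos < len(buf):
        if pos + 8 > len(buf):
            raise ValueError("truncated element header at offset %d" % pos)
        group, elem, vr, length = struct.unpack_from("<HH2sH", buf, pos)
        if vr not in SHORT_FORM_VRS:
            raise ValueError("unsupported VR %r at offset %d" % (vr, pos))
        value = buf[pos + 8:pos + 8 + length]
        if len(value) != length:
            raise ValueError("truncated value at offset %d" % pos)
        # Implicit VR: same tag, no VR, length widened to 4 bytes.
        out += struct.pack("<HHL", group, elem, length) + value
        pos += 8 + length
    return bytes(out)


# First three elements of the RT Ion Plan, copied from the hex output (48 bytes).
explicit = bytes.fromhex(
    "0800050043530a0049534f5f495220313030"
    "08001200444108003230313030343238"
    "08001300544d0600313432303231"
)
print(explicit_short_to_implicit(explicit).hex())
```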

I will contact the customer support and ask if there is a way to export the file meta info as well. This would probably be the most elegant way.

Thanks again!

Cheers, Andy