pydicom / deid

Best-effort anonymization for medical images using Python
https://pydicom.github.io/deid/
MIT License

Discussion: Add class to handle get/set #119

Closed by vsoch 4 years ago

vsoch commented 4 years ago

Right now, the user has to run the get and replace identifiers functions (get_identifiers and replace_identifiers) separately, and between the two calls there is a step to add any variables or functions that the replacement needs. A class could provide an easier-to-use interface. It might look something like this:

import pydicom
from deid.dicom import HeaderParser  # proposed class, not an existing deid API

parser = HeaderParser("dicom.deid")

# Add a function to a lookup used during replacement.
# name is optional (it becomes the key) and defaults to the function's name.
# replace_uid and dicom_files are placeholders defined elsewhere.
parser.add_func(func=replace_uid, name="replace_uid")

for dicom_file in dicom_files:
    dicom = pydicom.dcmread(dicom_file)
    parser.run(dicom)

# run() would call get() and replace(), corresponding to get and
# replace identifiers, respectively:

def run(self, dicom):
    self.get(dicom)
    self.replace(dicom)

Of course, a real implementation would need many more variables than I'm adding here, and careful thought about how to store the fields, functions, added variables, etc.
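A minimal, runnable sketch of the proposed interface, under loudly stated assumptions: HeaderParser, add_func, get, replace and run are the hypothetical names from the proposal, not an existing deid API, and the get and replace steps are passed in as plain callables so the class does not depend on the real get_identifiers/replace_identifiers. A plain dict stands in for a DICOM dataset.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical: the get step returns a lookup dict extracted from a dataset.
GetFn = Callable[[Any], Dict[str, Any]]
# Hypothetical: the replace step receives the dataset, the lookup and the
# registered functions, and returns the (modified) dataset.
ReplaceFn = Callable[[Any, Dict[str, Any], Dict[str, Callable]], Any]


class HeaderParser:
    """Sketch of a class that bundles a get pass and a replace pass."""

    def __init__(self, get_fn: GetFn, replace_fn: ReplaceFn):
        self.get_fn = get_fn
        self.replace_fn = replace_fn
        self.funcs: Dict[str, Callable] = {}  # functions available to replace
        self.lookup: Dict[str, Any] = {}      # values collected by get

    def add_func(self, func: Callable, name: Optional[str] = None) -> None:
        # The key is optional and defaults to the function's own name.
        self.funcs[name or func.__name__] = func

    def get(self, dataset: Any) -> Dict[str, Any]:
        self.lookup = self.get_fn(dataset)
        return self.lookup

    def replace(self, dataset: Any) -> Any:
        return self.replace_fn(dataset, self.lookup, self.funcs)

    def run(self, dataset: Any) -> Any:
        # run() = get() followed by replace(), as proposed.
        self.get(dataset)
        return self.replace(dataset)


# Usage with a dict standing in for a dataset (hypothetical helpers).
def replace_uid(value: str) -> str:
    return "ANON-" + value[-4:]


def get_fields(dataset: Dict[str, str]) -> Dict[str, str]:
    # Copy, so the lookup keeps the original values after replacement.
    return dict(dataset)


def replace_fields(dataset, lookup, funcs):
    dataset["PatientID"] = funcs["replace_uid"](lookup["PatientID"])
    return dataset


parser = HeaderParser(get_fields, replace_fields)
parser.add_func(replace_uid)  # registered under the key "replace_uid"
result = parser.run({"PatientID": "123456789"})
print(result)  # {'PatientID': 'ANON-6789'}
```

Passing get and replace as callables keeps the sketch honest about what is unknown; a real class would wire in deid's own functions and recipe handling instead.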

vsoch commented 4 years ago

@wetzelj, using iterall() might be more useful than dicom_dir().

vsoch commented 4 years ago

Here are some examples of iterating through elements. With iterall() we get a DataElement each time (including elements inside sequences), and all of them have a name:

for element in dicom.iterall():
    if "Code Value" in element.name:
        print(element)

(0008, 0100) Code Value                          SH: 'T-D3000'
(0008, 0100) Code Value                          SH: 'R-10206'

But not all of them have what we traditionally use, a keyword (e.g. PatientID or PixelData). These elements, for example, have an empty keyword:

for element in dicom.iterall():
    if not element.keyword:
        print(element)

(0011, 0003) Private Creator                     AE: 'Agfa DR'
(0019, 0010) Private Creator                     LO: 'Agfa ADC NX'
(0019, 1007) Private tag data                    CS: 'YES'
(0019, 1021) Private tag data                    FL: 6.039999961853027
(0019, 1028) Private tag data                    CS: 'NO'
(0019, 1030) Private tag data                    LT: ''
(0019, 10f5) [Cassette Orientation]              CS: 'LANDSCAPE'
(0019, 10fa) Private tag data                    IS: "297"
(0019, 10fb) Private tag data                    FL: 2.4000000953674316
(0019, 10fc) Private tag data                    IS: "171"
(0019, 10fd) Private tag data                    CS: 'NO'
(0019, 10fe) [Unknown]                           CS: 'MED'
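Both loops above filter iterall() output by one DataElement attribute (name or keyword). A small sketch of that filtering as reusable helpers, assuming hypothetical names find_by_name and find_without_keyword, and a tiny Element dataclass standing in for pydicom's DataElement (which exposes tag, name, keyword and value) so the example runs without a DICOM file. With a real dataset you would pass dicom.iterall() instead of the list; since iterall() is a generator, call it again for each search.

```python
from dataclasses import dataclass
from typing import Any, Iterable, List


@dataclass
class Element:
    """Stand-in for pydicom's DataElement: only the attributes used below."""
    tag: int
    name: str
    keyword: str
    value: Any


def find_by_name(elements: Iterable[Element], text: str) -> List[Element]:
    # Substring match on the human-readable name, like the loop above.
    return [e for e in elements if text in e.name]


def find_without_keyword(elements: Iterable[Element]) -> List[Element]:
    # In the listing above, the private elements have an empty keyword.
    return [e for e in elements if not e.keyword]


elements = [
    Element(0x00080100, "Code Value", "CodeValue", "T-D3000"),
    Element(0x00190010, "Private Creator", "", "Agfa ADC NX"),
    Element(0x001910F5, "[Cassette Orientation]", "", "LANDSCAPE"),
]

print([e.value for e in find_by_name(elements, "Code Value")])  # ['T-D3000']
print([e.name for e in find_without_keyword(elements)])
# ['Private Creator', '[Cassette Orientation]']
```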

So my next question: if I loop through the elements once and change a flattened item, does that update the dicom dataset itself? Let's find one to change:

for element in dicom.iterall():
    if "Patient's Name" in element.name:
        print(element)

(0010, 0010) Patient's Name                      PN: 'Wetzel, James^Chase'

Okay, let's try changing that:

for element in dicom.iterall():
    if "Patient's Name" in element.name:
        element.value = "Dinosaur, Pancakes^Great"

Did it change?

for element in dicom.iterall():
    if "Patient's Name" in element.name:
        print(element)

(0010, 0010) Patient's Name                      PN: 'Dinosaur, Pancakes^Great'

Yes, great! Because iterall() yields the dataset's own DataElement objects, setting element.value changes the dataset in place. So what if we parsed over all fields once to extract any lookup we need, and then a second time with iterall() to make the changes? The harder part is how we would refer to specific values: right now we allow tags (as strings) and keywords (also strings), but we don't do any search or parsing of the element.name field, which could be meaningful.
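The two-pass idea can be sketched as follows, as an illustration under assumptions rather than deid's implementation: the first pass builds a lookup keyed by tag, and the second pass assigns element.value in place, which is what the Patient's Name test demonstrates. The Element dataclass is a stand-in for pydicom's DataElement, collect and apply_changes are hypothetical helpers, and changes maps a tag to a function of (element, lookup). With pydicom you would pass dicom.iterall() to each pass.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable


@dataclass
class Element:
    """Stand-in for pydicom's DataElement (tag, name, value only)."""
    tag: int
    name: str
    value: Any


def collect(elements: Iterable[Element]) -> Dict[int, Any]:
    # Pass 1: read every value into a lookup keyed by tag.
    return {e.tag: e.value for e in elements}


def apply_changes(
    elements: Iterable[Element],
    lookup: Dict[int, Any],
    changes: Dict[int, Callable[[Element, Dict[int, Any]], Any]],
) -> int:
    # Pass 2: assign new values in place; return how many were changed.
    changed = 0
    for e in elements:
        if e.tag in changes:
            e.value = changes[e.tag](e, lookup)
            changed += 1
    return changed


dataset = [
    Element(0x00100010, "Patient's Name", "Doe^Jane"),
    Element(0x00100020, "Patient ID", "12345"),
]

lookup = collect(dataset)
changes = {
    0x00100010: lambda e, lk: "Dinosaur, Pancakes^Great",
    # Derive the new ID from a value gathered in the first pass.
    0x00100020: lambda e, lk: "ID-" + str(len(lk[0x00100020])),
}
print(apply_changes(dataset, lookup, changes))  # 2
print(dataset[0].value)  # Dinosaur, Pancakes^Great
```

Keeping the lookup separate from the mutation pass means replacement functions can depend on any value in the file, not just the element being changed.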

vsoch commented 4 years ago

contender.keyword won't work as an index either; several elements have an empty keyword:

# iterall() includes private tags and elements inside sequences
for contender in dicom.iterall():
    if contender.keyword == '':
        print(contender)

(0011, 0003) Private Creator                     AE: 'Agfa DR'
(0019, 0010) Private Creator                     LO: 'Agfa ADC NX'
(0019, 1007) Private tag data                    CS: 'YES'
(0019, 1021) Private tag data                    FL: 6.039999961853027
(0019, 1028) Private tag data                    CS: 'NO'
(0019, 1030) Private tag data                    LT: ''
(0019, 10f5) [Cassette Orientation]              CS: 'LANDSCAPE'
(0019, 10fa) Private tag data                    IS: "297"
(0019, 10fb) Private tag data                    FL: 2.4000000953674316
(0019, 10fc) Private tag data                    IS: "171"
(0019, 10fd) Private tag data                    CS: 'NO'
(0019, 10fe) [Unknown]                           CS: 'MED'
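Since keyword is empty for these elements, and name is not unique either (several are just "Private tag data"), the tag is the attribute that identifies every element. A sketch of a tag-keyed index, assuming a stand-in Element dataclass for pydicom's DataElement and hypothetical helpers tag_string and index_by_tag. A tag can still repeat once sequences are flattened (the two Code Value elements earlier share (0008, 0100)), so each tag maps to a list.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List


@dataclass
class Element:
    """Stand-in for pydicom's DataElement (tag, name, keyword, value)."""
    tag: int
    name: str
    keyword: str
    value: Any


def tag_string(tag: int) -> str:
    # Format a 32-bit tag as "(gggg,eeee)" in uppercase hex.
    return f"({tag >> 16:04X},{tag & 0xFFFF:04X})"


def index_by_tag(elements: Iterable[Element]) -> Dict[str, List[Element]]:
    # Flattened sequences can repeat a tag, so keep every match in a list.
    index: Dict[str, List[Element]] = defaultdict(list)
    for e in elements:
        index[tag_string(e.tag)].append(e)
    return dict(index)


elements = [
    Element(0x00080100, "Code Value", "CodeValue", "T-D3000"),
    Element(0x00080100, "Code Value", "CodeValue", "R-10206"),
    Element(0x00191007, "Private tag data", "", "YES"),
    Element(0x00191028, "Private tag data", "", "NO"),
]

index = index_by_tag(elements)
print([e.value for e in index["(0008,0100)"]])  # ['T-D3000', 'R-10206']
print(index["(0019,1028)"][0].value)  # NO
```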

vsoch commented 4 years ago

This is now handled by the DicomParser class; see, e.g., https://pydicom.github.io/deid/getting-started/dicom-put/