pydicom / deid

best effort anonymization for medical images using python
https://pydicom.github.io/deid/
MIT License
140 stars 43 forks source link

Inconsistency in the application of multiple recipe rules on the same field #140

Closed wetzelj closed 4 years ago

wetzelj commented 4 years ago

After the parser update, there is an inconsistency in the manner in which multiple recipe rules applied to the same field.

Prior to the update, multiple rules were additive.
Example 1:

JITTER StudyDate 1
JITTER StudyDate 2

Results in 3 days being added to StudyDate.

Example 2:

ADD PatientIdentityRemoved Yes
REMOVE PatientIdentityRemoved 

Results in no change to the header.

Example 3:

REMOVE StudyDate 
ADD StudyDate 20200806

Resulted in StudyDate having a value of 20200806.

After the parser update, any actions following a REMOVE rule are ignored. As a result, while Examples 1 and 2 still operate the same, Example 3 fails to add the new StudyDate to the header.

I believe that this is occurring because the REMOVE action (specifically parser.py - delete_field()) removes the specified field from the FileDataSet in the parser.dicom property but is not removing the item from the fields property. In subsequent ADD rules, we fall into the add_field function and since the deleted field is still in parser.fields the add is treated as a replacement (line 377) which updates the element, but the tag with the new value is never re-added to the parser.dicom FileDataSet.

I have a few unit tests written for the examples that I've outlined above - but haven't implemented a solution yet (the new tests can be seen in this commit). I wanted to get your perspective first. My first thought was to ensure that the REMOVE action removed the element from the fields property in addition to what it currently does.

While all of the above may seem like nonsense - "Why wouldn't you just compress each of these scenarios down to three single rules?" - the REMOVE/ADD scenario does have real-world applicability for situations like:

REMOVE endswith:ID
ADD PatientID 12345
vsoch commented 4 years ago

I think you are spot on - we'd want the field to not be found, and then have it added here so it does seem like a bug. So see if you are able to remove the field also from self.fields to address the issue! And very good debugging, as always. :)

wetzelj commented 4 years ago

Before I submit the pull request, could you take a look at this commit?

The removal of the value from fields introduced a bug detected by the test_remove_all_func unit test. The removal of the entry from fields caused a runtime error - "dictionary changed size during iterations". To correct this, I created a clone of the fields dict which we then iterate over. Is there a better way to do this?

vsoch commented 4 years ago

That would work! I've typically seen that done more like:

from copy import deepcopy
temp_fields = deepcopy(fields)

but if you are sure just doing dict(fields) would work (for all python versions) then I'm good with that. You could also do:

for uid, field in fields.copy().items():
    ....

or even

for uid in fields.keys():
    field = fields[uid]
    ...

As with Python, there are always many ways to do things! I usually do the first (with deepcopy) because I know some weirdness can result with deeply nested dictionaries.

wetzelj commented 4 years ago

I like typical. :) It's switched to deepcopy.

vsoch commented 4 years ago

Fixed with #141