Closed rmw362 closed 3 years ago
Thank you for pointing this out! I would definitely be open to a pull request if you'd like to make these needed changes (which you've already figured out and should be given credit for). It would also be good to have a test you think appropriate to ensure that the error does not re-occur.
I would be happy to submit a PR (when I have a little bit more time later this week :) ). How would you like to handle the second issue regarding coordinates of PHI? Certainly changing the behavior of the mask is a bigger change (ie mask becomes 0s, everything else 1s) and would require you to remove the SequenceofUltrasoundRegions filter from the default deid.dicom file. Otherwise, someone would need to meticulously go through the list of filters and change them to include the inverse of the coordinates currently provided (that is if all of the provided coordinates come from CTP's PixelAnonymizer script, I only confirmed this behavior for US). If the second approach is desired, not sure I have the time to go through each of these and update (sorry!).
Great! Definitely no worries on a quick PR, I'll be here when you need me to review!
For the second issue, the change was a response to a user that found this coordinate for Ultrasound data and thought it would be useful to add. I think the original intention was to get a box (the values) and then set the values to 0 within those bounds (see my notes here https://github.com/pydicom/deid/blob/master/deid/dicom/pixels/detect.py#L275) but then you are right, we use the coordinate here https://github.com/pydicom/deid/blob/master/deid/dicom/pixels/clean.py#L164 and create a mask with 1s, and then multiply, so the region that we want blacked out is the one included. If it's the case that the CTP coordinates are the opposite of the UltrasoundRegions (e.g., CTP tells us what to black out, ultrasound regions tells us what to include) then I think a reasonable approach would be to allow for a special case of cleaning, a setting to indicate "clean the inverse of this" (or basically set the value of the mask to 1). Something like:
LABEL Clean Ultrasound Regions
present SequenceOfUltrasoundRegions
keepcoordinates from:SequenceOfUltrasoundRegions
So then here we would just do another check:
elif member.lower().startswith("keepcoordinates"):
coordinate = member.replace("keepcoordinates", "").strip()
criteria["keepcoordinates"].append(coordinate)
continue
And then when we are doing the clean, we would first create a tuple of coordinates (instead of a list) where the first entry is the coordinate, and the second is the value to fill with (0 or 1 depending on keep or not). That would be here. And then we would keep the current loop, but instead of setting the value of the mask to 1 always, we would set it based on the tuple value.
What do you think? If you like, I can do this first round of changes to add the keepcoordinates, and then I'll give you a branch to PR your first change to. That should run the tests across both of our changes. Would that work?
okay here you go! https://github.com/pydicom/deid/pull/143 Let me know what you think! I can either merge this (and then you can open the other PR separately) or you can PR to this branch if you want to tweak the logic. I'm good with either :)
hey @rmw362 I'd like to get the fix in asap for the coordinates being reversed - did you see https://github.com/pydicom/deid/pull/143, and would you care to test it? And I'd also appreciate a PR with the first change that you suggested. Thank you!
I've merged the fix for the keepcoordinates, I don't see the tab that you are referencing in the above huge snippet so I'll wait for your PR (or just showing me so I can make the fix). Thanks!
Hey @vsoch & @rmw362 - I came across this bug in my testing as well. However, I believe that the root cause of the issue is where the group_flags and group_descriptions are being initialized to []. Currently this is occurring on line 146/147 in detect.py - within the loop in which we're looping through the conditions set for the filter. This causes only the last definition line of the filter to be used as the only definition for the filter. I believe that the correct fix for this issue is to simply move these two statements above the for group in item["filters"]
on line 145 so that the filter rules are appended and evaluated as a group.
I've written several tests evaluating filter conditions with multiple rules as well as filter conditions with inner_operators - so I think the tests cover all of the basic use scenarios.
Please take a look at my branch - fix/compound_filters. If you think this looks okay, I'll submit as a PR.
@wetzelj looks good! Yes please open the PR!
Should be closed by #152. Thank you @wetzelj !
Hi,
I really appreciate all of your efforts on this project...super helpful. I came across some behavior of the detect.py module that I believe is counter to how you intended the module to work.
First, the logic of how the detect.py module flags images is incorrect, and this is due to improper indentation in the evaluate groups function starting with "#At the end of the group, evaluate the inner group" below. This section needs to be indented to be included with the preceeding for loop (as I have done below), otherwise it is evaluating outer groups rather than inner groups (this is the current functionality as written). The result is inappropriate behavior where if the header meets the final criteria in the outer groups, it will be flagged regardless of whether it met criteria for previous groups. For example, I have an US DICOM which has 768 rows in the pixel data, and the image was flagged with every filter with equals Rows 768 as the last criteria. Thankfully a simple tab solves the issue. Additionally, you need to rename the output as overall_result to avoid a namespace conflict with results so that this solution works.
Second, in the default deid.dicom file, the coordinates (at least for US filters) are set to the same as those included in the PixelAnonymizerScript from CTP. The problem is in CTP these are the pixels that are meant to be removed, however, in deid these are the pixels that form a mask of what to KEEP in the images (coordinates are set to 1 in the mask, all else 0, then mask is multiplied by original image). The result is that the bounding boxes of PHI determined by the detect client are kept rather than discarded. For example if you look at the filter below, these are the coordinates x0=0, y0=0, x1 = 1024, y1=60 that represent the bounding box that contains PHI and should be removed, not kept:
LABEL SIEMMENS # (CTP) contains Modality US
Either the default recipe needs to be changed, or the behavior of the mask needs to be changed (i.e. mask represents pixels to be removed, though this would result in erratic behavior for the SequenceofUltrasoundRegions filter as it would remove the actual ultrasound image every time it is present).