pydicom / deid

best effort anonymization for medical images using python
https://pydicom.github.io/deid/
MIT License

Ultrasound Deidentification - Pixel Masking Failure #165

Closed: wetzelj closed this issue 3 years ago

wetzelj commented 4 years ago

Deidentifying an ultrasound study fails when the study contains single-image RGB stills as well as RGB cine clips. When processing reaches one of these single-image stills, an exception occurs:

Exception: operands could not be broadcast together with shapes (768,768,1024) (768,1024,3)

This issue appears to be caused by clean.py line 223. The original shape of the image in this scenario is (X, Y, channel), so when the numpy.tile() function is used to "stack" the mask into all channels, the mask is incorrectly stacked to a shape of (X, X, Y).
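The mismatch can be reproduced with NumPy alone. The sketch below is an assumption about what the stacking step does, not a copy of clean.py: it tiles a 2D mask along a new leading axis, using the first dimension of the image as the repeat count, which is only right when that dimension is a frame count. The array sizes are shrunk from (768, 1024, 3) to (4, 6, 3) to keep it cheap to run.

```python
import numpy as np

# Small stand-in for a single-frame RGB still: (rows, cols, channels).
rows, cols, channels = 4, 6, 3
image = np.ones((rows, cols, channels), dtype=np.uint8)

# 2D mask: 0 where pixels should be blanked, 1 elsewhere.
mask = np.ones((rows, cols), dtype=np.uint8)
mask[:2, :] = 0

# Assumed reconstruction of the failing step: the first axis of the image
# is treated as a frame count, so the mask is stacked along a new leading axis.
bad_mask = np.tile(mask, (image.shape[0], 1, 1))
print(bad_mask.shape)  # (rows, rows, cols) instead of (rows, cols, channels)

try:
    bad_mask * image
except ValueError as err:
    error_message = str(err)
    print(error_message)

# For an RGB still, the mask has to be repeated along a trailing channel axis.
good_mask = np.tile(mask[:, :, np.newaxis], (1, 1, channels))
cleaned = good_mask * image
print(cleaned.shape)
```

With the real image, the same steps would produce the (768,768,1024) versus (768,1024,3) mismatch reported in the exception.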

I'm not 100% sure what the correct fix should be. If len(self.original.shape) == 4, we have an RGB cine clip, and the logic works appropriately. If len(self.original.shape) == 2, we have a single-frame greyscale image, and again the logic works appropriately. However, if len(self.original.shape) == 3, we are in an ambiguous state. Judging from the comment on line 222, the code as written expects a greyscale cine clip in this state, with the shape (frames, X, Y). In my error case, however, the file being deidentified was not a greyscale cine clip but a single-frame RGB image with the shape (X, Y, channel).
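To make the ambiguity concrete, here is a minimal sketch that groups the four layouts by number of dimensions. The shapes are illustrative and assume the (frames, rows, cols, channels) ordering described above; they do not come from real files.

```python
import numpy as np

# Illustrative pixel arrays for the four cases discussed above.
examples = {
    "single-frame greyscale": np.zeros((4, 6)),
    "greyscale cine clip": np.zeros((5, 4, 6)),
    "single-frame RGB": np.zeros((4, 6, 3)),
    "RGB cine clip": np.zeros((5, 4, 6, 3)),
}

# Group the cases by number of dimensions.
by_ndim = {}
for label, pixels in examples.items():
    by_ndim.setdefault(pixels.ndim, []).append(label)

for ndim, labels in sorted(by_ndim.items()):
    print(ndim, labels)
```

Two and four dimensions each map to exactly one case, while three dimensions maps to two, so the number of dimensions alone cannot select the right stacking.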

Instead of relying on the shape to decide what to do here, I wonder if we should look at the (0028,0002) SamplesPerPixel and (0028,0008) NumberOfFrames tags to determine how to build the three-dimensional mask. I'd appreciate other thoughts on this. This is the first time I've really had to dig into ultrasounds.
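A rough sketch of that idea follows. The helper name expand_mask is hypothetical, and this is not necessarily the change that was eventually merged. It takes the two tag values as plain integers; with pydicom they could be read with something like int(getattr(ds, "SamplesPerPixel", 1)) and int(getattr(ds, "NumberOfFrames", 1)), defaulting to 1 when a tag is absent. It also assumes the pixel array is laid out as (frames, rows, cols, channels).

```python
import numpy as np


def expand_mask(mask, samples_per_pixel=1, number_of_frames=1):
    """Expand a 2D (rows, cols) mask to match the pixel array layout.

    Hypothetical helper: channels are appended as a trailing axis and
    frames are prepended as a leading axis, based on the tag values
    rather than on the number of dimensions of the pixel array.
    """
    expanded = mask
    if samples_per_pixel > 1:
        # (rows, cols) -> (rows, cols, channels)
        expanded = np.repeat(expanded[..., np.newaxis], samples_per_pixel, axis=-1)
    if number_of_frames > 1:
        # (...) -> (frames, ...)
        expanded = np.repeat(expanded[np.newaxis, ...], number_of_frames, axis=0)
    return expanded


mask = np.ones((4, 6), dtype=np.uint8)
mask[:2, :] = 0

print(expand_mask(mask).shape)                      # single-frame greyscale
print(expand_mask(mask, number_of_frames=5).shape)  # greyscale cine clip
print(expand_mask(mask, samples_per_pixel=3).shape) # single-frame RGB
print(expand_mask(mask, 3, 5).shape)                # RGB cine clip
```

Because the tags drive the expansion, the greyscale cine clip and the single-frame RGB image no longer share a code path even though both are three-dimensional.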

vsoch commented 4 years ago

Oh interesting, I didn't know about the single-frame RGB image. The definition of SamplesPerPixel notes that:

For monochrome (gray scale) and palette color images, the number of planes is 1. For RGB and other three vector color models, the value of this Attribute is 3.

So would you want to add a check under len(original.shape) == 3 that uses different logic depending on this field? If you want to open a PR with this suggested change, that would be great. We would also want a test case (with a de-identified image) to make sure it continues to work.

wetzelj commented 4 years ago

Sounds good. I'll get started on something. It may take me a bit to get the sample image for upload created and approved.

wetzelj commented 4 years ago

If you'd like to take a look at what will be coming, it's committed here. I'm still waiting for a greyscale cine clip to test with and still waiting for images that I can contribute to the project. Once I have these, I'll commit and open the pull request.

wetzelj commented 3 years ago

Fixed with #166