pydicom / deid

best effort anonymization for medical images using python
https://pydicom.github.io/deid/
MIT License

US data corrupted by pixel anonymization #228

Closed timothee-l closed 1 year ago

timothee-l commented 1 year ago

The input is ultrasound multiframe data, decompressed (Explicit VR Little Endian) and RGB encoded. The output is a scrambled image.

The calls I make are very simple:

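import os
from deid.dicom.pixels import DicomCleaner

client = DicomCleaner(output_folder=output_folder + sub_path, deid='config.txt')
client.detect(os.path.join(path, file))  # portable instead of path + '\\' + file
client.clean()
client.save_dicom()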
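The indentation of the snippet suggests it runs inside a loop over a folder tree, with path, sub_path and file coming from that loop; the loop itself isn't shown, so the following is only a guess at its shape. A minimal stdlib sketch, using a hypothetical helper named iter_dicom_jobs, that walks an input folder and pairs each DICOM file with a mirrored output folder:

```python
import os
from typing import Iterator, Tuple


def iter_dicom_jobs(input_root: str, output_root: str,
                    suffix: str = ".dcm") -> Iterator[Tuple[str, str]]:
    """Yield (input_file, output_dir) pairs, mirroring the subfolder layout
    of input_root under output_root. Hypothetical helper, not part of deid."""
    for dirpath, _dirnames, filenames in os.walk(input_root):
        # Relative sub-path of this folder, e.g. "patient1/study2" ("." at the root)
        sub_path = os.path.relpath(dirpath, input_root)
        output_dir = os.path.normpath(os.path.join(output_root, sub_path))
        for name in sorted(filenames):
            # Case-insensitive extension check; suffix="" accepts every file
            if name.lower().endswith(suffix):
                yield os.path.join(dirpath, name), output_dir
```

Each pair would then go through the same detect/clean/save_dicom calls as above, e.g. with DicomCleaner(output_folder=out_dir, deid='config.txt'), creating out_dir first with os.makedirs(out_dir, exist_ok=True) in case it doesn't exist yet. Pass suffix="" if your files have no .dcm extension.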

My config file is also very basic:

FORMAT dicom

%filter graylist

LABEL Philips Ultrasound Header
    contains Manufacturer Philips
    + contains Modality US
    + contains ImageType Cardiology
    coordinates 0,0,1024,23
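For readers new to deid recipes: as I understand the syntax, the lines of a LABEL joined with + must all match (a logical AND), and the coordinates line gives the box to blank, read as xmin,ymin,xmax,ymax, so 0,0,1024,23 would be a strip roughly 1024 pixels wide and 23 pixels tall at the top-left of the frame. The toy evaluator below (hypothetical name label_coordinates, not deid's implementation) illustrates that reading; treating contains as a case-insensitive regex search and joining multi-valued fields such as ImageType with a backslash are both assumptions.

```python
import re
from typing import Dict, List, Optional, Sequence, Tuple, Union

HeaderValue = Union[str, Sequence[str]]


def label_coordinates(header: Dict[str, HeaderValue],
                      criteria: List[Tuple[str, str]],
                      coordinates: str) -> Optional[Tuple[int, int, int, int]]:
    """Return (xmin, ymin, xmax, ymax) if every (field, pattern) criterion
    matches, else None. Toy model of 'contains' lines joined with '+'."""
    for field, pattern in criteria:
        value = header.get(field)
        if value is None:
            return None  # a missing field cannot satisfy 'contains'
        if not isinstance(value, str):
            value = "\\".join(value)  # assumption: multi-valued fields joined DICOM-style
        if not re.search(pattern, value, flags=re.IGNORECASE):
            return None  # one failed criterion fails the whole AND
    xmin, ymin, xmax, ymax = (int(v) for v in coordinates.split(","))
    return xmin, ymin, xmax, ymax


philips_us = [("Manufacturer", "Philips"), ("Modality", "US"), ("ImageType", "Cardiology")]
example_header = {  # made-up header values for illustration only
    "Manufacturer": "Philips Medical Systems",
    "Modality": "US",
    "ImageType": ["ORIGINAL", "PRIMARY", "CARDIOLOGY"],
}
print(label_coordinates(example_header, philips_us, "0,0,1024,23"))  # (0, 0, 1024, 23)
```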

Here is an input/output example; the problem affects all of the multiframe files in the dataset: https://imgur.com/a/GF7ZO2w

vsoch commented 1 year ago

It looks like it's either not cleaning the right dimension or saving incorrectly. I'm not sure we've ever handled ultrasound multiframes before (I'm not sure I've ever worked with them). Are you a Python developer, and would you be able to debug this and open a pull request?
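To make the "wrong dimension" idea concrete: for multi-frame colour data, pydicom's pixel_array is laid out as (frames, rows, columns, samples), so a mask written for a single (rows, columns) image would slice the frame and row axes instead. Below is a minimal numpy sketch (hypothetical function blank_box; the box is read as xmin, ymin, xmax, ymax, which is an assumption) that picks the row and column axes from the frame count and blanks the box in every frame and every colour channel. It only illustrates the suspected failure mode; it is not deid's code.

```python
import numpy as np
from typing import Tuple


def blank_box(pixels: np.ndarray, box: Tuple[int, int, int, int],
              n_frames: int, samples_per_pixel: int) -> np.ndarray:
    """Return a copy of pixels with box = (xmin, ymin, xmax, ymax) zeroed in every frame.

    Assumes pydicom-style pixel_array shapes:
      (rows, cols), (rows, cols, samples),
      (frames, rows, cols), (frames, rows, cols, samples)
    """
    expected_ndim = (2 if n_frames == 1 else 3) + (1 if samples_per_pixel > 1 else 0)
    if pixels.ndim != expected_ndim:
        # Catch the "wrong dimension" mistake instead of silently slicing frames
        raise ValueError(
            f"expected {expected_ndim}-D array for {n_frames} frame(s) and "
            f"{samples_per_pixel} sample(s) per pixel, got shape {pixels.shape}"
        )
    xmin, ymin, xmax, ymax = box
    row_axis = 0 if n_frames == 1 else 1  # skip the leading frame axis if present
    index = [slice(None)] * pixels.ndim    # every frame and every sample by default
    index[row_axis] = slice(ymin, ymax)
    index[row_axis + 1] = slice(xmin, xmax)
    out = pixels.copy()
    out[tuple(index)] = 0
    return out
```

By contrast, writing out[ymin:ymax, xmin:xmax] = 0 on a (frames, rows, columns, 3) array would zero whole frames ymin to ymax, rows xmin to xmax, which is the kind of scrambling a dimension mix-up could produce.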

wetzelj commented 1 year ago

Seeing this issue pop up reminded me of #166. I thought we had handled this, but evidently not, or at least not the specific situation @timothee-l is encountering. I don't have the bandwidth to research it, but thought it would be useful to bring #166 into the discussion.

timothee-l commented 1 year ago

I can give it a try. To clarify, the tool has worked with US multiframes before - as long as they were RGB encoded (YBR also caused some sort of corruption).
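One plausible mechanism for the YBR corruption (a guess, not something confirmed in this thread): zero is not black in YBR_FULL. Black is Y=0 with Cb=Cr=128, so a mask that writes 0 into every sample of data that is still YBR would decode to a dark green block. The pure-Python sketch below (hypothetical function ybr_full_to_rgb) applies the full-range BT.601 / JFIF inverse equations to a single 8-bit pixel to show this.

```python
from typing import Tuple


def ybr_full_to_rgb(y: int, cb: int, cr: int) -> Tuple[int, int, int]:
    """Convert one 8-bit YBR_FULL pixel to RGB (full-range BT.601 / JFIF equations)."""
    def clip(v: float) -> int:
        # Round to the nearest integer and clamp to the 8-bit range
        return max(0, min(255, round(v)))

    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return clip(r), clip(g), clip(b)


print(ybr_full_to_rgb(0, 128, 128))  # (0, 0, 0): true black in YBR_FULL
print(ybr_full_to_rgb(0, 0, 0))      # (0, 135, 0): "zeroed" YBR decodes as dark green
```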

vsoch commented 1 year ago

I think step one, either way, is getting a dummy test dataset to reproduce the issue. @timothee-l is this something you could provide? We have an external data repository now https://github.com/pydicom/deid-data

timothee-l commented 1 year ago

I can share the pixel data and other tags you may need, but not the whole DICOM (I do not own the data).

vsoch commented 1 year ago

Could you maybe make an empty dicom of that type and add the pixel data and tags to it? That would work!

timothee-l commented 1 year ago

sample.zip

Here is a sample of the data I am trying to process. I will try the files on your data repo.

vsoch commented 1 year ago

Perfect! As long as I can reproduce your error, I should be able to debug and work on it.

vsoch commented 1 year ago

Stupid question: do you provide this to deid as a .zip, or just the DICOM on its own?

timothee-l commented 1 year ago

Just the dicom on its own

vsoch commented 1 year ago

Okay, I've reproduced it! I've verified the format is:

and I've walked through the logic of clean. Nothing is jumping out at me as wrong, but the image is indeed mangled. @wetzelj do you have any ideas or suggestions for what to try or look at, beyond the obvious?

wetzelj commented 1 year ago

I didn't dive into the code for this response, but did run a few tests.

My speculation at this point is that it could have something to do with the fact that this image has undergone lossy compression at some point in its lifetime; in all of my use cases, we've only dealt with images that have not undergone lossy compression.

I would expect that this image must be decompressed back to a standard pixel array before any pixel masking rules are applied. https://pydicom.github.io/pydicom/dev/old/image_data_handlers.html
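Whether decompression is needed at all is normally decided by the Transfer Syntax UID in the file meta information. A minimal sketch (hypothetical names NATIVE_PIXEL_DATA_SYNTAXES and needs_pixel_decompression) that flags anything outside the four native, non-encapsulated syntaxes; pydicom's UID objects also expose an is_compressed property covering similar ground. Note that it trusts what the header declares, so a file whose header says uncompressed while its Pixel Data is actually compressed would not be flagged.

```python
# Transfer Syntax UIDs whose Pixel Data is stored natively (not encapsulated).
NATIVE_PIXEL_DATA_SYNTAXES = {
    "1.2.840.10008.1.2": "Implicit VR Little Endian",
    "1.2.840.10008.1.2.1": "Explicit VR Little Endian",
    "1.2.840.10008.1.2.1.99": "Deflated Explicit VR Little Endian",
    "1.2.840.10008.1.2.2": "Explicit VR Big Endian",
}


def needs_pixel_decompression(transfer_syntax_uid: str) -> bool:
    """True if the declared syntax is not a native one, i.e. Pixel Data is
    most likely encapsulated and must be decoded before masking."""
    uid = transfer_syntax_uid.strip("\x00 ")  # tolerate trailing padding
    return uid not in NATIVE_PIXEL_DATA_SYNTAXES


print(needs_pixel_decompression("1.2.840.10008.1.2.1"))     # False (Explicit VR Little Endian)
print(needs_pixel_decompression("1.2.840.10008.1.2.4.50"))  # True (JPEG Baseline)
```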

vsoch commented 1 year ago

Tried that just now:

from pydicom import dcmread  # read_file is the older, deprecated name for dcmread
from deid.dicom.pixels import DicomCleaner
import os
here = os.getcwd()
output_folder = os.path.join(here, 'out')
file = "ultrasound-multiframe.dcm"

# Decompress first
dcm = dcmread(file)
dcm.decompress()
file_decompressed = "ultrasound-multiframe-decompressed.dcm"
dcm.save_as(file_decompressed)

# Now clean
client = DicomCleaner(output_folder=output_folder, deid='config.txt')
client.detect(file_decompressed)
client.clean()
client.save_dicom()

Didn't seem to make a difference (still messed up!), but it was a good idea to try.

timothee-l commented 1 year ago

There appears to be a mismatch between the pixel data, which is compressed, and the Transfer Syntax UID, which says uncompressed, as you suspected. So yes, the problem seems to originate from my data. I think the issue can be closed. Thanks!
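A quick way to spot this kind of mismatch without decoding anything is to compare the length of the Pixel Data element with the size the image attributes imply for native data. A minimal sketch with hypothetical function names; the one-byte allowance reflects DICOM's rule that element values have even length, so an odd native size carries a pad byte.

```python
def expected_native_length(rows: int, columns: int, samples_per_pixel: int,
                           bits_allocated: int, number_of_frames: int = 1) -> int:
    """Bytes of native (uncompressed) Pixel Data implied by the image attributes."""
    total_bits = rows * columns * samples_per_pixel * bits_allocated * number_of_frames
    return (total_bits + 7) // 8  # round up, which matters only for 1-bit data


def pixel_data_matches_header(actual_length: int, rows: int, columns: int,
                              samples_per_pixel: int, bits_allocated: int,
                              number_of_frames: int = 1) -> bool:
    """True if actual_length fits native data of the declared shape."""
    expected = expected_native_length(rows, columns, samples_per_pixel,
                                      bits_allocated, number_of_frames)
    # Values have even length, so an odd expected size may have one pad byte
    return actual_length in (expected, expected + expected % 2)
```

With pydicom, the inputs would come from len(ds.PixelData), ds.Rows, ds.Columns, ds.SamplesPerPixel, ds.BitsAllocated and int(ds.get("NumberOfFrames", 1)). A compressed stream sitting under an uncompressed Transfer Syntax UID would typically be much shorter than the expected native size.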

vsoch commented 1 year ago

Sounds good thanks!