pydicom / deid

best effort anonymization for medical images using python
https://pydicom.github.io/deid/
MIT License
140 stars 43 forks

How to de-identify a `pydicom.Dataset`? - addition of example to docs #210

Closed fcossio closed 2 years ago

fcossio commented 2 years ago

Hi, I have implemented my own logic to load a pydicom.Dataset instance from a database. I would like to de-identify the instance without having to write it as a file and then read it with deid.

Is there anything similar to

def replace_identifiers(recipe, dataset: pydicom.dataset.Dataset) -> pydicom.dataset.Dataset:
    """de-identify a single pydicom.dataset.Dataset instance"""
    ...

?

vsoch commented 2 years ago

Aside from adding typing to deid here, you should be able to do:

if isinstance(dataset, pydicom.dataset.Dataset):
    replace_identifiers(...)

vsoch commented 2 years ago

Also, typing in and of itself doesn't prevent you from providing the wrong type! E.g.:

In [1]: def func(name: str):
   ...:     print(name)
   ...: 

In [2]: func(1)
1

fcossio commented 2 years ago

After some more digging through the documentation, I solved my problem with the following:

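import logging
from typing import Optional

import pydicom
from deid.config import DeidRecipe
from deid.dicom.parser import DicomParser

# default_recipe_path is assumed to be defined elsewhere in the author's project

class DeidDataset:
    def __init__(self, recipe_path: Optional[str] = None):
        """Deidentify datasets according to vaib recipe

        :param recipe_path: path to the deid recipe
        """
        if recipe_path is None:
            logging.warning(f"DeidDataset using default recipe {default_recipe_path}")
            recipe_path = default_recipe_path
        self.recipe = DeidRecipe(recipe_path)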

    def anonymize(self, dataset: pydicom.Dataset) -> pydicom.Dataset:
        """Anonymize a single dicom dataset

        :param dataset: dataset that will be anonymized
        :returns: anonymized dataset
        """
        parser = DicomParser(dataset, self.recipe)
        parser.define('remove_day', self.remove_day)
        parser.define('round_AS_to_nearest_5y', self.round_AS_to_nearest_5y)
        parser.define('round_DS_to_nearest_5', self.round_DS_to_nearest_5)
        parser.define('round_DS_to_nearest_0_05', self.round_DS_to_nearest_0_05)
        parser.parse(strip_sequences=True, remove_private=True)
        return parser.dicom
    ...

Thanks for making this tool available.

vsoch commented 2 years ago

Oh, that's fantastic! Do you mind if I include it in our docs somewhere as an example? Even if we create a gist and then link to it, I think it might be super helpful for future users.

fcossio commented 2 years ago

Of course! I will be OOO for the next two weeks. If you can wait that time, I will make a proper PR afterwards adding the example to the docs.

fcossio commented 2 years ago

Actually, I just found that this only works for files. Two lines must be silenced for it to work with a dataset that doesn't come from a file:

https://github.com/pydicom/deid/blob/0807f20bfc36b1f30828ed562c7f79e14b5f6100/deid/dicom/parser.py#L114

https://github.com/pydicom/deid/blob/0807f20bfc36b1f30828ed562c7f79e14b5f6100/deid/dicom/parser.py#L115

and the file meta here:

https://github.com/pydicom/deid/blob/0807f20bfc36b1f30828ed562c7f79e14b5f6100/deid/dicom/fields.py#L254

vsoch commented 2 years ago

Yes, of course! When you are back, ping me if you have questions or want any help.

fcossio commented 2 years ago

I'm back 😄

I will need to expose an argument to be able to silence these two lines.

https://github.com/pydicom/deid/blob/0807f20bfc36b1f30828ed562c7f79e14b5f6100/deid/dicom/parser.py#L114

https://github.com/pydicom/deid/blob/0807f20bfc36b1f30828ed562c7f79e14b5f6100/deid/dicom/parser.py#L115

I propose adding a boolean from_file argument to the __init__ method of DicomParser and then using if-else statements to silence the lines accordingly.

For the DicomField part that needs to be silenced:

https://github.com/pydicom/deid/blob/0807f20bfc36b1f30828ed562c7f79e14b5f6100/deid/dicom/fields.py#L254

I can add the same argument to this method and skip the dicom.file_meta part accordingly.

https://github.com/pydicom/deid/blob/0807f20bfc36b1f30828ed562c7f79e14b5f6100/deid/dicom/fields.py#L240

To keep the interface intact, the proposed arguments would default to True; the user would set them to False only when needed.
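
A minimal sketch of how such a flag could behave, using a stand-in class rather than deid's actual DicomParser. The name ParserSketch and the dicom_file attribute are hypothetical and exist only for illustration; the real change would live in the lines linked above.

```python
from types import SimpleNamespace
from typing import Any, Optional


class ParserSketch:
    """Hypothetical stand-in for a parser; this is NOT deid's DicomParser."""

    def __init__(self, dataset: Any, from_file: bool = True):
        self.dicom = dataset
        if from_file:
            # File-backed path: assumes the dataset carries a filename.
            self.dicom_file: Optional[str] = dataset.filename
        else:
            # In-memory path: skip the file-specific lookup entirely.
            self.dicom_file = None


# Existing callers keep the old behavior because from_file defaults to True.
file_backed = SimpleNamespace(filename="scan.dcm")
print(ParserSketch(file_backed).dicom_file)

# A dataset rebuilt from a database has no filename, so the caller opts out.
in_memory = SimpleNamespace(PatientID="123")
print(ParserSketch(in_memory, from_file=False).dicom_file)
```

Because the default stays True, code that already passes file-backed datasets would not need to change.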

Does this sound good to you @vsoch ?

vsoch commented 2 years ago

A dataset that doesn’t come from a file - what would it be?

fcossio commented 2 years ago

I am loading a dataset that was stored in a database as json. Therefore it contains no filepath or file_meta.

vsoch commented 2 years ago

Gotcha. OK, just make functions to derive both of those items then; if you cannot derive them, set them to None, and make sure the places that use them also respond appropriately.
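
One way to read this suggestion, sketched with plain Python objects instead of real pydicom datasets. The helper names derive_filename, derive_file_meta, and describe are hypothetical and are not part of deid; the sketch only assumes that a file-backed dataset exposes filename and file_meta attributes and that an in-memory one may lack them.

```python
from types import SimpleNamespace
from typing import Any, Optional


def derive_filename(dataset: Any) -> Optional[str]:
    """Return the dataset's source path, or None if it has none."""
    return getattr(dataset, "filename", None)


def derive_file_meta(dataset: Any) -> Optional[Any]:
    """Return the dataset's file meta, or None if it has none."""
    return getattr(dataset, "file_meta", None)


def describe(dataset: Any) -> str:
    """Example consumer that responds appropriately when either item is None."""
    name = derive_filename(dataset)
    meta = derive_file_meta(dataset)
    source = name if name is not None else "<in-memory dataset>"
    meta_note = "with file meta" if meta is not None else "without file meta"
    return f"{source} ({meta_note})"


# A file-backed dataset exposes both items; one loaded from JSON exposes neither.
print(describe(SimpleNamespace(filename="scan.dcm", file_meta={"k": "v"})))
print(describe(SimpleNamespace(PatientID="123")))
```

Compared with a from_file flag, this keeps the decision inside the library: callers pass any dataset, and the code paths that need a filename or file meta check for None.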