Closed fcossio closed 2 years ago
This is the example that I have added:
In this example we will create a custom class to deidentify a single instance of a pydicom.Dataset
with a custom recipe.
We will use four files for this example:
my_deid_example
├── my_deid_recipe.dicom
├── my_dicom_file.json
├── my_module.py
└── requirements.txt
The requirements.txt
file is used only to be able to run this example.
deid
pydicom
pycryptodome
We can install them by running the following commands (requires conda
)
conda create -n deid_example python=3.9
conda activate deid_example
cd my_deid_example
pip install -r requirements.txt
The contents of my_dicom_file.json
are used to load a pydicom.Dataset instance.
{
"SpecificCharacterSet":{"vr":"CS","Value":["ISO_IR 100"]},
"ImageType":{"vr":"CS","Value":["DERIVED","PRIMARY"]},
"SOPClassUID":{"vr":"UI","Value":["1.2.840.10008.5.1.4.1.1.1.2"]},
"StudyDate":{"vr":"DA","Value":["20220627"]},
"SeriesDate":{"vr":"DA","Value":["20220627"]},
"AcquisitionDate":{"vr":"DA","Value":["20220627"]},
"ContentDate":{"vr":"DA","Value":["20220627"]},
"StudyTime":{"vr":"TM","Value":["080803"]},
"ContentTime":{"vr":"TM","Value":["080808.202000"]},
"PatientName":{"vr":"PN","Value":[{"Alphabetic":"Maria^Doe"}]},
"PatientID":{"vr":"LO","Value":["1234567890"]},
"PatientBirthDate":{"vr":"DA","Value":["19900606"]},
"Modality":{"vr":"CS","Value":["MG"]},
"PatientSex":{"vr":"CS","Value":["F"]},
"PatientAge":{"vr":"AS","Value":["032Y"]},
"StudyID":{"vr":"SH","Value":["mammogram87654"]}
}
We create a custom recipe my_deid_recipe.dicom
that specifies what we want to do.
FORMAT dicom
%header
ADD PatientIdentityRemoved Yes
ADD DeidentificationMethod my_deid_recipe.dicom.v1.0
# Specify what we want to keep
KEEP ContentDate
KEEP StudyDate
# Replacements with custom functions. Those are registered in my_module.py
REPLACE PatientName func:replace_name
REPLACE AccessionNumber func:hash_func
REPLACE AdmissionID func:hash_func
REPLACE InterpretationID func:hash_func
REPLACE PatientBirthDate func:remove_day
REPLACE PatientID func:hash_func
REPLACE PerformedProcedureStepID func:hash_func
REPLACE PerformingPhysicianName func:hash_func
REPLACE RequestedProcedureID func:hash_func
REPLACE ResultsID func:hash_func
REPLACE StudyID func:hash_func
# Tags that require custom regex expressions
# Curve Data"(50xx,xxxx)"
REMOVE contains:^50.{6}$
# Overlay comments and data (60xx[34]000)
REMOVE contains:^60.{2}[34]000$
# Private tags ggggeeee where gggg is odd
REMOVE contains:^.{3}[13579].{4}$
# Blank the other tags
BLANK PatientWeight
BLANK PatientSize
REMOVE PatientAge
REMOVE SeriesDate
REMOVE AcquisitionDate
REMOVE StudyTime
REMOVE ContentTime
REMOVE PatientAge
REMOVE PatientSex
# ... etc
from deid.config import DeidRecipe
from deid.dicom.parser import DicomParser
import pydicom
from Crypto.Hash import SHA512
from datetime import datetime
class DeidDataset:
"""This class allows to pseudonymize an instance of
pydicom.Dataset with our custom recipe and functions.
"""
def __init__(self, secret_salt: str, recipe_path: str):
"""New instance of our pseudonymizer class.
:param secret_salt: a random string that makes the
hashing harder to break.
:param recipe_path: path to our deid recipe.
"""
self.secret_salt = secret_salt
self.recipe = DeidRecipe(recipe_path)
def pseudonymize(self, dataset:pydicom.Dataset) -> pydicom.Dataset:
"""Pseudonymize a single dicom dataset
:param dataset: dataset that will be pseudonymized
:returns: pseudonymized dataset
"""
parser = DicomParser(dataset, self.recipe, from_file=False)
# register functions that are specified in the recipe
parser.define('replace_name', self.replace_name)
parser.define('hash_func', self.deid_hash_func)
parser.define('remove_day', self.remove_day)
# parse the dataset and apply the deidentification
parser.parse(strip_sequences=True, remove_private=True)
return parser.dicom
# All registered functions that are used in the recipe must
# receive the arguments: `item`, `value`, `field`, `dicom`
def deid_hash_func(self, item, value, field, dicom) -> str:
"""Performs self.hash to field.element.value"""
val = field.element.value
return self.hash(str(val))
@staticmethod
def remove_day(item, value, field, dicom) -> str:
"""Removes the day from a DT field in the deid framework"""
dt = datetime.strptime(field.element.value, '%Y%m%d')
return dt.strftime("%Y%m01")
@staticmethod
def replace_name(item, value, field, dicom) -> str:
"""Replace PatientName with PatientSex and coarse PatientAge"""
sex = dicom.get('PatientSex')
sex = {"F":'Female', "M": 'Male', 'O':'Other'}[sex]
age = DeidDataset.round_to_nearest(int(dicom.get('PatientAge')[:-1]), 5)
return f"{sex} {age:03d}Y {dicom.get('Modality')}"
# Helper methods for our registered ones
@staticmethod
def round_to_nearest(value, interval):
"""Rounds value to closest multiple of interval"""
return interval * round(value/interval)
def hash(self, msg: str) -> str:
"""
:param msg: message that we want to encrypt,
normally the PatientID or the StudyID.
:return: the encrypted message as hexdigest
(in characters from '0' to '9' and 'a' to 'f')
"""
assert type(msg) == str, f"value is not of type str, {type(msg)}"
h = SHA512.new(truncate="256")
bytes_str = bytes(f"{self.secret_salt}{msg}", "utf-8")
h.update(bytes_str)
return str(h.hexdigest())
# Load the pydicom Dataset
import json
# Unorthodox way of loading a pydicom.Dataset
# please see pydicom documentation for more information
# on how to load dicom files
with open('my_dicom_file.json') as f:
dataset_dict = json.load(f)
dataset = pydicom.Dataset.from_json(dataset_dict)
print('Dataset before pseudonymization')
print(dataset)
#create an instance of our class
deid_ds = DeidDataset("!2#4%6&7abc", 'my_deid_recipe.dicom')
#pseudonymize the dataset
print('\nDataset after pseudonymization')
pseudonymized = deid_ds.pseudonymize(dataset)
print(pseudonymized)
If we execute our python module
python my_module.py
It will give us the following output:
Dataset before pseudonymization
(0008, 0005) Specific Character Set CS: 'ISO_IR 100'
(0008, 0008) Image Type CS: ['DERIVED', 'PRIMARY']
(0008, 0016) SOP Class UID UI: Digital Mammography X-Ray Image Storage - For Presentation
(0008, 0020) Study Date DA: '20220627'
(0008, 0021) Series Date DA: '20220627'
(0008, 0022) Acquisition Date DA: '20220627'
(0008, 0023) Content Date DA: '20220627'
(0008, 0030) Study Time TM: '080803'
(0008, 0033) Content Time TM: '080808.202000'
(0008, 0060) Modality CS: 'MG'
(0010, 0010) Patient's Name PN: 'Maria^Doe'
(0010, 0020) Patient ID LO: '1234567890'
(0010, 0030) Patient's Birth Date DA: '19900606'
(0010, 0040) Patient's Sex CS: 'F'
(0010, 1010) Patient's Age AS: '032Y'
(0020, 0010) Study ID SH: 'mammogram87654'
Dataset after pseudonymization
(0008, 0005) Specific Character Set CS: 'ISO_IR 100'
(0008, 0008) Image Type CS: ['DERIVED', 'PRIMARY']
(0008, 0016) SOP Class UID UI: Digital Mammography X-Ray Image Storage - For Presentation
(0008, 0020) Study Date DA: '20220627'
(0008, 0023) Content Date DA: '20220627'
(0008, 0060) Modality CS: 'MG'
(0010, 0010) Patient's Name PN: 'Female 030Y MG'
(0010, 0020) Patient ID LO: 'df65775690879c36437ae950c52d025102a1f9b8c8132f8b017f14e9ec45eacb'
(0010, 0030) Patient's Birth Date DA: '19900601'
(0012, 0062) Patient Identity Removed CS: 'Yes'
(0012, 0063) De-identification Method LO: 'my_deid_recipe.dicom.v1.0'
(0020, 0010) Study ID SH: 'ae4b477e5709d0c1f746e0adc9ab552fee100b91416f9f3a04037e999077e823'
Hi @vsoch, I have updated the PR
More updates :)
This is ready for merge! @fcossio I'm in the middle of a work day, but I'll merge and then prepare the release tonight, and will share with the community. Thank you for adding this!
Excited to hear and looking forward to it! And thanks for the detailed feedback and all the patience
Thank you again! https://twitter.com/vsoch/status/1563307727811596289
Description
Modified DicomParser to be able to load datasets that don't come from a file (that are loaded as json) by adding a
from_file
argument to the constructor. The default value forfrom_file
isTrue
so that the interface remains unaffected.Added an example of how to create a custom class with a custom recipe and custom functions.
Related issues: #210
Please include a summary of the change(s) and if relevant, any related issues above.
Checklist
Open questions
Questions that require more discussion or to be addressed in future development: