nipy / heudiconv

Flexible DICOM conversion into structured directory layouts
https://heudiconv.readthedocs.io

add anonymization #35

Open satra opened 7 years ago

satra commented 7 years ago

please add other types of anonymization that heudiconv should consider

blakedewey commented 7 years ago

Recently, I helped to clean up https://github.com/chop-dbhi/dicom-anon which works well for DICOM metadata cleaning. It is also auditable.

For face masking, I use mri_deface from FreeSurfer, which is now available without a FreeSurfer license.

satra commented 6 years ago

@yarikoptic and @mgxd - we should discuss this at code rodeo as well.

satra commented 6 years ago

@yarikoptic - we did not discuss this at code rodeo, but i think we could add a few things quite easily.

also @vsoch has been working on this: https://github.com/pydicom/deid and then there was a much older effort here: https://github.com/ssikka/DICOM-CTP-Anonymizer

it would be good to know how @vsoch implemented the custom functions of the CTP anonymizer in deid, i.e., the deid counterpart of the xml file.

vsoch commented 6 years ago

deid is good with handling the metadata, but doesn't do anything for image pixels (other than flagging images based on their header fields). @satra which functions are you interested in? The basic "flow" of deid is to take a deidentification recipe (saying what fields to blank/add/remove) and then apply it. :) If you guys have need or interest I would love to help out and tweak the software with new features that might be needed. see https://pydicom.github.io/deid/

satra commented 6 years ago

@vsoch - here are a few examples:

https://github.com/ssikka/DICOM-CTP-Anonymizer/blob/master/CTPAnonimizationProfile.xml#L17
https://github.com/ssikka/DICOM-CTP-Anonymizer/blob/master/CTPAnonimizationProfile.xml#L18
https://github.com/ssikka/DICOM-CTP-Anonymizer/blob/master/CTPAnonimizationProfile.xml#L28

basically if you just browse the xml file you will see the different kinds of anonymization being done.

i think there are a few things i would like to see in heudiconv. when the dicom metadata are extracted, they should go through the anonymization process before being added to nifti header and/or json file. this will require some re-orchestration with nibabel (i.e. dcmstack functionality should go to nibabel). and deid can be added on as an optional requirement via a function handle to the nicom reader.
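
A minimal sketch of the function-handle idea described above, assuming a hypothetical `read_metadata` reader that accepts an optional `anonymize` callable. None of these names come from heudiconv, nibabel, or deid; the point is only that anonymization runs before anything reaches the JSON sidecar.

```python
import json
from typing import Callable, Dict, Optional

Metadata = Dict[str, str]


def blank_patient_fields(meta: Metadata) -> Metadata:
    """Hypothetical anonymizer: drop fields whose name starts with 'Patient'."""
    return {k: v for k, v in meta.items() if not k.startswith("Patient")}


def read_metadata(
    raw: Metadata,
    anonymize: Optional[Callable[[Metadata], Metadata]] = None,
) -> Metadata:
    """Return extracted metadata, passed through `anonymize` if one is given."""
    meta = dict(raw)  # copy so the caller's dict is left untouched
    if anonymize is not None:
        meta = anonymize(meta)
    return meta


def to_sidecar(meta: Metadata) -> str:
    """Serialize metadata the way a JSON sidecar might store it."""
    return json.dumps(meta, indent=2, sort_keys=True)


# Made-up header values, for illustration only.
raw = {"PatientName": "Doe^Jane", "RepetitionTime": "2.0", "Modality": "MR"}
clean = read_metadata(raw, anonymize=blank_patient_fields)
sidecar = to_sidecar(clean)
```

Because the callable is optional, a tool such as deid could be plugged in when installed, while the default path stays free of the extra dependency.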

but first, i would love to get @vsoch's opinion on the feasibility of the CTP type functions in deid.

satra commented 6 years ago

@vsoch - in case you don't scroll all the way down:

this line and the following ones work on groups.

https://github.com/ssikka/DICOM-CTP-Anonymizer/blob/master/CTPAnonimizationProfile.xml#L1038

vsoch commented 6 years ago

deid already does this, and it's also based on one or more custom recipes:

https://github.com/pydicom/deid/blob/master/deid/data/deid.dicom#L364

The bottom is the header actions list (linked above) and the top is for flagging burnt in pixels (PHI in images).

A group should just be another dicom header field, and so if you specify it as KEEP it would work the same.

satra commented 6 years ago

so how would i do this with deid:

<e en="T" t="00080018" n="SOPInstanceUID">@hashuid(@UIDROOT,this)</e>

this basically replaces the information in the tag with a function like this:

def hashuid(uuidroot, orig_tag_val):
    new_val = ...
    return new_val

vsoch commented 6 years ago

You would extract the identifiers, it gives you a dictionary with key/value for header fields, maybe like:

...

{'SOPInstanceUID': 'value' }
...

Then you can change those values as you please. In this case you could just create a variable in the lookup for the new replacement, like:

fields['new_value'] = hashuid(uuidroot, fields['myvalue'])

Then you would run the function to "replace" with the recipe, and this would be in the recipe:

REPLACE SOPInstanceUID var:new_value

but if the function is really just returning a consistent string, you can just set that (no var:)

REPLACE SOPInstanceUID ...

The valid actions are here --> https://github.com/pydicom/deid/blob/master/deid/config/standards.py#L26 and actually it would be very easy to have an equivalent func:name to hand deid a lookup with a function to pass some value through (which I think the above is doing?) vs. extraction --> run functions --> replacement. The implementation is done as such because usually after extraction there is some custom saving / lookup by the institution for the actual identifiers, and then they are cleaned.
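
The extract, transform, replace flow above can be sketched with plain dictionaries. This is not deid's API: `apply_recipe` is a hypothetical stand-in that understands only `REPLACE <field> var:<name>` and `REPLACE <field> <literal>`, and `hashuid` is an assumed hash-based scheme (a SHA-256 digest rendered as decimal digits under a UID root), not CTP's exact algorithm. The UID root and header values are placeholders.

```python
import hashlib
from typing import Dict, List


def hashuid(uidroot: str, orig_val: str) -> str:
    """Map a UID to a stable replacement under `uidroot` (assumed scheme)."""
    digest = hashlib.sha256(orig_val.encode("utf-8")).hexdigest()
    digits = str(int(digest, 16))  # decimal digits only, no leading zero
    # DICOM UIDs are limited to 64 characters, so truncate the hashed part.
    return (uidroot.rstrip(".") + "." + digits)[:64]


def apply_recipe(
    header: Dict[str, str], lookup: Dict[str, str], recipe: List[str]
) -> Dict[str, str]:
    """Hypothetical stand-in for the replace step; handles REPLACE only."""
    out = dict(header)
    for line in recipe:
        action, field, value = line.split(maxsplit=2)
        if action != "REPLACE":
            raise ValueError(f"unsupported action: {action}")
        if value.startswith("var:"):
            value = lookup[value[len("var:"):]]  # resolve a computed variable
        out[field] = value
    return out


# 1. "Extract": a dictionary of header fields (values are made up).
header = {"SOPInstanceUID": "1.2.840.99999.1.2.3"}
# 2. Compute the replacement and store it in the lookup.
lookup = {"new_value": hashuid("1.2.3.4", header["SOPInstanceUID"])}
# 3. "Replace": the recipe refers to the variable by name.
updated = apply_recipe(header, lookup, ["REPLACE SOPInstanceUID var:new_value"])
```

Hashing keeps the mapping consistent: the same original UID always yields the same replacement, so references between files in a series still line up after replacement.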

vsoch commented 6 years ago

and if you have the need it would be fantastic to write a nifti (or other image type) plugin! The functions to work with the identifiers are the same, it would just bring in nibabel used in a nifti module instead of pydicom used in a dicom module. Also be aware that nowish / this week the new pydicom is going out, and checks, etc would need to be done to update for that version. If you have a good use I would like to work with you on this, and I would want to grow the tool as a solution for this task using python + dicom.

satra commented 6 years ago

thanks @vsoch - this is super helpful. there are a couple of pieces in play before everything goes through nibabel, but hopefully soon.

vsoch commented 6 years ago

okay great! Poke me when needed, glad to help.

yarikoptic commented 6 years ago

sorry, I am late to the party. Here is where I have been collecting a list of tools as well -- http://open-brain-consent.readthedocs.io/en/master/anon_tools.html and note that deid is apparently too popular a name, I will now add this other deid as well. My "personal" interests would be first in data anonymization (defacing etc) since we are trying to avoid getting any otherwise personally identifiable information (besides demographics, and accession dates/times) into DICOMs. So for now we just mark all dicoms as sensitive and keep them only as the archive of the "pristine sources" which we aren't going to redistribute. Any mangling of them would invalidate their status of being "pristine sources" ;)

satra commented 6 years ago

@yarikoptic - here are a few scenarios for metadata anonymization (the original post lists face masking, ear masking) - which are in general a little more difficult (potentially computationally intensive/erroneous depending on quality of data).

dicoms -> heudiconv only converts the necessary pieces at present and places in sourcedata only the converted dicoms. i think we need to add the non-converted dicoms as well.

dicom metadata embedded in json/nifti (without minmeta) - we would want to sanitize these, otherwise everything should in principle be marked sensitive unless over-ridden by user. at mit, our dicoms are sanitized at point of collection, except date of scan, scanner model/location.
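
One way to read the "sensitive unless over-ridden" rule is as an allowlist: every extracted field is treated as sensitive unless the user names it as safe. The sketch below assumes that reading; `split_sensitive` and the field values are illustrative, not heudiconv options.

```python
from typing import Dict, Iterable, Tuple

Metadata = Dict[str, str]


def split_sensitive(
    meta: Metadata, safe_fields: Iterable[str]
) -> Tuple[Metadata, Metadata]:
    """Split metadata into (public, sensitive); unknown fields are sensitive."""
    safe = set(safe_fields)
    public = {k: v for k, v in meta.items() if k in safe}
    sensitive = {k: v for k, v in meta.items() if k not in safe}
    return public, sensitive


# Made-up header values, for illustration only.
meta = {
    "RepetitionTime": "2.0",
    "EchoTime": "0.03",
    "AcquisitionDate": "20180101",
    "InstitutionName": "Example Hospital",
}
public, sensitive = split_sensitive(meta, safe_fields=["RepetitionTime", "EchoTime"])
```

With this default, a field nobody has reviewed (here `AcquisitionDate` and `InstitutionName`) lands on the sensitive side rather than leaking into a shared JSON file.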

yarikoptic commented 6 years ago

"non-converted" dicoms -- indeed it is up to the heuristic to decide/specify which dicoms to keep and which to convert to nifti... e.g. in reproin one, since few weeks back, we also retain scout dicoms without converting them to niftis. If I felt that e.g. some derived dicoms (stats maps etc) would be valuable, I would also store them into sourcedata. I do not think that heudiconv should by default retain all dicoms

yeah, I see the point of providing some helpers/infrastructure for sanitizing extracted dicom fields.

satra commented 6 years ago

i think we should do all dicoms, so that we do not have to keep another copy of dicoms elsewhere. and these dicoms should not be sanitized - they should just be the raw dicoms. if we wanted to publish, perhaps sanitization can take place then.

originally in heudiconv we used hard links (instead of compressed targz) to store the dicoms so that if the directory was copied the dicoms would be copied along. and in this scenario we considered the dicoms to be part of the derived dataset
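
A minimal sketch of the hard-link approach, assuming a hypothetical `link_dicoms` helper; this is not heudiconv's actual code. Hard links only work within a single filesystem, and the linked file shares its data with the original, so no extra disk space is used.

```python
import os
import tempfile
from pathlib import Path


def link_dicoms(src_dir: Path, dst_dir: Path) -> int:
    """Hard-link every file under src_dir into dst_dir, preserving layout."""
    count = 0
    for src in sorted(src_dir.rglob("*")):
        if not src.is_file():
            continue
        dst = dst_dir / src.relative_to(src_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        os.link(src, dst)  # second name for the same data on disk
        count += 1
    return count


with tempfile.TemporaryDirectory() as tmp:
    raw_dir = Path(tmp) / "raw"
    raw_dir.mkdir()
    (raw_dir / "0001.dcm").write_bytes(b"not a real dicom")  # placeholder file
    out_dir = Path(tmp) / "sourcedata"
    n_linked = link_dicoms(raw_dir, out_dir)
    same_file = os.path.samefile(raw_dir / "0001.dcm", out_dir / "0001.dcm")
```

Copying the dataset directory to another disk then materializes real copies of the dicoms, which matches the behaviour described above.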

it may be good to come up with a set of routes - it would be a good flowchart for people as well.

how about something like:

dataset

vsoch commented 6 years ago

Is this using BIDS or something else? If there is a pipeline / organizational protocol adopted for how to store and do this procedure, shouldn't we do it in a way that fits in with something like BIDS (as a growing standard?) I don't know the details of this work, but I've seen many cases of everyone rolling their own... and it's :scream:

satra commented 6 years ago

@vsoch - heudiconv supports bids but not exclusively, but that is one of the use cases and the structure described above is to align with bids. it's a more general framework. also with the nidm work in mind - the conversion to bids may happen in the future on demand rather than immediately.

it's just that bids only cares about the converted data not all the dicoms that were generated in a session. the intent here is to leverage datalad to support all dicoms + dicoms converted + the nifti files generated. and since dicoms and dicoms converted should be identical, one would not use up twice the space.