nipy / heudiconv

Flexible DICOM conversion into structured directory layouts
https://heudiconv.readthedocs.io

acquisition times in sub-{subject}_scans.tsv #92

Open poldrack opened 7 years ago

poldrack commented 7 years ago

The sub-{subject}_scans.tsv file includes full acquisition times, including the date, which are considered PHI. Can these be removed?

yarikoptic commented 7 years ago

IIRC those are part of the BIDS specification and are desired, e.g., for upload to NDA. That is why, e.g., in DataLad datasets they are marked with the 'distribution-restricted=sensitive' tag, so we can avoid distributing them publicly. I guess if someone wants to anonymize them to their liking (e.g., offset by some date, or round to the nearest year), they could do it post hoc.

poldrack commented 7 years ago

From the BIDS spec (1.0.2):

If acquisition time is included it should be under “acq_time” header. Datetime should be expressed in the following format 2009-06-15T13:45:30 (year, month, day, hour (24h), minute, second; this is equivalent to the RFC3339 “date-time” format, time zone is always assumed as local time). For anonymization purposes all dates within one subject should be shifted by a randomly chosen (but common across all runs etc.) number of days. This way relative timing would be preserved, but chances of identifying a person based on the date and time of their scan would be decreased. Dates that are shifted for anonymization purposes should be set to a year 1900 or earlier to clearly distinguish them from unmodified data. Shifting dates is recommended, but not required. Additional fields can include external behavioural measures relevant to the scan. For example vigilance questionnaire score administered after a resting state scan.

This suggests to me that acquisition time is not actually required, and that, if it is included in a BIDS dataset, it should be done after randomly shifting the date (not required, but recommended).
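The date shifting recommended in the quoted spec text can be applied post hoc to an existing scans.tsv. The following is a minimal sketch, not part of heudiconv: the function names, the extra random offset of up to 365 days, and the choice to anchor the shift on the latest scan (so that every shifted date lands in 1900 or earlier) are assumptions made for illustration. It requires Python 3.7+ for `datetime.fromisoformat`.

```python
import csv
import io
import random
from datetime import date, datetime, timedelta


def choose_shift_days(acq_times, rng, target_year=1900, max_extra_days=365):
    """Pick one per-subject shift (in days) that moves every date into
    target_year or earlier, plus a random extra offset (an assumption)."""
    latest = max(acq_times).date()
    # Shifting the latest date by `base` days lands it on Dec 31 of target_year.
    base = max((latest - date(target_year, 12, 31)).days, 0)
    return base + rng.randint(0, max_extra_days)


def shift_scans_tsv(tsv_text, rng):
    """Return scans.tsv text with every acq_time shifted by one common offset."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = list(reader)
    if not reader.fieldnames or "acq_time" not in reader.fieldnames:
        return tsv_text  # no acq_time column: nothing to shift
    # Skip empty cells and "n/a", which BIDS uses for missing values.
    present = [r for r in rows if r.get("acq_time") not in (None, "", "n/a")]
    if not present:
        return tsv_text
    times = [datetime.fromisoformat(r["acq_time"]) for r in present]
    delta = timedelta(days=choose_shift_days(times, rng))
    # Whole-day shift: time of day and intervals between runs are preserved.
    for r, t in zip(present, times):
        r["acq_time"] = (t - delta).isoformat()
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Passing a seeded `random.Random` makes a run reproducible; in practice the shift (or the seed) for each subject would need to be kept privately if later sessions must be shifted consistently.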

chrisgorgo commented 7 years ago

+1 for a default that is closer to HIPAA compliance, plus a command-line switch that turns storing the dates back on.


yarikoptic commented 7 years ago

And then also, by default, remap all subject IDs to sub-001, sub-002, etc., because who knows what potentially sensitive information was encoded in the subject ID? I thought there was already a consensus that additional anonymization could be done (if/when desired) as a post-processing step. (I wouldn't mind an option for additional scrutiny, but again: when do we "stop"?)
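For concreteness, the remapping mentioned above (sub-001, sub-002, ...) amounts to a lookup table from original IDs to new labels. A minimal sketch, assuming all original IDs are known up front; `build_subject_map` is a hypothetical helper, not heudiconv's own subject ID anonymizer. The resulting mapping is itself sensitive and would need to be stored privately.

```python
import random
from typing import Dict, Iterable, Optional


def build_subject_map(subject_ids: Iterable[str],
                      rng: Optional[random.Random] = None,
                      width: int = 3) -> Dict[str, str]:
    """Map original subject IDs to sequential zero-padded BIDS labels (001, 002, ...)."""
    unique = sorted(set(subject_ids))
    if rng is not None:
        # Shuffle so the new labels do not reveal the sort order of the originals.
        rng.shuffle(unique)
    # Widen the padding if there are more subjects than `width` digits allow.
    width = max(width, len(str(len(unique))))
    return {orig: f"{i:0{width}d}" for i, orig in enumerate(unique, start=1)}
```

Without `rng`, labels follow the sorted order of the original IDs, which can leak information if those IDs encode, e.g., enrollment order; passing a seeded `random.Random` avoids that.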

chrisgorgo commented 7 years ago

Oh, come on, things don't have to be all or nothing. Just as adding some tests to a project is a good idea even if you don't have 100% code coverage, increasing HIPAA compliance is beneficial even if you cannot fully guarantee it.

I think we agree that this should be an option; all I'm saying is that the default should be to remove those protected fields that are easy to remove.

Just a little feedback!

poldrack commented 7 years ago

+1 to doing as well as we can even if it's not perfect


satra commented 7 years ago

From a heudiconv perspective, one should be able to convert to whatever form, with any level of lossiness. These are things that have not made their way in yet:

I think we should have anonymization in heudiconv, but, like heuristics, this should be left to the user. We can include a default CTP anonymizer which folks can run if they want to, or replace with their own. We did start this project a long time back: https://github.com/ssikka/DICOM-CTP-Anonymizer. Perhaps it's time to revisit and finish that up.

@yarikoptic, we do have a subject ID anonymizer; in fact, that's the only thing currently in there, and nobody knows how to use it.

yarikoptic commented 7 years ago

Right, I forgot about the subject ID anonymizer. And I agree that all of those should probably be deferred to the heuristic. Maybe even multiple heuristics could be specified (e.g., one to lay out files, and another providing callbacks for anonymization, which would be consulted when storing dates, etc.). As for "options", we already have some which I think should be absorbed into "heuristics", since that is what they are (e.g., --bids).
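One possible reading of the callback idea: the converter asks a user-supplied function what to store in acq_time instead of hard-coding the policy. The sketch below is hypothetical; heudiconv is not claimed to expose such a hook, and the names `AcqTimeHook`, `drop_acq_time`, `keep_acq_time`, and `scans_row` are illustrative only. The default shown drops the date, matching the HIPAA-leaning default proposed earlier in the thread.

```python
from datetime import datetime
from typing import Callable, Optional

# Hypothetical hook signature: given a subject label and the original acquisition
# time, return the string to store in acq_time, or None to store "n/a".
AcqTimeHook = Callable[[str, datetime], Optional[str]]


def drop_acq_time(subject: str, acq: datetime) -> Optional[str]:
    """HIPAA-leaning default: do not store the acquisition date/time at all."""
    return None


def keep_acq_time(subject: str, acq: datetime) -> Optional[str]:
    """Opt-in behaviour: store the full timestamp unchanged."""
    return acq.isoformat()


def scans_row(filename: str, subject: str, acq: datetime,
              hook: AcqTimeHook = drop_acq_time) -> str:
    """Format one scans.tsv row, consulting the anonymization hook for acq_time."""
    value = hook(subject, acq)
    return f"{filename}\t{value if value is not None else 'n/a'}"
```

A per-subject date shift or year-only rounding would plug in the same way, as another function with the `AcqTimeHook` signature.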

nicholst commented 5 years ago

To offer a "+1" to this: it would be helpful to have options to anonymise, or at least suppress, the date throughout the entire BIDS dataset, including: