mne-tools / mne-bids

MNE-BIDS is a Python package that allows you to read and write BIDS-compatible datasets with the help of MNE-Python.
https://mne.tools/mne-bids/
BSD 3-Clause "New" or "Revised" License
133 stars 88 forks source link

write_raw_bids for CTF does not anonymize certain files? #358

Closed hstojic closed 2 years ago

hstojic commented 4 years ago

Describe the bug

Certain files still contain original filenames (ClassFile, MarkerFile and *.hist file) after writing, as well as raw.info['subject_info'], perhaps there are mentions in some other binary files. The problem is of course with personally identifiable information - the original filenames contain scanner IDs for subjects etc.

Not sure though if this should be responsibility of the library, but probably every effort should be spend that original filenames are scrubbed from files if found?

Steps to reproduce

# load the raw data and edit some info
raw = mne.io.read_raw_ctf(raw_file_name, clean_names = True, preload = False)
raw.info['experimenter'] = pars_gen['info']['experimenter']
raw.info['subject_info']['age'] = pars_ses[session_id]['age']
raw.info['subject_info']['sex'] = pars_ses[session_id]['sex']
raw.info['subject_info']['his_id'] = subject_id

# generate a valid BIDS file name
fname = mne_bids.make_bids_basename(
    subject = '%s' % subject_id, 
    task = task_name, 
    run = (run_id + 1)
)

# write
mne_bids.write_raw_bids(
    raw = raw, 
    bids_basename = fname, 
    output_path = dir_megdata_bids, 
    events_data = event_id_array,
    event_id = pars_gen['events']['event_ids'],
    overwrite = True, 
    verbose = True
)

Expected results

ClassFile, MarkerFile and *.hist files should contain no mention of original filenames. raw.info['subject_info']['his_id'] should not contain the original dataset ID

Actual results

But they do.

Additional information

Platform: Linux-4.11.0-14-generic-x86_64-with-Ubuntu-16.04-xenial Python: 3.5.2 (default, Nov 12 2018, 13:43:14) [GCC 5.4.0 20160609] Executable: /home/hstojic/.pyenv/meg3/bin/python3 CPU: x86_64: 16 cores Memory: Unavailable (requires "psutil" package) mne: 0.18.2 numpy: 1.16.4 {blas= language = c, lapack= language = c} scipy: 1.3.0 matplotlib: 3.0.3 {backend=TkAgg}

sklearn: 0.21.2 nibabel: 3.0.1 mayavi: Not found cupy: Not found pandas: 0.24.2 dipy: Not found

jasmainak commented 4 years ago

I don't see anonymize=True in your script. We write to fif files when we anonymize because we don't have writer for CTF.

hstojic commented 4 years ago

Where should anonymize=True go exactly? I know I can call it on raw object, do you mean that one, raw.anonymize()?

jasmainak commented 4 years ago

Oops looks like we’re due for a release. Can you try the development version? There is an argument called anonymize in write_raw_bids

On Thu 5 Mar 2020 at 12:03, Hrvoje Stojic notifications@github.com wrote:

Where should anonymize=True go exactly? I know I can call it on raw object, do you mean that one, raw.anonymize()?

— You are receiving this because you commented.

Reply to this email directly, view it on GitHub https://github.com/mne-tools/mne-bids/issues/358?email_source=notifications&email_token=ADY6FIUVCZFLEOZ4XW2ZD6TRF7LNHA5CNFSM4LCMU632YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEN6CH5Q#issuecomment-595338230, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADY6FIRWYBASBUGKSQSCU5TRF7LNHANCNFSM4LCMU63Q .

-- Sent from my iPhone

hstojic commented 4 years ago

Just to make sure, with that option it will write the BIDS data in FIF instead of CTF format?

I would be fine with transforming data to FIF format if nothing is lost with this transformation?

jasmainak commented 4 years ago

Give it a try I would say. If you're able to run your scripts as before, it should be okay.

I guess we could add a test to check that the raw object is the same when you read from the transformed file @agramfort ?

agramfort commented 4 years ago

I guess we could add a test to check that the raw object is the same when you read from the transformed file @agramfort https://github.com/agramfort ?

yes we could. I would see to see a diff in a PR to fully judge what it implies

hstojic commented 4 years ago

Give it a try I would say. If you're able to run your scripts as before, it should be okay.

I haven't started analysing the data seriously, so it would be a bit difficult to check if my pipelines work with FIF :)

sappelhoff commented 4 years ago

Just to make sure, with that option it will write the BIDS data in FIF instead of CTF format?

No, I think you'll still output CTF. See https://github.com/mne-tools/mne-bids/blob/a6017fe334d76878ba4365f0d97d872403a02a16/mne_bids/write.py#L1206-L1228

Oops looks like we’re due for a release.

I agree :-)

I guess we could add a test to check that the raw object is the same when you read from the transformed file

@jasmainak you mean after each conversion of data-A -> data-B you do: assert read(data-A) == read(data-B)?

jasmainak commented 4 years ago

No, I think you'll still output CTF.

I would consider this a bug if that happens

you mean after each conversion of data-A -> data-B

yep roughly that.

sappelhoff commented 4 years ago

@hstojic do you have news on your issue? Were you able to resolve the problem?