Open hoechenberger opened 4 years ago
Why is info['file_id']
equal to 1970-01-01 01:00:00
and why does this have to be anonymized? It seems like this date has already been set to a date that definitely does not correspond to the file creation date (assuming that the file_id
key corresponds to file creation).
I agree is should not happen that info['file_id'] is not coherent with meas_date
for me it's a bug due to old mne version that was not setting the date correctly everywhere
Ok, so I guess what we should do in this case is
Right? I can make a PR if we all agree.
you should fix the file with set_meas_date before anonymization
But there are files like this out there in the wild.... should we not take care of them automatically in case of (re)anonymozation? Or is that what you meant: handle this inside Raw.anonymize()?
ok you can try to warn if the dates are inconsistent and only use meas_date as true value
The file_id
key seems to be specific to FIF, so I agree that if the sec
subfield does not match meas_date
, warn, anonymize meas_date
, and set file_id
equal to the anonymized meas_date
.
While we're at it, why does anonymize
default to setting the date to Jan 01 2000? Wouldn't it be more convenient to follow the BIDS recommendation to set the date to e.g. Jan 01 1920 instead?
The file_id key seems to be specific to FIF, so I agree that if the sec subfield does not match meas_date, warn, anonymize meas_date, and set file_id equal to the anonymized meas_date.
yes I don't see how they can become different
While we're at it, why does anonymize default to setting the date to Jan 01 2000? Wouldn't it be more convenient to follow the BIDS recommendation to set the date to e.g. Jan 01 1920 instead?
I think it's better to enforce users to shift the dates to keep consistent ordering of subjects by dates
I agree with @agramfort , there probably should not be a default date
I think it's better to enforce users to shift the dates to keep consistent ordering of subjects by dates
Yes, but there is a default, which happens to be a bad one because it's not BIDS-compliant. Are you suggesting we should remove the default (in another PR)?
@cbrnr Yes I suggest we should consider removing the default
OK. Some arguments for/against removing the default:
Pros:
Cons:
I don't really see the point of removing the default, I think it's more convenient to let MNE handle the anonymization - but I'd set a different default to <1925.
mne-bids will complain. mne-python is not tied to bids. You will write bids files with mne-bids so I don't see the urge to change anything here. Now we need to make sure mne-bids warns properly.
The relative time deltas between recordings might not be relevant at all - I can't think of a reason where this might be important (as long as subject age can be inferred, which is still the case)
An example would be a training study where the number of days between different recordings of the same subject was an important covariate.
Yup. I always report inter-session intervals for multi-session studies. Would be nice to have this information contained in the info.
-- Sent from my phone, please excuse brevity and erroneous auto-correct.
On 18. Aug 2020, at 17:14, Daniel McCloy notifications@github.com wrote:
- The relative time deltas between recordings might not be relevant at all - I can't think of a reason where this might be important (as long as subject age can be inferred, which is still the case) An example would be a training study where the number of days between different recordings of the same subject was an important covariate.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.
True, I was only thinking of between subjects. I think the argument should support a date in addition to days.
True, I was only thinking of between subjects. I think the argument should support a date in addition to days.
I was thinking the same, but then – again! – it wouldn't be easy to preserve relative date differences!
Considering this, having a default date as we have now is not very useful in this regard either. However, changing it to a fixed number of days could lead to an easy de-anonymization if users forget about manually passing daysback
.
Hence I conclude: we actually cannot have any reasonable default.
But really this is beyond the scope of this particular issue :)
Here I will only focus on this:
@agramfort
ok you can try to warn if the dates are inconsistent and only use meas_date as true value
will make a PR shortly
Looking at this again with @agramfort, there seems to be another problem involved: Raw.set_meas_date()
doesn't touch the dates in info['file_id']
-- maybe that's on purpose? MWE:
# %%
import mne
import os.path as op
from datetime import datetime
data_path = mne.datasets.somato.data_path()
raw_fname = op.join(data_path, 'sub-01', 'meg', 'sub-01_task-somato_meg.fif')
raw = mne.io.read_raw_fif(raw_fname, verbose='error')
print(f"Date in info['meas_date']: {raw.info['meas_date']}")
print(f"Date in info['file_id']: {datetime.utcfromtimestamp(raw.info['file_id']['secs'])}")
print('\nAfter calling set_meas_date():')
raw.set_meas_date(raw.info['meas_date'])
print(f"Date in info['meas_date']: {raw.info['meas_date']}")
print(f"Date in info['file_id']: {datetime.utcfromtimestamp(raw.info['file_id']['secs'])}")
raw.anonymize(daysback=30000)
Output:
Date in info['meas_date']: 2007-07-05 11:17:11.172243+00:00
Date in info['file_id']: 1970-01-01 00:00:00
After calling set_meas_date():
Date in info['meas_date']: 2007-07-05 11:17:11.172243+00:00
Date in info['file_id']: 1970-01-01 00:00:00
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Untitled-2 in
17 print(f"Date in info['file_id']: {datetime.utcfromtimestamp(raw.info['file_id']['secs'])}")
18
---> 19 raw.anonymize(daysback=30000)
in anonymize(self, daysback, keep_his, verbose)
~/Development/mne-python/mne/channels/channels.py in anonymize(self, daysback, keep_his, verbose)
593 .. versionadded:: 0.13.0
594 """
--> 595 anonymize_info(self.info, daysback=daysback, keep_his=keep_his,
596 verbose=verbose)
597 self.set_meas_date(self.info['meas_date']) # unify annot update
in anonymize_info(info, daysback, keep_his, verbose)
~/Development/mne-python/mne/io/meas_info.py in anonymize_info(info, daysback, keep_his, verbose)
2245 'daysback parameter was too large.'
2246 'Underlying Error:\n')
-> 2247 _check_dates(info, prepend_error=err_mesg)
2248
2249 return info
~/Development/mne-python/mne/io/meas_info.py in _check_dates(info, prepend_error)
1454 if (value[key_2] < np.iinfo('>i4').min or
1455 value[key_2] > np.iinfo('>i4').max):
-> 1456 raise RuntimeError('%sinfo[%s][%s] must be between '
1457 '"%r" and "%r", got "%r"'
1458 % (prepend_error, key, key_2,
RuntimeError: anonymize_info generated an inconsistent info object. daysback parameter was too large.Underlying Error:
info[file_id][secs] must be between "-2147483648" and "2147483647", got "-2591997853"
I have no clue what info['file_id']
contains and the docs don't really explain it either.
here is a fix:
import mne
import os.path as op
from datetime import datetime, timezone
data_path = mne.datasets.somato.data_path()
raw_fname = op.join(data_path, 'sub-01', 'meg', 'sub-01_task-somato_meg.fif')
raw = mne.io.read_raw_fif(raw_fname, verbose='error')
print(f"Date in info['meas_date']: {raw.info['meas_date']}")
print(f"Date in info['file_id']: {datetime.utcfromtimestamp(raw.info['file_id']['secs'])}")
print('\nAfter calling set_meas_date():')
meas_date = datetime.utcfromtimestamp(
raw.info['file_id']['secs'] + 1e-6 * raw.info['file_id']['usecs'])
meas_date = meas_date.replace(tzinfo=timezone.utc)
raw.set_meas_date(meas_date)
print(f"Date in info['meas_date']: {raw.info['meas_date']}")
print(f"Date in info['file_id']: {datetime.utcfromtimestamp(raw.info['file_id']['secs'])}")
raw.anonymize(daysback=20000)
print(f"Date in info['meas_date']: {raw.info['meas_date']}")
print(f"Date in info['file_id']: {datetime.utcfromtimestamp(raw.info['file_id']['secs'])}")
don't know what happened to this file. But what bothers now with current code is that when you anonymize the default for file_id is 1970 and for meas_date it's 2000 so you quickly end up with data where file_id and meas_date are not consistent. For me this should not happened. This deserves a first fix and then I think we should make sure that when we set_meas_date the file_id should then be adjusted.
thoughts? @alexrockhill @larsoner ?
It does seem incorrect that these are not consistent. Worth trying to fix it to see if it breaks some existing test -- this might tell us if we had some reason to do it this way
looking at it now
Not sure if this is exactly the same problem, but it seems related: in the following MWE, info['file_id']['secs']
doesn't survive an I/O roundtrip.
import mne
from mne.datasets import sample
data_path = sample.data_path()
subjects_dir = data_path + '/subjects'
fname_in = data_path + '/MEG/sample/sample_audvis_raw.fif'
fname_out = '/tmp/raw.fif'
# Create a copy.
raw = mne.io.read_raw_fif(fname_in, verbose=False)
raw.save(fname_out, overwrite=True)
# Read copy.
raw = mne.io.read_raw_fif(fname_out, preload=True, verbose=False)
print(f"Before anonymization: {raw.info['file_id']['secs']}")
# Anonymize.
raw.anonymize(daysback=-20)
print(f"After anonymization: {raw.info['file_id']['secs']}")
# Test I/O roundtrip.
raw.save(fname_out, overwrite=True)
raw = mne.io.read_raw_fif(fname_out, verbose=False)
print(f"After anonymization & I/O roundtrip: "
f"{raw.info['file_id']['secs']}")
Output:
Before anonymization: 0
After anonymization: 1730147
After anonymization & I/O roundtrip: 0
I think @bloyl worked on this part if I'm not mistaken. I can take a look but I was only using the functionality as an end-user with mne-bids
I ran into this when trying to ensure that all date-related information in the FIFF files is properly anonymized when converting to BIDS and before sharing the datasets. So I'm in a similar situation: only ever got in touch with this stuff because I was using MNE-BIDS. :)
I think meas_id is meant to be saved but the file_id is supposed to be unique to a file. That's what I remember from conversations. Maybe @larsoner knows more.
@agramfort
I think meas_id is meant to be saved but the file_id is supposed to be unique to a file
@larsoner earlier said,
It does seem incorrect that these are not consistent.
So I believe they should always stay in sync
So I believe they should always stay in sync
I mean I wouldn't care, really, if I weren't concerned we might be accidentally leaking (non-anonymized) measurement info via file_id
. Is this concern justified?
Thinking about it again, it's likely that file_id is supposed to be unique for the given file / generated at write time. Maybe we should just comment about this in the code?
Thinking about it again, it's likely that file_id is supposed to be unique for the given file / generated at write time. Maybe we should just comment about this in the code?
sounds like a good idea to me
git grep -n FIFF_FILE_ID
suggests that file id is set here during write:
https://github.com/mne-tools/mne-python/blob/92254198f02e19ca9227888fd2b1d5f8a90c8084/mne/io/write.py#L292-L304
however we don't seem to call start_file
with the _id
argument. So the default (None
) is use. which leads to DATE_NONE
being used for mne-python file_ids.
see: https://github.com/mne-tools/mne-python/blob/92254198f02e19ca9227888fd2b1d5f8a90c8084/mne/io/write.py#L265-L267 https://github.com/mne-tools/mne-python/blob/92254198f02e19ca9227888fd2b1d5f8a90c8084/mne/io/write.py#L458-L464
Fundamentally, the problem is the lack of a published FIFF standard (at least that I know of). The docstring below seems to be the best we have: https://github.com/mne-tools/mne-python/blob/92254198f02e19ca9227888fd2b1d5f8a90c8084/mne/io/meas_info.py#L195
one could look at the mne-c code how it does it
Working with the
somato
dataset, I ran into an issue related to the fact that the measurement date I'm intending to anonymize deviates from the date ininfo['file_id']['secs']
such that, when I reach the desired adjustment for the measurement date,info['file_id']['secs']
goes out-of-range (even though the measurement date would be within the valid range).MWE:
Output:
How shall I go about anonymizing this dataset? I want to use
daysback
large enough to move the measurement date before 1925, as is required by BIDS.cc @agramfort