How to handle datasets with invalid info[meas_id][secs]?

hoechenberger commented 4 years ago

I'm working with the ds000246 OpenNeuro dataset:

$ aws s3 sync --no-sign-request s3://openneuro.org/ds000246 ds000246
$ cd ds000246/sub-emptyroom/meg

Reading the data works as expected:

import mne
raw = mne.io.read_raw_ctf('sub-emptyroom_task-noise_run-01_meg.ds')

Writing throws an exception:

raw.save('/tmp/foo.fif')

Traceback:

RuntimeError                              Traceback (most recent call last)
<ipython-input-4-eb369e79ee42> in <module>
----> 1 raw.save('/tmp/foo.fif')

<decorator-gen-155> in save(self, fname, picks, tmin, tmax, buffer_size_sec, drop_small_buffer, proj, fmt, overwrite, split_size, split_naming, verbose)

~/Development/mne-python/mne/io/base.py in save(self, fname, picks, tmin, tmax, buffer_size_sec, drop_small_buffer, proj, fmt, overwrite, split_size, split_naming, verbose)
   1379                 "split_naming must be either 'neuromag' or 'bids' instead "
   1380                 "of '{}'.".format(split_naming))
-> 1381         _write_raw(fname, self, info, picks, fmt, data_type, reset_range,
   1382                    start, stop, buffer_size, projector, drop_small_buffer,
   1383                    split_size, split_naming, part_idx, None, overwrite)

~/Development/mne-python/mne/io/base.py in _write_raw(fname, raw, info, picks, fmt, data_type, reset_range, start, stop, buffer_size, projector, drop_small_buffer, split_size, split_naming, part_idx, prev_fname, overwrite)
   1844 
   1845     picks = _picks_to_idx(info, picks, 'all', ())
-> 1846     fid, cals = _start_writing_raw(use_fname, info, picks, data_type,
   1847                                    reset_range, raw.annotations)
   1848 

~/Development/mne-python/mne/io/base.py in _start_writing_raw(name, info, sel, data_type, reset_range, annotations)
   2018         cals.append(info['chs'][k]['cal'] * info['chs'][k]['range'])
   2019 
-> 2020     write_meas_info(fid, info, data_type=data_type, reset_range=reset_range)
   2021 
   2022     #

~/Development/mne-python/mne/io/meas_info.py in write_meas_info(fid, info, data_type, reset_range)
   1453     """
   1454     info._check_consistency()
-> 1455     _check_dates(info)
   1456 
   1457     # Measurement info

~/Development/mne-python/mne/io/meas_info.py in _check_dates(info, prepend_error)
   1411                 if (value[key_2] < np.iinfo('>i4').min or
   1412                         value[key_2] > np.iinfo('>i4').max):
-> 1413                     raise RuntimeError('%sinfo[%s][%s] must be between '
   1414                                        '"%r" and "%r", got "%r"'
   1415                                        % (prepend_error, key, key_2,

RuntimeError: info[meas_id][secs] must be between "-2147483648" and "2147483647", got "-5364633480"

How to best deal with data like this? Can I simply set info[meas_id][secs] to an arbitrary (valid) value? Also it seems a little odd that I can create (and work with) some data by reading it, but then cannot write it back to disk…

larsoner commented 4 years ago

> Also it seems a little odd that I can create (and work with) some data by reading it, but then cannot write it back to disk…

The FIF format in particular has a limit on the span of dates it can write, because it stores the seconds since 1970-01-01 as a signed 32-bit integer (int32). That only covers roughly December 1901 to January 2038 (UTC). Other formats that use other methods (e.g., storing seconds as int64, or dates in a suitable string format) do not suffer from this problem.
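To make the int32 limit concrete, here is a small stdlib-only sketch (not MNE's own code) that mirrors the bounds check from the traceback and converts the two limits to UTC dates. The helper name `fits_in_int32` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Signed 32-bit bounds: the same values np.iinfo('>i4') reports in the traceback
INT32_MIN = -(2 ** 31)
INT32_MAX = 2 ** 31 - 1
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def fits_in_int32(secs):
    """Return True if a seconds-since-epoch value fits in a signed int32.

    Hypothetical helper that mirrors the range check in the traceback.
    """
    return INT32_MIN <= secs <= INT32_MAX


# Earliest and latest instants representable as int32 seconds since 1970-01-01
print(EPOCH + timedelta(seconds=INT32_MIN))  # 1901-12-13 20:45:52+00:00
print(EPOCH + timedelta(seconds=INT32_MAX))  # 2038-01-19 03:14:07+00:00

# The value from the error message is out of range
print(fits_in_int32(-5364633480))  # False
```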

As to how to fix it, you can set it to zero and things will work (unless you have saved separate annotations you want to add), but be careful if you ever want to do anything involving dates across multiple subjects or runs. Typically, anonymization shifts all subjects and runs by the same fixed amount so that their relative timings are preserved; wiping out the meas_date breaks this.

agramfort commented 4 years ago

For the record, the file here comes from a dataset that is not valid BIDS; for BIDS MEG data we made sure the dates are compatible with FIF.

bloyl commented 4 years ago

I would check what the date actually is. -5364633480 seconds is about 170 years before 1970, so my guess is that this data has been anonymized using some method that makes that value meaningless.
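One way to check the date is to convert the seconds from the error message into a UTC datetime. This stdlib-only sketch adds a `timedelta` to the epoch instead of calling `datetime.fromtimestamp`, which may reject negative timestamps on some platforms. The helper name `secs_to_utc` is hypothetical; the optional `usecs` argument assumes the (secs, usecs) pair that MNE keeps in `info['meas_id']`.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def secs_to_utc(secs, usecs=0):
    """Convert (secs, usecs) since the Unix epoch to a UTC datetime.

    Hypothetical helper for inspection only.
    """
    return EPOCH + timedelta(seconds=secs, microseconds=usecs)


print(secs_to_utc(-5364633480))  # 1800-01-01 08:02:00+00:00
```

A date of 1 January 1800 is consistent with the guess that the value is an anonymization placeholder rather than a real acquisition date.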

If you want to be extra cautious and preserve as much information as you can, in case it is relevant, you could use raw.anonymize(), which should time-shift everything so that meas_date is in range while preserving the time differences between meas_date and the other dates in the file.

https://mne.tools/stable/generated/mne.io.Raw.html#mne.io.Raw.anonymize


hoechenberger commented 4 years ago

It does pass validation with the BIDS validator though. We should probably file a bug report.


agramfort commented 4 years ago

The BIDS validator cannot read MEG files, only the file names, so it cannot detect these issues.

hoechenberger commented 4 years ago

> The BIDS validator cannot read MEG files, only the file names, so it cannot detect these issues.

Wait, so you're saying there's BIDS-relevant metadata stored in a file format that the BIDS validator cannot read? Shouldn't this be stored in a sidecar file, like the events??

hoechenberger commented 4 years ago

Thanks @larsoner for the explanation, and thanks @bloyl for the suggestion to try and re-anonymize, I will look into this and see how it goes!

bloyl commented 4 years ago

This raises an interesting question.

What is the expectation if BIDS sidecar information differs from what is stored in the headers of the underlying imaging data?

hoechenberger commented 4 years ago

> What is the expectation if BIDS sidecar information differs from what is stored in the headers of the underlying imaging data?

I believe the sidecar-based values always take precedence.

agramfort commented 4 years ago

> I believe the sidecar-based values always take precedence.

+1

davidcian commented 6 months ago

Same issue here with the Temple University TUAR dataset. Ended up just dropping the meas_date.

mne-tools/mne-python issue #7803