Sample NIRS data not read, problem with channel naming

loew commented 1 year ago

Describe the bug

Sample NIRS data in the BIDS format can not be read due to some problems with channel names. This applies to the data set in the example below, but can be replicated with the data "Auditory Speech and Noise" dataset. Upgrading mne, mne_bids and mne_nirs from the latest official versions to the dev versions did not resolve the issue.

If the channels.tsv file in the subjects directory is renamed (channels.tsvxxx), the data can be read.

Steps to reproduce

from mne_bids import BIDSPath, read_raw_bids
from mne_nirs.datasets import fnirs_motor_group

bpath = BIDSPath(subject='01', task="tapping", root=fnirs_motor_group.data_path(), \
                     datatype="nirs", suffix="nirs", extension=".snirf")
raw = read_raw_bids(bids_path=bpath, verbose=False)

Expected results

I expect the data to be read.

Actual results

ValueError                                Traceback (most recent call last)
Cell In[8], line 6
      2 from mne_nirs.datasets import fnirs_motor_group
      4 bpath = BIDSPath(subject='01', task="tapping", root=fnirs_motor_group.data_path(), \
      5                      datatype="nirs", suffix="nirs", extension=".snirf")
----> 6 raw = read_raw_bids(bids_path=bpath, verbose=False)

File <decorator-gen-616>:10, in read_raw_bids(bids_path, extra_params, verbose)

File ~\anaconda3\envs\mne_13\lib\site-packages\mne_bids\read.py:780, in read_raw_bids(bids_path, extra_params, verbose)
    775 channels_fname = _find_matching_sidecar(bids_path,
    776                                         suffix='channels',
    777                                         extension='.tsv',
    778                                         on_error='warn')
    779 if channels_fname is not None:
--> 780     raw = _handle_channels_reading(channels_fname, raw)
    782 # Try to find an associated electrodes.tsv and coordsystem.json
    783 # to get information about the status and type of present channels
    784 on_error = 'warn' if suffix == 'ieeg' else 'ignore'

File ~\anaconda3\envs\mne_13\lib\site-packages\mne_bids\read.py:606, in _handle_channels_reading(channels_fname, raw)
    603     for bids_ch_name, raw_ch_name in zip(ch_names_tsv,
    604                                          raw.ch_names.copy()):
    605         if bids_ch_name != raw_ch_name:
--> 606             raw.rename_channels({raw_ch_name: bids_ch_name})
    608 # Set the channel types in the raw data according to channels.tsv
    609 channel_type_bids_mne_map_available_channels = {
    610     ch_name: ch_type
    611     for ch_name, ch_type in channel_type_bids_mne_map.items()
    612     if ch_name in raw.ch_names
    613 }

File <decorator-gen-43>:12, in rename_channels(self, mapping, allow_duplicates, verbose)

File ~\anaconda3\envs\mne_13\lib\site-packages\mne\channels\channels.py:414, in SetChannelsMixin.rename_channels(self, mapping, allow_duplicates, verbose)
    411 from ..io import BaseRaw
    413 ch_names_orig = list(self.info['ch_names'])
--> 414 rename_channels(self.info, mapping, allow_duplicates)
    416 # Update self._orig_units for Raw
    417 if isinstance(self, BaseRaw):
    418     # whatever mapping was provided, now we can just use a dict

File <decorator-gen-51>:12, in rename_channels(info, mapping, allow_duplicates, verbose)

File ~\anaconda3\envs\mne_13\lib\site-packages\mne\channels\channels.py:1113, in rename_channels(info, mapping, allow_duplicates, verbose)
   1111 # check that all the channel names are unique
   1112 if len(ch_names) != len(np.unique(ch_names)) and not allow_duplicates:
-> 1113     raise ValueError('New channel names are not unique, renaming failed')
   1115 # do the remapping in info
   1116 info['bads'] = bads

ValueError: New channel names are not unique, renaming failed

Additional information

Platform:         Windows-10-10.0.19045-SP0
Python:           3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:16:33) [MSC v.1929 64 bit (AMD64)]
Executable:       C:\Users\loew_admin\anaconda3\envs\mne_13\python.exe
CPU:              Intel64 Family 6 Model 158 Stepping 9, GenuineIntel: 4 cores
Memory:           7.8 GB

mne:              1.4.dev0
numpy:            1.23.5 {MKL 2021.4-Product with 4 threads}
scipy:            1.10.0
matplotlib:       3.6.2 {backend=module://matplotlib_inline.backend_inline}

sklearn:          1.2.0
numba:            0.56.4
nibabel:          5.0.0
nilearn:          0.10.0
dipy:             1.5.0
openmeeg:         Not found
cupy:             Not found
pandas:           1.5.2
pyvista:          0.37.0 {OpenGL 4.5.0 - Build 31.0.101.2114 via Intel(R) HD Graphics 630}
pyvistaqt:        0.9.0
ipyvtklink:       0.2.3
vtk:              9.2.5
qtpy:             2.3.0 {PyQt5=5.15.6}
ipympl:           Not found
pyqtgraph:        0.13.1
pooch:            v1.6.0

mne_bids:         0.13.dev0
mne_nirs:         0.6.dev0
mne_features:     Not found
mne_qt_browser:   0.4.0
mne_connectivity: Not found
mne_icalabel:     Not found

loew commented 1 year ago

I might be wrong, but it seems that the channels in the raw file (.snirf) and the channels.tsv file are in different order. This causes _handle_channels_reading to detect mismatches and all names get messed up, ending in unequal numbers of unique channel names.

larsoner commented 1 year ago

To tell if this is a MNE-Python or MNE-BIDS problem (it probably is not a MNE-NIRS problem actually!), can you try just mne.io.read_raw_snirf on the file first? If that fails, it's a problem with MNE-Python. If it succeeds but read_raw_bids fails, it's (probably) a problem with MNE-BIDS. My guess is that it's an MNE-BIDS problem since it happens in rename_channels...

loew commented 1 year ago

mne.io.read_raw_snirf succeeds.

I believe the problem is that channel names are in different order in the snirf file and the channels.tsv file:

from mne_bids import BIDSPath, read_raw_bids
from mne_nirs.datasets import fnirs_motor_group
from mne_bids.tsv_handler import _from_tsv
from mne.io import read_raw_snirf

bpath = BIDSPath(subject='01', task="tapping", root=fnirs_motor_group.data_path(), \
                     datatype="nirs", suffix="nirs", extension=".snirf")

raw_snirf = read_raw_snirf(bpath, verbose=False)
channels_snirf = raw_snirf.ch_names

bpath.update(suffix='channels', extension='.tsv')
channels_tsv=_from_tsv(bpath)

print('snirf:', channels_snirf[:5])
print('tsv  :', channels_tsv['name'][:5])

snirf: ['S1_D1 760', 'S1_D2 760', 'S1_D3 760', 'S1_D9 760', 'S2_D1 760'] tsv : ['S1_D1 760', 'S1_D1 850', 'S1_D2 760', 'S1_D2 850', 'S1_D3 760']

welcome[bot] commented 1 year ago

Hello! 👋 Thanks for opening your first issue here! ❤️ We will try to get back to you soon. 🚴🏽‍♂️

larsoner commented 1 year ago

FYI @rob-luke @sappelhoff I've transferred this here because I think it's probably an issue with how MNE-BIDS handles channel names. My guess is that the dataset natively in SNIRF has some order that mne.io.read_raw_snirf reorders (to be in a standard paired-channel order), but then MNE-BIDS somehow does not handle the mismatch that is thereby created with the .tsv. But this is speculative on my part, TBD if it's actually the case...

sappelhoff commented 1 year ago

Thanks for triaging, @larsoner

I believe the problem is that channel names are in different order in the snirf file and the channels.tsv file:

yes, that's unfortunate ... for EEG, iEEG, and MEG it's a RECOMMENDATION that the order of channels in the raw file and in channels.tsv is identical, see:

yet for NIRS, I don't even see this recommendation. Was this deliberately omitted @rob-luke, or is that something we should add post-hoc?

https://bids-specification.readthedocs.io/en/latest/modality-specific-files/near-infrared-spectroscopy.html#channels-description-_channelstsv

Not that the recommendation would really cut any work for us, as it's not a REQUIREMENT and currently we seem to be handling this as if it were, leading to this issue :thinking:

larsoner commented 1 year ago

@sappelhoff do you expect it's "just" a matter of sorting the dataframe (or list or whatever) we get from reading the TSV by inst.ch_names.index(ch_name) for ch_name in tsv['ch_name'] or so?

sappelhoff commented 1 year ago

That's what I am thinking right now, but it's been a long time that I've looked into that specific code.

loew commented 1 year ago

I explored this issue a bit more using my own data. I went from the data in original NIRx format to MNE raw, to snirf and finally to BIDS format. It looks all fine with regard to the channel names.

I seems that the channels.tsv file provided with sample data is the problem. When I read MNE-NIRS sample data with read_raw_snirf and then use write_raw_bids to export the data, then the channels.tsv is different from the channels.tsv file that comes with the sample data. The newly created channels.tsv file corresponds exactly to channel names in the data imported using read_raw_snirf. Maybe @rob-luke could check how he created the BIDS files from his original nirx files?

Below the code to create a new channels.tsv from a snirf file.

from pathlib import Path
import mne_nirs
from mne.io import read_raw_snirf
from mne_bids import BIDSPath, read_raw_bids, write_raw_bids

datapath = mne_nirs.datasets.block_speech_noise.data_path()
bpath = BIDSPath(root=datapath, subject='01', session="01", suffix="nirs", extension=".snirf", task="AudioSpeechNoise", datatype="nirs")
raw_snirf = read_raw_snirf(bpath, verbose=False)

bpath_out = BIDSPath(root=Path.cwd() / 'testchannelstsv', subject='01', session="01", suffix="nirs", extension=".snirf", task="AudioSpeechNoise", datatype="nirs")
write_raw_bids(raw_snirf, bpath_out, overwrite=True)

rob-luke commented 1 year ago

Hi @loew, thanks for reporting this issue. I believe this is an unintended consequence of https://github.com/mne-tools/mne-python/pull/10642, when MNE changed the way it ordered channels.

Can you post a link to the dataset you are testing so I can verify if this is indeed the issue and how we can fix it. Thanks

loew commented 1 year ago

Hi @rob-luke, which dataset are you refering to? The ones that have the issue are part of the MNE-NIRS sample data. The test with my own data did not show the issue.

sappelhoff commented 1 year ago

@rob-luke I believe you can find the MWE in the original post, your help in resolving this would be appreciated!

hoechenberger commented 1 year ago

To recap, the problem is that you may have channels 'foo' and 'bar' in your raw data, but the channel order is 'bar', 'foo' in channels.tsv

When reading, MNE-BIDS tries to rename each channel to meet the channels.tsv definition, but renaming 'foo' to 'bar' will fail because 'bar' already exists at this stage.

Proposal for a fix:

Do two passes when populating the BIDS channel names

Assign the raw data channels the names from channels.tsv, but prepend a running index to avoid name collisions with existing channels in raw, e.g., '1 foo', '2 bar'
Remove the index, so we're only left with the actual channel names

larsoner commented 1 year ago

Assign the raw data channels the names from channels.tsv

By this do you mean some raw.rename_channels(...)? I don't think that's quite the issue here. IIUC tor these data the set(raw.ch_names) == set(channels_tsv['names']) or whatever, just the order differs. So renaming would do the wrong thing -- either the channels_tsv data needs to be reordered or the raw instance needs its channels reordered. (Or the people need to update their datasets to have consistent ordering.)

hoechenberger commented 1 year ago

@larsoner I edited my above comment to add more clarity as to how I understand the issue, could you please check if this description makes sense to you?

hoechenberger commented 1 year ago

I think we have an ambiguous situation here in case we have an order mismatch between raw and channels.tsv

In my mind we would keep the original order in raw and assign the names stored in channels.tsv

But another possible approach could be to re-order the channel data in raw such that it matches channels.tsv

larsoner commented 1 year ago

In my mind we would keep the original order in raw and assign the names stored in channels.tsv

I still don't quite follow what you mean by "assign" here. You could mean raw.rename_channels({raw_ch_name: csv_ch_name for raw_ch_name, csv_ch_name in zip(raw.ch_names, csv['ch_names']}) which will do the wrong thing. If you mean figuring out and index correspondence, then using that to rename the raw.ch_names to be csv['full_ch_names'][reorder_idx] or something this would make sense to me, but I'm not sure what the 'full_ch_names' or whatever would be (don't know that much about BIDS...).

hoechenberger commented 1 year ago

You could mean raw.rename_channels({raw_ch_name: csv_ch_name for raw_ch_name, csv_ch_name in zip(raw.ch_names, csv['ch_names']}) which will do the wrong thing.

Would it? I think it's one of two valid approaches. In fact, this was actually what I expected would happen when I first thought about this mismatch.

larsoner commented 1 year ago

Would it? I think it's one of two valid approaches.

Thinking about raw.ch_names = ['MEG0111', ..., 'EEG001', ...] but for whatever reason the person in their channels.tsv ordered EEG before MEG, the rename_channels approach would be wrong. I think that's the sort of situation we're in with this fNIRS data.

hoechenberger commented 1 year ago

Thinking about raw.ch_names = ['MEG0111', ..., 'EEG001', ...] but for whatever reason the person in their channels.tsv ordered EEG before MEG, the rename_channels approach would be wrong.

Yes, but think they have named the channels in raw

Fpz
TP9
TP10

and realized they swapped TP9 and TP10, they may want to fix the problem by editing channels.tsv:

Fpz
TP10
TP9

(this is what I believe happened in the fNIRS example, but I might be mistaken)

But as I said, it's ambiguous what to do here.

larsoner commented 1 year ago

(this is what I believe happened in the fNIRS example, but I might be mistaken)

I don't think so. A while ago we used to enforce writing fNIRS channels in a specific order. I'm guessing they wrote the channels.tsv back when we did that. Then at some point updated their raw data by rewriting when we didn't force the order to change. Hence the mismatch...

Yes, but think they have named the channels in raw ... and realized they swapped TP9 and TP10, they may want to fix the problem by editing channels.tsv

Coming back to the spec as posted above, this at least mentions the idea of an order mismatch:

https://bids-specification.readthedocs.io/en/latest/modality-specific-files/electroencephalography.html#channels-description-_channelstsv

Is there somewhere in the spec that says channels.tsv can be used to do a renaming? I did a quick search in that URL and didn't see anything about a renaming being allowed at least.

sappelhoff commented 1 year ago

Is there somewhere in the spec that says channels.tsv can be used to do a renaming? I did a quick search in that URL and didn't see anything about a renaming being allowed at least.

unfortunately there is still no official BIDS policy on whether the raw data or the BIDS metadata should be preferred in case of a mismatch. This is the relevant issue:

https://github.com/bids-standard/bids-specification/pull/761

Also, I am very unhappy that this sentence from the spec as linked by Eric above is a "SHOULD" and not a "MUST" 😕

To avoid confusion, the channels SHOULD be listed in the order they appear in the EEG data file.

larsoner commented 1 year ago

Mismatch in order of names, or in set of names? I see in the spec where the former could happen (the SHOULD stuff), but not where the sets of names could differ.

sappelhoff commented 1 year ago

my first comment pertains to mismatch in set names, my second comment pertains to the order only (where a mismatch is apparently "fine but not recommended")

larsoner commented 1 year ago

Yikes so there's no real way to know. Maybe we should raise an error but allow people to pass ch_name_mismatch='raise' (default) | 'reorder' | 'rename'. We could even just start with the reorder case as that's the one we actually need here, unless people already have seen datasets that need to rename one.

hoechenberger commented 1 year ago

Yikes so there's no real way to know. Maybe we should raise an error but allow people to pass ch_name_mismatch='raise' (default) | 'reorder' | 'rename'. We could even just start with the reorder case as that's the one we actually need here, unless people already have seen datasets that need to rename one.

+1

mne-tools / mne-bids