mne-tools / mne-python

MNE: Magnetoencephalography (MEG) and Electroencephalography (EEG) in Python
https://mne.tools
BSD 3-Clause "New" or "Revised" License
2.68k stars 1.31k forks source link

Why is the channel name length limited to 15 characters? #8312

Closed timonmerk closed 3 years ago

timonmerk commented 4 years ago

MNE automatically truncates the channel name length to 15 characters:


<ipython-input-68-1c3debadf790>:1: RuntimeWarning: 4 channel names are too long, have been truncated to 15 characters:
['LFP_BS_STN_R_234', 'LFP_BS_STN_R_567', 'LFP_BS_STN_L_234', 'LFP_BS_STN_L_567']
  info = mne.create_info(ch_labels, fsample, ch_types='ecog')

In the context of clinical neurophysiology it becomes very handy to keep information like

  1. the cortical location
  2. electrode manufacturer
  3. hemisphere

inside of the channel name.

Is there a reason for limiting the length?

agramfort commented 4 years ago

it's due to the fif format that uses a fixed length for channel names.

it's quite hard to change.

timonmerk commented 4 years ago

But by not using the fif format (in our case we use brainvision) this should theoretically not be a problem right?

agramfort commented 4 years ago

as you as you save a file to fif you have the pb.

cbrnr commented 4 years ago

We should still consider the option to get rid of these FIFF restrictions and only apply them once the users saves something to FIFF (where we could issue appropriate warnings etc).

agramfort commented 4 years ago

I could live with that !

larsoner commented 4 years ago

We should still consider the option to get rid of these FIFF restrictions and only apply them once the users saves something to FIFF

If we do this we might want to have a allow_long=False by default in create_info and in other readers where we've hit long channel names. Only finding out when you go to write that the file you will read in later won't have the same channel names as those you've been working with could be pretty annoying for users who might have done things with their data already. Or maybe it's okay because it's just one more rename_channels that people will need?

it's quite hard to change.

The reason there is a limitation at all IIUC is beacuse the channel info struct is a fixed-size C struct. We could add a FIFFV_MNE_CHANNEL_NAMES_MAPPING tag as a string JSON dump of the dict mapping short-to-original names that has the following behavior:

  1. When writing, (temporarily) replace info['chs'] and info['bads'] names with the shortened ones; write the FIFFV_MNE_CHANNEL_NAMES_MAPPING JSON
  2. When reading, check for the presence of the FIFFV_MNE_CHANNEL_NAMES_MAPPING JSON and use it to make the appropriate substitutions in info['chs'] and info['bads'].

In other words, we add a tag that says "do rename_channels(...) during writing and reading".

Old software (Xfit; other toolboxes, etc.) will still work because we still write the 15-char channel names. We can update MNE-MATLAB to respect this, too. The only time you're in trouble I think is if you take this data to some other software that does not support longer names (e.g., MNE-C 2.7.3) and they will probably not write out this tag. But users with this rare case can use rename_channels to get them back in that case.

cbrnr commented 4 years ago

If we do this we might want to have a allow_long=False by default in create_info and in other readers where we've hit long channel names. Only finding out when you go to write that the file you will read in later won't have the same channel names as those you've been working with could be pretty annoying for users who might have done things with their data already. Or maybe it's okay because it's just one more rename_channels that people will need?

I think it's okay if they get a clear warning message on save saying that channel names have been truncated because the file format doesn't support longer names.

Re extending the FIF format I don't really have an opinion because I don't use the format, but your suggestion sounds reasonable.

hoechenberger commented 4 years ago

I think it's okay if they get a clear warning message on save saying that channel names have been truncated because the file format doesn't support longer names.

But… is it easy for users to write MNE data to other formats? As easy as doing raw.save('eeg_data.eeg') or even epochs.save('eeg_data.eeg')? (I've never tried that) I guess I would be pretty annoyed if I've been working on some code and only at the end I'm adding the save() calls, and then MNE tells me, "Good luck changing all the channel names now, sucker!"

cbrnr commented 4 years ago

Good point, we've discussed this at least once (too lazy to look up the issue), but in principle I agree that we'd have to put in some work to make exporting to other formats as straightforward as possible. Still, even if you get this message at the end of a pipeline MNE could automatically truncate channel names to 15 characters, which currently happens when loading a file that does support longer channels names.

hoechenberger commented 4 years ago

I like the idea of either introducing a sidecar file or making writing to other formats easier, so we can overcome some of the limitations of the FIFF file format.

robertoostenveld commented 3 years ago

thanks for pinging me on this via https://github.com/fieldtrip/fieldtrip/issues/1596.

In principle, I do not oppose the idea of extending the FIF format with an extra table that maps the short names to the long names. However, there are other file formats that have a similar channel name length limitation. One that comes to mind is EDF (also 16 characters, see https://www.edfplus.info/specs/edf.html), but there are others as well.

An alternative to consider is to use the BIDS channels.tsv file to store the complete channel names, or perhaps a lookup table, see https://bids-specification.readthedocs.io/en/stable/04-modality-specific-files/02-magnetoencephalography.html#channels-description-_channelstsv. It now has name, type and units, but additional columns could be added.

I have recently updated FieldTrip such that - if the code detects that it is a BIDS like dataset, e.g. a _meg.fif or an _eeg.fif with accompanying sidecar files - the channels.tsv overrules the binary header information. See https://github.com/fieldtrip/fieldtrip/blob/34332821c9e8d9d6bf02103d34071d2784e909b3/fileio/ft_read_header.m#L302 where the table is read, and https://github.com/fieldtrip/fieldtrip/blob/34332821c9e8d9d6bf02103d34071d2784e909b3/fileio/ft_read_header.m#L2822 where the channel names from the binary header are replaced by those of the channels.tsv.

I implemented it more with the idea of being able to change channel names after recording, e.g. giving channels "1", "2", ... proper names, or being able to swap two channels in the header in case two electrodes had to be swapped during the recording. But the use case described here would also be covered.

hoechenberger commented 3 years ago

Hello @robertoostenveld,

thank you for sharing your thoughts!

I have recently updated FieldTrip such that - if the code detects that it is a BIDS like dataset, e.g. a _meg.fif or an _eeg.fif with accompanying sidecar files - the channels.tsv overrules the binary header information.

You're actually touching upon something that has been a constant source of friction for @agramfort and me with BIDS: the requirement that the BIDS sidecar files be in sync with the meta info saved in the recording's data files. The main argument being that one should still be able to use the data with non-BIDS-aware software and still get the correct meta info. (A notion with which I disagree)

I personally believe that the sidecar info should always take precedence, and this would immediately get rid of some of the limitations we're seeing e.g. with the FIFF standard.

What is your stance on this? Do you think it's worthwhile trying to adjust BIDS such that, if sidecar info is present, it would always override respective meta info in the data files?

Because strictly speaking, my understanding is that with the change to FieldTrip you mentioned, you're not following the current BIDS specification anymore.

agramfort commented 3 years ago

You're actually touching upon something that has a constant source of friction for @agramfort https://github.com/agramfort and me with BIDS: the requirement (?) that the BIDS sidecar files be in sync with the meta info saved in the recording's data files. The main argument being that one should still be able to use the data with non-BIDS-aware software and still get the correct meta info. (A notion with which I disagree)

yes for me the sidecar files should take precedence otherwise it means we cannot share the raw files without rewriting them which is always a bit dangerous.

jasmainak commented 3 years ago

BIDS is not a file format specification. I don't think you should hack your way through BIDS ... file formats need their independent treatment.

robertoostenveld commented 3 years ago

relevant discussion!

I also believe that BIDS metadata should take precedence over metadata in the original files because

  1. the metadata has to be corrected
  2. the metadata is missing, possibly due to binary format limitations
  3. the metadata that is present has to be extended

The specific case (longer/other channel names) falls under (1).

An example of (2) is that the bold.nii.gz does not specify in the header what the experimental task was. Or many of the DICOM parameters.

An example of (3) is that numeric trigger codes - which are a silly technical restriction of our EEG/MEG systems because we are used to using 8/16/32 bit TTL coupling of our acquisition systems with external devices - need to be recoded into something more meaningful. That is what in FT we normally do with a "trialfun" upon the moment of analysis, but which in the case of BIDS curatiuon can be done earlier: at the stage of data management. An example of this is https://github.com/DidiLamers/Data2bids-Scripts/tree/main/Prestimulus%20Theta%20Flicker, where @DidiLamers (after starting naively in the cognitive EEG field without BIDS and FT experience until a few weeks ago) came up with the smart idea of implementing a trialfun for data2bids.

An other example of (3) would be computing reaction times, or recoding sequences of triggers (stim-response mappings as correct/incorrect) and adding those to the events.tsv. Our integer EEG TTL codes do not have this information explicitly but often implicitly: you need to know the meaning of each trigger number and some code to parse the sequence. However the presentation software might have determined this online and written it to the log file. So merging the presentation log file and the integer TTL codes is also something to do in BIDS curation; in that case you would not write the reaction times to the EEG or MEG stimulus channel (it would not even fit).

I agree that rewriting the binary data files to ensure consistency with the sidecars is preferred and makes the subsequent use of the data more robust. But for many of the BIDS data files we don't have easy means to rewrite them, for example for EEGLAB .set or NWB files. And sometimes it simply does not fit, as in the 16-bit string limitation for fif.

BIDS is meant to fix some problems that ideally we would not have had in the first place. But that does not mean that we are required to back-port the fixes realized by BIDS to the original data formats. Since in that case we should also be considering using the NIFTI-2 header extensions to store DICOM fields etc, or switching to CITFI (which is NIFTI with header extensions), or rewrite each EDF file into an EDFplus, etc.

agramfort commented 3 years ago

ok so we agree that they should take precedence. I think it's an important message to the community.

jasmainak commented 3 years ago

are these expectations clarified somewhere in the specification? For NIFTI, doesn't the validator check that sidecar metadata matches exactly the header info? (I might be wrong or the information might be outdated)