Closed timonmerk closed 3 years ago
it's due to the fif format that uses a fixed length for channel names.
it's quite hard to change.
But by not using the fif format (in our case we use brainvision) this should theoretically not be a problem right?
as you as you save a file to fif you have the pb.
We should still consider the option to get rid of these FIFF restrictions and only apply them once the users saves something to FIFF (where we could issue appropriate warnings etc).
I could live with that !
We should still consider the option to get rid of these FIFF restrictions and only apply them once the users saves something to FIFF
If we do this we might want to have a allow_long=False
by default in create_info
and in other readers where we've hit long channel names. Only finding out when you go to write that the file you will read in later won't have the same channel names as those you've been working with could be pretty annoying for users who might have done things with their data already. Or maybe it's okay because it's just one more rename_channels
that people will need?
it's quite hard to change.
The reason there is a limitation at all IIUC is beacuse the channel info struct is a fixed-size C struct. We could add a FIFFV_MNE_CHANNEL_NAMES_MAPPING
tag as a string JSON dump of the dict mapping short-to-original names that has the following behavior:
info['chs']
and info['bads']
names with the shortened ones; write the FIFFV_MNE_CHANNEL_NAMES_MAPPING
JSONFIFFV_MNE_CHANNEL_NAMES_MAPPING
JSON and use it to make the appropriate substitutions in info['chs']
and info['bads']
.In other words, we add a tag that says "do rename_channels(...) during writing and reading".
Old software (Xfit; other toolboxes, etc.) will still work because we still write the 15-char channel names. We can update MNE-MATLAB to respect this, too. The only time you're in trouble I think is if you take this data to some other software that does not support longer names (e.g., MNE-C 2.7.3) and they will probably not write out this tag. But users with this rare case can use rename_channels to get them back in that case.
If we do this we might want to have a
allow_long=False
by default increate_info
and in other readers where we've hit long channel names. Only finding out when you go to write that the file you will read in later won't have the same channel names as those you've been working with could be pretty annoying for users who might have done things with their data already. Or maybe it's okay because it's just one morerename_channels
that people will need?
I think it's okay if they get a clear warning message on save saying that channel names have been truncated because the file format doesn't support longer names.
Re extending the FIF format I don't really have an opinion because I don't use the format, but your suggestion sounds reasonable.
I think it's okay if they get a clear warning message on save saying that channel names have been truncated because the file format doesn't support longer names.
But… is it easy for users to write MNE data to other formats? As easy as doing raw.save('eeg_data.eeg')
or even epochs.save('eeg_data.eeg')
? (I've never tried that)
I guess I would be pretty annoyed if I've been working on some code and only at the end I'm adding the save()
calls, and then MNE tells me, "Good luck changing all the channel names now, sucker!"
Good point, we've discussed this at least once (too lazy to look up the issue), but in principle I agree that we'd have to put in some work to make exporting to other formats as straightforward as possible. Still, even if you get this message at the end of a pipeline MNE could automatically truncate channel names to 15 characters, which currently happens when loading a file that does support longer channels names.
I like the idea of either introducing a sidecar file or making writing to other formats easier, so we can overcome some of the limitations of the FIFF file format.
thanks for pinging me on this via https://github.com/fieldtrip/fieldtrip/issues/1596.
In principle, I do not oppose the idea of extending the FIF format with an extra table that maps the short names to the long names. However, there are other file formats that have a similar channel name length limitation. One that comes to mind is EDF (also 16 characters, see https://www.edfplus.info/specs/edf.html), but there are others as well.
An alternative to consider is to use the BIDS channels.tsv
file to store the complete channel names, or perhaps a lookup table, see https://bids-specification.readthedocs.io/en/stable/04-modality-specific-files/02-magnetoencephalography.html#channels-description-_channelstsv. It now has name, type and units, but additional columns could be added.
I have recently updated FieldTrip such that - if the code detects that it is a BIDS like dataset, e.g. a _meg.fif
or an _eeg.fif
with accompanying sidecar files - the channels.tsv
overrules the binary header information. See https://github.com/fieldtrip/fieldtrip/blob/34332821c9e8d9d6bf02103d34071d2784e909b3/fileio/ft_read_header.m#L302 where the table is read, and https://github.com/fieldtrip/fieldtrip/blob/34332821c9e8d9d6bf02103d34071d2784e909b3/fileio/ft_read_header.m#L2822 where the channel names from the binary header are replaced by those of the channels.tsv
.
I implemented it more with the idea of being able to change channel names after recording, e.g. giving channels "1", "2", ... proper names, or being able to swap two channels in the header in case two electrodes had to be swapped during the recording. But the use case described here would also be covered.
Hello @robertoostenveld,
thank you for sharing your thoughts!
I have recently updated FieldTrip such that - if the code detects that it is a BIDS like dataset, e.g. a
_meg.fif
or an_eeg.fif
with accompanying sidecar files - thechannels.tsv
overrules the binary header information.
You're actually touching upon something that has been a constant source of friction for @agramfort and me with BIDS: the requirement that the BIDS sidecar files be in sync with the meta info saved in the recording's data files. The main argument being that one should still be able to use the data with non-BIDS-aware software and still get the correct meta info. (A notion with which I disagree)
I personally believe that the sidecar info should always take precedence, and this would immediately get rid of some of the limitations we're seeing e.g. with the FIFF standard.
What is your stance on this? Do you think it's worthwhile trying to adjust BIDS such that, if sidecar info is present, it would always override respective meta info in the data files?
Because strictly speaking, my understanding is that with the change to FieldTrip you mentioned, you're not following the current BIDS specification anymore.
You're actually touching upon something that has a constant source of friction for @agramfort https://github.com/agramfort and me with BIDS: the requirement (?) that the BIDS sidecar files be in sync with the meta info saved in the recording's data files. The main argument being that one should still be able to use the data with non-BIDS-aware software and still get the correct meta info. (A notion with which I disagree)
yes for me the sidecar files should take precedence otherwise it means we cannot share the raw files without rewriting them which is always a bit dangerous.
BIDS is not a file format specification. I don't think you should hack your way through BIDS ... file formats need their independent treatment.
relevant discussion!
I also believe that BIDS metadata should take precedence over metadata in the original files because
The specific case (longer/other channel names) falls under (1).
An example of (2) is that the bold.nii.gz
does not specify in the header what the experimental task was. Or many of the DICOM parameters.
An example of (3) is that numeric trigger codes - which are a silly technical restriction of our EEG/MEG systems because we are used to using 8/16/32 bit TTL coupling of our acquisition systems with external devices - need to be recoded into something more meaningful. That is what in FT we normally do with a "trialfun" upon the moment of analysis, but which in the case of BIDS curatiuon can be done earlier: at the stage of data management. An example of this is https://github.com/DidiLamers/Data2bids-Scripts/tree/main/Prestimulus%20Theta%20Flicker, where @DidiLamers (after starting naively in the cognitive EEG field without BIDS and FT experience until a few weeks ago) came up with the smart idea of implementing a trialfun for data2bids.
An other example of (3) would be computing reaction times, or recoding sequences of triggers (stim-response mappings as correct/incorrect) and adding those to the events.tsv
. Our integer EEG TTL codes do not have this information explicitly but often implicitly: you need to know the meaning of each trigger number and some code to parse the sequence. However the presentation software might have determined this online and written it to the log file. So merging the presentation log file and the integer TTL codes is also something to do in BIDS curation; in that case you would not write the reaction times to the EEG or MEG stimulus channel (it would not even fit).
I agree that rewriting the binary data files to ensure consistency with the sidecars is preferred and makes the subsequent use of the data more robust. But for many of the BIDS data files we don't have easy means to rewrite them, for example for EEGLAB .set or NWB files. And sometimes it simply does not fit, as in the 16-bit string limitation for fif.
BIDS is meant to fix some problems that ideally we would not have had in the first place. But that does not mean that we are required to back-port the fixes realized by BIDS to the original data formats. Since in that case we should also be considering using the NIFTI-2 header extensions to store DICOM fields etc, or switching to CITFI (which is NIFTI with header extensions), or rewrite each EDF file into an EDFplus, etc.
ok so we agree that they should take precedence. I think it's an important message to the community.
are these expectations clarified somewhere in the specification? For NIFTI, doesn't the validator check that sidecar metadata matches exactly the header info? (I might be wrong or the information might be outdated)
MNE automatically truncates the channel name length to 15 characters:
In the context of clinical neurophysiology it becomes very handy to keep information like
inside of the channel name.
Is there a reason for limiting the length?