mne-tools / fiff-constants

Bookkeeping and documentation of FIFF file format constants
4 stars 12 forks source link

ENH: Add mne_ch_name_mapping #29

Closed larsoner closed 3 years ago

larsoner commented 3 years ago

@jnenonen this PR adds a new tag to the MNE namespace to allow channel names longer than 15 characters by giving a JSON dictionary mapping between 15-char and arbitrary-length names. But before doing that, I thought I'd check to see if MEGIN folks would be interested in using this sort of functionality, in which case it could live in the main namespace. Quoting the MNE issue:

The reason there is a limitation at all IIUC is beacuse the channel info struct is a fixed-size C struct. We could add a FIFFV_MNE_CHANNEL_NAMES_MAPPING tag as a string JSON dump of the dict mapping short-to-original names that has the following behavior:

  1. When writing, (temporarily) replace info['chs'] and info['bads'] names with the shortened ones; write the FIFFV_MNE_CHANNEL_NAMES_MAPPING JSON
  2. When reading, check for the presence of the FIFFV_MNE_CHANNEL_NAMES_MAPPING JSON and use it to make the appropriate substitutions in info['chs'] and info['bads'].

In other words, we add a tag that says "do rename_channels(...) during writing and reading".

In theory this could be implemented in MEGIN C programs, too, but I assume it would be more of a pain because of structs/types/assumptions about these channel names.

Any interest in having this in the main DictionaryTags (e.g., if you think you would use it someday), or should we just keep it here in DictionaryTags_MNE.txt ?

agramfort commented 3 years ago

@jnenonen we need you to approve or question this.

jnenonen commented 3 years ago

Hi,

Looks OK, but I would like to leave the decision to Matti Kajola.

BR, Jukka

[A screenshot of a cell phone Description automatically generated]--

From: Eric Larson notifications@github.com Reply to: mne-tools/fiff-constants reply@reply.github.com Date: Thursday 19. November 2020 at 14.59 To: mne-tools/fiff-constants fiff-constants@noreply.github.com Cc: Jukka Nenonen jukka.nenonen@megin.fi, Mention mention@noreply.github.com Subject: [mne-tools/fiff-constants] ENH: Add mne_ch_name_mapping (#29)

@jnenonenhttps://github.com/jnenonen this PR adds a new tag to the MNE namespace to allow channel names longer than 15 characters by giving a JSON dictionary mapping between 15-char and arbitrary-length names. But before doing that, I thought I'd check to see if MEGIN folks would be interested in using this sort of functionality, in which case it could live in the main namespace. Quoting the MNE issuehttps://github.com/mne-tools/mne-python/issues/8312#issuecomment-700249945:

The reason there is a limitation at all IIUC is beacuse the channel info struct is a fixed-size C struct. We could add a FIFFV_MNE_CHANNEL_NAMES_MAPPING tag as a string JSON dump of the dict mapping short-to-original names that has the following behavior:

  1. When writing, (temporarily) replace info['chs'] and info['bads'] names with the shortened ones; write the FIFFV_MNE_CHANNEL_NAMES_MAPPING JSON
  2. When reading, check for the presence of the FIFFV_MNE_CHANNEL_NAMES_MAPPING JSON and use it to make the appropriate substitutions in info['chs'] and info['bads'].

In other words, we add a tag that says "do rename_channels(...) during writing and reading".

In theory this could be implemented in MEGIN C programs, too, but I assume it would be more of a pain because of structs/types/assumptions about these channel names.

Any interest in having this in the main DictionaryTags (e.g., if you think you would use it someday), or should we just keep it here in DictionaryTags_MNE.txt ?


You can view, comment on, or merge this pull request online at:

https://github.com/mne-tools/fiff-constants/pull/29

Commit Summary

File Changes

Patch Links:

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/mne-tools/fiff-constants/pull/29, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ADR3D6QPXBWSGZ5HH7UOZ6DSQUJB7ANCNFSM4T3LWVLQ.

larsoner commented 3 years ago

@mkajola sounds like it's up to you!

mkajola commented 3 years ago

Hi, This kind of solution is of course possible, but probably will not be the Megin solution when it will be needed. Embedded json is probably practical for python, but not necessarily for other platforms, and it kind of breaks the basic idea of the fiff file. We already have similar kind of formal issue with the acquisition parameters, which are essentially a dump of internals of one program, but one that have been started to be used, then it becomes like part of the file format, while it clearly is not in the specification, and at the same time it binds the acquisition software design (in our case new systems produce that kind of entry, even though there no such variables in the software). One also needs multiple parsers to read the files.

The idea that has been in the air for a very long time, has been to change the rigid channel info records to normal fiff-blocks. I think that even the block tags are there from some very distant past. Then the channel info would be as flexible as anything else, and ne could add arbitrary channel information (descriptions, colors, whatever) for each channel. This would be especially valuable for processed output files, and contexts other than data from out MEG devices. For backwards compatibility, the ch-info records could still be present. One small question in this respect, is whether the records should be inside or outside the ch-info blocks. If they are inside, correctly functioning reader that is aware of the tag context should actually ignore it, so that could broke some of the readers (most of the readers tend to be quite sloppy in this respect, so most likely the damage would be surprisingly small). So possibly better to be outside, parallel to blocks.

The contents of the channel info block should also be specified rather carefully. One of the questions is, that if they are present, should all the info in the ch-info records be also there, so that they can be dropped away at some phase when most of the readers can handle the new structures. I suppose for overlapping information the block would have precedence. The ch-info structure is very much channel counting/labeling scheme in Neuromag-122 from last millennium, so it probably would be time to think what is really needed there. Possibly the old scheme is ok, but if the encoding is changed, then one could think a bit about the contents too.

What is the actual need here? Is it just a longer name for the channel. Then no new logical info would be in the file, just encoded differently. In some contexts, there seem to be also need to have e.g. user defined labels etc. which are necessarily not the same thing as the ‘name’ of the channel. One example where we have been bit indecisive, is whether the EEG labels should the ‘channel names’, or should there be both the ‘electrode location’, and the ‘physical channel’ names present. The exact purposes of the data items should be specified.

Even bigger question about all kinds of names is the string encoding. Current fiff spec requires isolatin 1, though on this century Unicode is the way to go.

t. mjk

From: Eric Larson notifications@github.com Sent: torstai 19. marraskuuta 2020 16.03 To: mne-tools/fiff-constants fiff-constants@noreply.github.com Cc: Matti Kajola matti.kajola@megin.fi; Mention mention@noreply.github.com Subject: Re: [mne-tools/fiff-constants] ENH: Add mne_ch_name_mapping (#29)

@mkajolahttps://github.com/mkajola sounds like it's up to you!

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/mne-tools/fiff-constants/pull/29#issuecomment-730395946, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AKJMTHBJ6EN4CLZNENHLKLLSQUQQLANCNFSM4T3LWVLQ.

larsoner commented 3 years ago

Embedded json is probably practical for python, but not necessarily for other platforms... it kind of breaks the basic idea of the fiff file... One also needs multiple parsers to read the files.

For what it's worth, we chose JSON because of its adoption across multiple languages -- a quick search turned up several possible C parsers (e.g., this one that is MIT/commercial-compatible single source and single header file) and the same goes for other languages. But it is indeed more overhead than using FIF tags directly, as you say, and is really just a quick workaround/band-aid solution to use a single string tag to encode what typically would be encoded using multiple tags (via a new block type, etc.).

What is the actual need here? Is it just a longer name for the channel. Then no new logical info would be in the file, just encoded differently. In some contexts, there seem to be also need to have e.g. user defined labels etc. which are necessarily not the same thing as the ‘name’ of the channel... on this century Unicode is the way to go... The idea that has been in the air for a very long time, has been to change the rigid channel info records to normal fiff-blocks.

Channel-name-length is the most pressing need, but it would be great to encode other information in a more flexible manner, support unicode, etc. as these are also limitations of the current format. A new "channel block" scheme that takes precedence over the old "channel struct" entries is definitely a better solution!

To me the question is, what do you see as a potential timeline for finalization and adoption of this sort of scheme? If we knew that this proposal could be discussed, finalized, and implemented within 6 months, there wouldn't be any point in us implementing the band-aid JSON approach. On the other hand, if we knew it take 6 years, then the JSON band-aid would have value as a short/medium-term workaround because it solves a real problem for our users in the meantime.

Proposal

To try to keep the proposed "channel block" solution toward the <= 6-month end of the spectrum, how about we start small with only the components we need within a framework we can expand later as we see a need for new tags? Concretely, we need to keep the channel info struct for backward compatibility, so there is no reason to replicate the entries already in there. So let's create a new block and for now only add the types that would take care of two concrete problems -- channel name length, and (optionally) unicode encoding:

  1. Create a new FIFFB_CH_BLOCK: the number of these blocks should match FIFF_NCHAN, and it should live within FIFFB_MEAS_INFO
  2. Add a tag FIFF_CH_INFO_NAME that links the given channel block to its channel info struct
  3. Add a tag FIFF_CH_NAME that is a channel name string of arbitrary length
  4. (Optional) Add a new FIFF.FIFFT_UNICODE that indicates that the tag is a unicode string

Step (4) seems like the most annoying to implement in C, so it could be omitted and implemented later -- LATIN-1 strings could be used for now. To represent this visually with how it would show up in mne show_fiff, it would be something like:

999 = FIFFB_ROOT
    ...
    100 = FIFFB_MEAS
        103 = FIFF_BLOCK_ID (20b ids) = {'version': 65537, 'machid': a ... dict len=4
        101 = FIFFB_MEAS_INFO
            ...
            200 = FIFF_NCHAN (4b >i4) = [376]
            x376: 203 = FIFF_CH_INFO (96b cis) = {'scanno': 376, 'logno': 61, ' ... dict len=11
            000 = FIFFB_CH_BLOCK
                000 = FIFF_CH_INFO_NAME (8b str) = 'MEG 0111'
                000 = FIFF_CH_NAME (19b unicode) = 'MEG 0111 ⏯️⏯️⏯️⏯️⏯️⏯️⏯️⏯️⏯️⏯️'
            000 = FIFFB_CH_BLOCK
                000 = FIFF_CH_STRUCT_NAME (8b) str = 'MEG 0112'
                ...
            ...

Does this sort of thing seem like a reasonable starting point? I think it checks all the boxes you mention above @mkajola and it should be easy enough for us to implement at the MATLAB/Python end, and definitely seems better from a long-term perspective.

larsoner commented 3 years ago

(FYI I edited my GitHub comment to fix name consistency in the proposal, so if you're looking at my replies over email it's worth going on GitHub to see the up-to-date version of the proposal. Also the tag/block names and 000's are just placeholders -- if we agree this is a good way to move forward, I can open a PR with some proposed names and we can iterate there!)

agramfort commented 3 years ago

I can only voice my support to a solution over the next few months. We have more and more users hitting the limitations of fif when working with different input formats that do not have the same restrictions. It's becoming problematic for a wider adoption of mne. my 2c

mkajola commented 3 years ago

I think that this would be useful thing, so lets see if we could realize this. That would be then in version 1.5. (Do we have the Manual somewhere in github, or only the tag directory text file). Unicode should then be 2.x later on, since touching the structures/primary types will not be backwards compatible. The tags for the fields exists already - tags 250 - 258. These seems to be no block for this (I suspect that the hole at 113 might have been it), so that needs to be specified. The most interesting part here is to check how many readers we break with this. But to know that, I suppose we first need to create some files and test that, I think we can do that here for all (well at least for the most important) Megin software.

What would be the correct talking/specification environment for this. I suppose here we are approving/disapproving the json workaround. For me that's fine if it is in the MNE reserved area. This question applies also the other restrictions in the format. It would be good to collect list of those so that it would be possible to evaluate if there is a good way to change FIFF of should there be some entirely different way to go. In the same arena we probably should discuss also how to handle the different profiles of fiff usage. Currently specifying that one supports a particular version is bit fuzzy. Whether it means the Megin spec, or the extended one. Starts to resemble DICOM.

larsoner commented 3 years ago

What would be the correct talking/specification environment for this.

I opened a PR that is an alternative to / supersedes this one: https://github.com/mne-tools/fiff-constants/pull/33. Let's discuss there as people on GitHub can leave comments directly on the code in line, comment on the diff, etc. If we merge that PR we'll close this JSON band-aid one as it's redundant and a less good solution.

Do we have the Manual somewhere in github, or only the tag directory text file

Just text -- let's discuss further in https://github.com/mne-tools/fiff-constants/issues/31, it would be great to have the build automated.

This question applies also the other restrictions in the format. It would be good to collect list of those so that it would be possible to evaluate if there is a good way to change FIFF of should there be some entirely different way to go.

Opened an issue to discuss as people have ideas: https://github.com/mne-tools/fiff-constants/issues/30

larsoner commented 3 years ago

Closing this as we hopefully won't need it!