API: Remove raw.impedances

larsoner commented 3 weeks ago

See discussion in https://github.com/mne-tools/mne-python/issues/12861#issuecomment-2361102202

larsoner commented 2 weeks ago

Argh @sappelhoff I went to look at pybv and the code there seems to have to do almost exclusively with writing, not reading BV data.

@sappelhoff WDYT about just keeping the code in MNE-Python with a mne.io.brainvision.read_impedances(...)?

drammock commented 2 weeks ago

obligatory cross-ref to https://github.com/mne-tools/mne-python/pull/12450#issuecomment-1967238036, in particular

If the desired end-state here is "cleaner separation of concerns" / "pybv only handles brainvision stuff, not MNE stuff" then I think there's a strong case that pybv should absorb MNE-Python's brainvision reading code

larsoner commented 2 weeks ago

Oh okay with me then!

drammock commented 2 weeks ago

Oh okay with me then!

unfortunately my suggestion in https://github.com/mne-tools/mne-python/pull/12450#issuecomment-1967238036 was not met with enthusiasm. But maybe now the case is stronger / more pursuasive? WDYT @cbrnr @sappelhoff?

cbrnr commented 2 weeks ago

I'd be OK with adding reader functionality to pybv, it does make a lot of sense. However, this would be the first format (I think) where we choose to do so, so why is this a problem suddenly? If the current reader function does read impedances (if present), and the only reason we're now removing it is because it doesn't survive a round-trip to FIFF, then IMO the problem is that we're still relying on FIFF. I'm pretty sure we've had this discussion before, but no one outside the small MEG community has ever used FIFF. It's a very niche file format that doesn't even have many readers outside of MNE. So why don't we think about moving to a new internal format that is independent of the FIFF idiosyncrasies, such as HDF? I think it could be as easy as dumping our current Raw into HDF, and then we'd be free to extend attributes even if FIFF does not support it.

Edit 1: ~~It's not the first format, since we're using antio for both reading and writing.~~ Edit 2: ~~Not true, even antio is using a reader from MNE.~~ Edit 3: OK, it's complicated 😄. The MNE reader needs antio, and antio has a reader that needs MNE... 🤔

cbrnr commented 2 weeks ago

PS: We're rolling our own EDF reader, even though edfio includes a reader and a writer. I'm absolutely not saying it is a bad thing that we implement our own readers (after all, we get to decide how strict they follow a given standard, which is not always desirable, etc.), so for me readers are not really up for discussion. It's the FIFF format that should be swapped with something better.

mscheltienne commented 2 weeks ago

IMO the problem is that we're still relying on FIFF. I'm pretty sure we've had this discussion before, but no one outside the small MEG community has ever used FIFF. It's a very niche file format that doesn't even have many readers outside of MNE. So why don't we think about moving to a new internal format that is independent of the FIFF idiosyncrasies, such as HDF?

Agree, but that would be a major and complicated refactor to do.

sappelhoff commented 2 weeks ago

Argh @sappelhoff I went to look at pybv and the code there seems to have to do almost exclusively with writing, not reading BV data.

Yes, that is true. I thought we'd want to very selectively outsource impedance reading to pybv, which would not be so hard to do. But it is true that is would be a bit odd that no reading is supported EXCEPT for impedances

@sappelhoff WDYT about just keeping the code in MNE-Python with a mne.io.brainvision.read_impedances(...)?

would be fine with me, too -- as long as we retain SOME way to get the impedances.

unfortunately my suggestion in https://github.com/mne-tools/mne-python/pull/12450#issuecomment-1967238036 was not met with enthusiasm. But maybe now the case is stronger / more pursuasive? WDYT

I would be fine with having pybv take over READING, as long as it supplies the read data in a "non-MNE" format (basic Python and numpy data structures), to stay "lightweight/minimal", and then have code in mne that turns these basic data structures into mne-ready structures.

However, it would require some work and effort that I cannot commit to at this point.

IMO the problem is that we're still relying on FIFF. I'm pretty sure we've had this discussion before, but no one outside the small MEG community has ever used FIFF. It's a very niche file format that doesn't even have many readers outside of MNE. So why don't we think about moving to a new internal format that is independent of the FIFF idiosyncrasies, such as HDF?

I agree with Clemens and Mathieu, the FIFF format often places an unnecessary limit on our plans of supporting more (or more detailed) (meta-) data. I am also aware that a potential change of the "saving backend" of mne would probably be the biggest change (labor intensive, error prone, ... but also valuable, and future-oriented) MNE-Python has seen in its current lifetime.

cbrnr commented 5 days ago

I agree with Clemens and Mathieu, the FIFF format often places an unnecessary limit on our plans of supporting more (or more detailed) (meta-) data. I am also aware that a potential change of the "saving backend" of mne would probably be the biggest change (labor intensive, error prone, ... but also valuable, and future-oriented) MNE-Python has seen in its current lifetime.

It will be a big change, yes, but I'm not even sure if it's going to be that problematic if we start by just dumping what we have into an HDF5 container. There are of course some details to this process, but if people think this would be worth discussing, I suggest that we open a new issue.

Regarding the removal of impedances, I think there's not really a strong need to do that, or is there? Do we document which attributes survive a round trip through FIFF? If raw.impendances cannot/should not be used, then what about at least providing it in raw._extras (at least as long as FIFF remains the standard format)?

larsoner commented 4 days ago

I agree with Clemens and Mathieu, the FIFF format often places an unnecessary limit on our plans of supporting more (or more detailed) (meta-) data. I am also aware that a potential change of the "saving backend" of mne would probably be the biggest change (labor intensive, error prone, ... but also valuable, and future-oriented) MNE-Python has seen in its current lifetime.

First I'll say that this FIF limitation has come up before previously, and the consensus/decision that I recall was that we should have mne.export for other formats. So I'd rather not work through all those discussions if possible. I'd like to stick with raw.save meaning "to FIF" and mne.export be for everything else.

And to add a different perspective, I see the problem a bit differently. The FIFF spec can change, and we change it when needed. Often the limitation and/or difficulty has more to do with defining exactly the interface / structure we want to see in the data. Then we amend the FIF spec, and we can immediately use it. We could do this with impedances if we wanted. But so far there hasn't been a huge demand for this so it didn't seem totally necessary to me at least :shrug: So in some sense, I think if we want to store more data, we can. We need to do the hard work up front of deciding exactly the types of new data we want stored, etc. This is why BIDS and BEPs take forever, and it's hard work. Once we do that, we can always extend FIF.

One huge (and I think it is very important) advantage that sticking with FIFF that I think people too easily overlook is the uniformity it has provided in terms of an interface. In each reader we have worked to get each data format into a common class (BaseRaw) that has common metadata (Info) with well defined behavior (or at least we have been making it validate better over time!). In some sense this is the opposite of HDF5, where you can put in whatever you want (more or less). So while entwining our Raw and Info classes with FIF is a limitation, it has been a very useful one as well.

EDIT: And all this doesn't even address the likely quite large number of person-hours required for such a fundamental change as @mscheltienne and @sappelhoff have brought up!

cbrnr commented 4 days ago

First I'll say that this FIF limitation has come up before previously, and the consensus/decision that I recall was that we should have mne.export for other formats. So I'd rather not work through all those discussions if possible. I'd like to stick with raw.save meaning "to FIF" and mne.export be for everything else.

Fair enough, but this is hopefully not set in stone, so I'd rather discuss it again from time to time. This means it is bothering people enough apparently that it might be worth reconsidering.

Your remaining arguments miss the point I think. We can also have a very well defined interface even if we switch to HDF5. The actual storage format is just an implementation detail, and I'd rather have a standardized format like HDF5 than a very specific format that basically only MNE can read. But yes, if you're saying we can adapt the standard, I'd vote for adding an impedance field to FIFF then.

Regarding implementation effort, of course someone has to invest the time, but depending on how flexible HDF5 is, I imagine that it will be possible to dump our current format into HDF5, which shouldn't be that big of a deal.

cbrnr commented 4 days ago

Also, since FIFF is well specified, where can I find the specifications if I wanted to implement a FIFF reader or writer from scratch?

larsoner commented 2 days ago

Fair enough, but this is hopefully not set in stone, so I'd rather discuss it again from time to time. This means it is bothering people enough apparently that it might be worth reconsidering.

Of course nothing is set in stone, that would not be good! But on the other hand, for bigger decisions like this, typically multiple people spend a lot of time (days? weeks? months?) discussing, thinking, and coming to a consensus and implementing the agreed upon solution. I think it's critical to take that into account, and only if there is quite strong support and evidence that the current solution is problematic reverse the decision. Gathering this evidence would require looking back into all the old discussions, PRs, and issues as much as possible and understanding all the points there (which isn't clear has been done yet?), and why they are convincingly, sufficiently wrong or problematic now that we should change. If every time we make a big decision, a year or two later someone goes "I (still) don't like that decision" and we are required to rehash all the old discussions and pros and cons, I don't think it's a good use of project time or effort, or really fair to the people who put in that time and effort to have the original discussions. Yes if it's problematic let's fix it but if it's not clearly wrong let's move on to other stuff!

For FIFF specification I wouldn't claim it's super well documented like some other formats like EDF. I think the info lives to some extent in fiff-constants but also (maybe moreso) in code in MNE-Python and MNE-MATLAB.

sappelhoff commented 2 days ago

For FIFF specification I wouldn't claim it's super well documented like some other formats like EDF. I think the info lives to some extent in fiff-constants but also (maybe moreso) in code in MNE-Python and MNE-MATLAB.

Out of curitosity, and because it is relevant to this discussion: Is FIFF not a vendor specific format by Elekta / Neuromag / MEGIN (in historical order of company names)? If we make changes to the "FIFF spec", are we talking about some official file format specification that is then also respected by every other user? Or are we talking about an mne specific fork of the FIFF spec?

I cannot find traces of it anymore, but I remember there being a discussion on how MNE-Python-produced FIF files may not always be read by FIF readers that CAN read "out-of-the-machine-FIFs".

larsoner commented 2 days ago

If we make changes to the "FIFF spec", are we talking about some official file format specification that is then also respected by every other user? Or are we talking about an mne specific fork of the FIFF spec?

We now co-maintain the spec officially at https://github.com/mne-tools/fiff-constants. I think it's what the E/N/M folks are/should be using nowadays, and they help co-maintain it. Since our org produces the MNE-Python and MNE-MATLAB code that other open-source packages use to read FIF data, we generally update our packages when things are added to the spec. So in principle yes it is the source of truth that should be respected by other users!

I think there have been times we produced FIF files that were problematic, but at least some of those I think turned out to be bugs we actually fixed. I do know people still use MNE-Python produced -ave.fif and _raw.fif files with E/N/M software like Xfit and maxfilter™ successfully for example. Ideally whatever we write should be compatible with other FIF-reading software, even if it doesn't use our Python or MATLAB code.

cbrnr commented 2 days ago

@larsoner I've never seen this discussed here, except maybe in #5302, but there hasn't been any definitive answer.

Again, FIFF is a proprietary format, and maintaining just some constants is not a file format documentation. I don't think that an open-source package should be based on a proprietary format.

Since I've received no answer, let me ask directly: Why do you think switching to HDF5 to dump our current raw structure would be so much work? We don't change anything, just the file format.

Lastly, we need to find a solution for the raw.impedances issue. I think we should not remove this functionality just because FIFF doesn't support it.

larsoner commented 2 days ago

@larsoner I've never seen this discussed here, except maybe in #5302, but there hasn't been any definitive answer.

You should look more broadly into discussions of raw.save vs mne.export. I would start from the git blame of mne.export and you'll see some original PRs which should then link to discussions hopefully. I know some of the discussions were not on GitHub and instead in person (possibly at sprints?) but at least some of them should be discoverable still.

Why do you think switching to HDF5 to dump our current raw structure would be so much work? We don't change anything, just the file format.

I don't think that part would be a ton of work, but I do think based on previous discussions that if we do it it should live in mne.exports. And I haven't seen much (any?) demand for being able to write data to HDF5 this way. I assume people mostly use pymatreader for compat with MATLAB mostly. Critically I am not sure what other software (besides MNE if we wrote a mne.io.read_raw_h5) would read this output, so from a compat standpoint I don't think it gains much.

Lastly, we need to find a solution for the raw.impedances issue. I think we should not remove this functionality just because FIFF doesn't support it.

I think we should either add it to the FIF spec, which would be okay, or instead (easier, seems okay to me, agreed upon previously by all others I thought?) move it to pybv and antio for now. And if users clamor for this info to be stored more permanently and is more widely available in other formats (which I don't know?) then we move to support it in FIF, and in mne.io.Raw concurrently.

cbrnr commented 2 days ago

You're really avoiding the issue of FIFF being a proprietary format. Is this not an issue for everyone here?

drammock commented 2 days ago

You're really avoiding the issue of FIFF being a proprietary format. Is this not an issue for everyone here?

Is it in fact proprietary? That's not the same as "originally developed by/for a for-profit company". I think (though I may be wrong!) it's an open standard that isn't documented as clearly/fully as you'd like; if it were proprietary, how is it that we can read/write those files without relying on a closed-source binary or agreeing to some incompatible license? Please open a separate issue for improving documentation of the FIFF file format, and let's keep this thread focused on the impedances issue.

drammock commented 2 days ago

Since you linked to #5302 you probably saw this already, but in case not: here is the FIFF file format spec

cbrnr commented 2 days ago

Of course I saw the link to some random Dropbox file. This is not what I'd say an open file format documentation should be like (never mind that it's from 2011 and that it contains copyrighted material, so I can't be sure that the document is even meant to be public).

sappelhoff commented 2 days ago

We now co-maintain the spec officially at https://github.com/mne-tools/fiff-constants.

Thanks! I didn't know this.

Please open a separate issue for improving documentation of the FIFF file format, and let's keep this thread focused on the impedances issue.

+1 for three separate discussions / action items:

improving FIFF documentation, and potentially a page somewhere that transparently summarizes the motivation of mne to rely on FIFF (historical or otherwise)
discussion on where to put impedances (outsource to pybv/etc., new FIF constant, ...)
discussion on potential future changes to the main mne IO backend

mscheltienne commented 2 days ago

Again, FIFF is a proprietary format, and maintaining just some constants is not a file format documentation. I don't think that an open-source package should be based on a proprietary format.

Personaly, I don't see the issue even if the format was proprietary. If the format is well made and suits our need perfectly, why not use it? But again, as @drammock mentions, it's not even proprietary, it's an open standard, not that well documented, but open.

Why do you think switching to HDF5 to dump our current raw structure would be so much work? We don't change anything, just the file format.

I'm -1 for this. As @larsoner says, if we wanted to support a different saving/export format, we could do it in mne.Export, but HDF5 is too flexible for this. There is no gain in saving to a format that everyone else will be able to read but no one else will be able to parse without writing a good amount of code.

IMO, dumping our current raw structure in a different format serves no purpose. I am more bothered by the structure of our objects mne.Info, mne.io.Raw, ... deriving heavily from the FIFF format. As more and more formats were read into MNE-Python, we always found a way to fit whatever information we could parse from those files into our containers.. but sometimes you need to get creative or eventually amend the FIFF specs when you can't find a sensible way to make it fit. It adds a level of complexity and of 'mess' in the code that could be avoided with a better thinking of what our containers should look like and what they should accomodate. But that's a project for another day 😉

Lastly, we need to find a solution for the raw.impedances issue. I think we should not remove this functionality just because FIFF doesn't support it.

+1 to remove it from MNE for now, while preserving reading in the I/O packages with an example in the MNE documentation. The current implementation was made for the brainvision format without even thinking about possible other formats, which might offer multiple impedance recordings within a single file. So now if we want to add this second case, we have a disparity between what raw.impedances contains for BV and for these other formats.

Alternatively, we could think about how to support multiple impedance measurements / recording within the FIFF specification and our structure, but I am not convinced that this is that useful and requested by our users.

Overall:

improving FIFF documentation, and potentially a page somewhere that transparently summarizes the motivation of mne to rely on FIFF (historical or otherwise)

+1 especially on the 'improving FIFF doc' part

discussion on where to put impedances (outsource to pybv/etc., new FIF constant, ...)

+1 to move the impedance parsing to I/O packages with an example on our documentation

discussion on potential future changes to the main mne IO backend

-- no opinion.

mne-tools / mne-python

API: Remove raw.impedances #12864