Is there a "why we use FIF" document somewhere?

choldgraf commented 6 years ago

Hey all - in the BIDS-iEEG world we've had a debate going on for a while now about what formats to "officially" support. Basically it tends to boil down to:

- Formats must be open and well-documented
- Formats must be standardized and unambiguous when it's "raw" data vs. derivatives of raw data (e.g. epochs)
- Formats should be well-used already

It seems we are converging on EDF and Brainvision (VHDR) as formats for BIDS-iEEG. There was pushback on using FIF because it's too generic to know exactly what is inside. However, recent conversations with @jasmainak make me think this might be the case for EDF as well?

Anyway, I'm looking for some guidance on this topic. Has anyone in MNE ever written a "why we use FIF" kind of document, that justifies its pros / cons and why it's fit for use as a first-class citizen in this open source project? I think such a document would be helpful in guiding the choice of formats in BIDS-iEEG (and likely other BIDS specs as well...I think BIDS-EEG will not support FIF either currently)

cc @agramfort or @larsoner in case they've thought about this

jasmainak commented 6 years ago

Perhaps @teonbrooks can also weigh in for EDF ...

larsoner commented 6 years ago

> There was pushback on using FIF because it's too generic to know exactly what is inside

mne show_fiff will tell you, and @jasmainak IIRC has an issue open about a what_fiff(fname) sort of function that would tell you what type of data it is (raw, epochs, evoked, source space, BEM, etc.).

AFAIK there is no "why FIF" document. But in brief, it contains all the fields we need for M/EEG analysis.

teonbrooks commented 6 years ago

EDF is a bit of a pain in the rear. That being said, it is a very common format and a lot of systems are compatible with it. There is a clear specification for how the data file can be written, but in my experience the spec should have been a bit more opinionated and rigid. There are many exceptional cases that fall within the scope of the spec but that a package developer might not consider given the spec alone, e.g. different sampling rates across channel types, stopped recordings with continuous files, underspecified headers, etc. One major problem is that it only allows 16-bit data resolution, which can be quite limiting for expressing the dynamic range of the signal.
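To put rough numbers on the 16-bit concern, here is a minimal sketch. It assumes an ideal integer quantizer whose codes are mapped linearly onto a physical range; the range of -3276.8 to 3276.7 µV is a made-up illustrative value, not taken from any real recording.

```python
import math


def dynamic_range_db(bits: int) -> float:
    """Ideal dynamic range, in dB, of an integer format with `bits` bits."""
    return 20 * math.log10(2 ** bits)


def step_size_uv(bits: int, phys_min_uv: float, phys_max_uv: float) -> float:
    """Smallest voltage step when the physical range is mapped linearly
    onto all 2**bits integer codes."""
    return (phys_max_uv - phys_min_uv) / (2 ** bits - 1)


# Illustrative physical range (an assumption, not a property of any format).
PHYS_MIN_UV, PHYS_MAX_UV = -3276.8, 3276.7

for bits in (16, 24, 32):
    print(
        f"{bits}-bit: {dynamic_range_db(bits):.1f} dB, "
        f"step {step_size_uv(bits, PHYS_MIN_UV, PHYS_MAX_UV):.3g} uV"
    )
```

With this assumed range, 16 bits gives a step of about 0.1 µV; widening the range to avoid clipping coarsens the step proportionally, which is the trade-off being described.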

This is what led to the birth of BDF, a derivative of EDF with support for 24-bit data, which is primarily supported by Biosemi. 24-bit is great in terms of signal resolution but it is also a pain, given that most numerical packages expect data to be 16-bit or 32-bit.
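A small sketch of why 24-bit samples are awkward: there is no native 3-byte integer type in `struct` or NumPy, so each sample has to be widened by hand. This assumes little-endian two's-complement samples, which is my understanding of how BDF stores them; the byte string below is hand-built, not read from a file.

```python
def unpack_int24_le(buf: bytes) -> list[int]:
    """Decode consecutive 3-byte little-endian two's-complement integers."""
    if len(buf) % 3:
        raise ValueError("buffer length must be a multiple of 3 bytes")
    return [
        int.from_bytes(buf[i:i + 3], byteorder="little", signed=True)
        for i in range(0, len(buf), 3)
    ]


# Hand-built sample: 1, -1, the largest and the smallest 24-bit values.
raw = bytes([
    0x01, 0x00, 0x00,
    0xFF, 0xFF, 0xFF,
    0xFF, 0xFF, 0x7F,
    0x00, 0x00, 0x80,
])
samples = unpack_int24_le(raw)
print(samples)  # [1, -1, 8388607, -8388608]
```

In practice a reader would do the same widening in bulk into a 32-bit array, at the cost of one extra byte per sample in memory.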

FIF is a format that was born out of the Neuromag system and it is well-structured to be maximally compatible with all the different physiological signals. The format architecture has been primarily driven by Matti H. and Elekta IIRC. It serves as a great format but it has primarily targeted MEG data, so it doesn't have as broad a user base as the EDF format. It also has a historic max size of 2GB, but that can be compensated for by linking files.
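For a feel of what the 2GB cap means in practice, here is a back-of-the-envelope sketch. It assumes the limit is 2**31 bytes, that samples are stored as 4-byte values, and that header overhead is negligible; the 306 channels at 1000 Hz are illustrative numbers only.

```python
import math

LIMIT_BYTES = 2 ** 31  # assumption: "2GB" read as 2**31 bytes


def max_duration_s(n_channels, sfreq_hz, bytes_per_sample=4, limit=LIMIT_BYTES):
    """Longest recording (seconds) that fits in one file, ignoring headers."""
    bytes_per_second = n_channels * sfreq_hz * bytes_per_sample
    return limit / bytes_per_second


def n_files_needed(duration_s, n_channels, sfreq_hz,
                   bytes_per_sample=4, limit=LIMIT_BYTES):
    """Number of linked files needed for a recording of `duration_s` seconds."""
    total_bytes = duration_s * n_channels * sfreq_hz * bytes_per_sample
    return math.ceil(total_bytes / limit)


# Illustrative setup: 306 channels sampled at 1000 Hz.
print(f"{max_duration_s(306, 1000) / 60:.1f} minutes per file")
print(n_files_needed(3600, 306, 1000), "files for a one-hour recording")
```

Under these assumptions a single file holds a bit under half an hour, so a one-hour session already needs three linked files.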

VHDR is the BrainVision format and it works as a two-part file: a human-readable header and a binary data file. It is a Brain Products format, not a generic or system-agnostic format like EDF, but that's not to say it can't be used as one.

The EDF would not be an ideal solution for MEG data given the large amount of sensor information that would not be captured in the EDF header. This also goes for VHDR.

IMO, I feel that the FIF format is the closest thing to how the NIFTI format is used in MRI. It has great strengths but it is only native to MEG. I have been reading through the EEG proposal for BIDS and there seems to be a stronger recommendation to converge on a primary default format, which is not the way the MEG spec went. I feel that EDF is maximally supported but I fear there would be some loss of data precision and potential loss of metadata (which could be solved with verbose json sidecar files).

jasmainak commented 6 years ago

I'm wondering why there is no documentation of the fiff specification? It's easy to find one for the EDF but not for the fiff ... (except perhaps some not very verbose info on the MNE website)

larsoner commented 6 years ago

The FIF constants are documented somewhere. I have talked to the Elekta folks about allowing us to host the constant defs on GitHub but haven't gotten much traction

larsoner commented 6 years ago

So far we just have the "MNE" range up

https://github.com/mne-tools/mne-fiff-constants

larsoner commented 6 years ago

Actually, in a previous email the Elekta folks (Matti Kajola) said GitHub is probably okay. So I think if someone has time to migrate the official Elekta constants over to mne-fiff-constants, we can probably rename mne-fiff-constants to fiff-constants and have that be the public record. We would need to give the Elekta folks admin control over it.

larsoner commented 6 years ago

Current docs live here:

http://www.aston.ac.uk/lhs/research/centres-facilities/brain-centre/facilities-clinical-services/meg-studies/downloads/

I'll email the Elekta folks about an updated version. If they have one, I can add them to mne-fiff-constants and rename it to fiff-constants.

choldgraf commented 6 years ago

Maybe the better way to ask the question is "why use FIF instead of using another open format such as EDF"? I'm trying to make sure that the BIDS format choices are principled ones, and that we don't accidentally make an incorrect choice

choldgraf commented 6 years ago

also re: the FIF vs. EDF vs. VHDR approach, the main reasoning within BIDS-iEEG was that EDF would be ideal, except for the 16 bit limit. VHDR isn't as ideal in terms of structure, but allows for 32 bit data. So, the primary two formats to support would be EDF and VHDR.

larsoner commented 6 years ago

As @teonbrooks said, I doubt that EDF and VHDR provide the proper fields for MEG data (e.g., device-head transform, coil type, etc.). There are workarounds but they are less than ideal.

I don't really like the FIF 2GB limitation. It is annoying to have to work around, so I wish there were a better, similarly complete alternative.

jasmainak commented 6 years ago

@choldgraf check out this document containing the specification. I just unzipped one of the urls above and put it on dropbox for you. I wonder why this is not available more prominently. I would suggest at least skimming over this document ...

jasmainak commented 6 years ago

I think one point that I raised with @choldgraf was that EDF appears to be less consistent and @teonbrooks has for instance been struggling to write a reader which works for all files. I wonder if this is an inherent limitation of the file format or if this is just because it is used widely.

agramfort commented 6 years ago

yes, the EDF format is extremely poorly specified. People store whatever they want in .edf files, which keeps breaking readers all the time.

FIF is much better specified but it has a few issues:

- files need to be less than 2GB
- it's really seen as an MEG format
- it's not so easy to write properly (you need to understand fif tags, tree etc.)
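To give a flavour of what "understanding fif tags" involves, here is a toy sketch of walking a flat sequence of tags. It assumes each tag starts with a 16-byte big-endian header (kind, type, size, next) followed by `size` bytes of payload, which is my understanding of the layout. The kind and type numbers are placeholders, not real FIFF constants, and the sketch ignores `next` pointers and the block tree entirely.

```python
import struct

# Assumed header layout: kind, type, size, next (all big-endian, 4 bytes each).
TAG_HEADER = struct.Struct(">iIii")


def make_tag(kind, dtype, payload, next_=0):
    """Serialize one tag: header followed by the raw payload bytes."""
    return TAG_HEADER.pack(kind, dtype, len(payload), next_) + payload


def iter_tags(buf):
    """Yield (kind, dtype, payload) for tags stored back to back in `buf`."""
    pos = 0
    while pos + TAG_HEADER.size <= len(buf):
        kind, dtype, size, _next = TAG_HEADER.unpack_from(buf, pos)
        start = pos + TAG_HEADER.size
        yield kind, dtype, buf[start:start + size]
        pos = start + size  # sequential layout only; `next` is not followed


# Placeholder kind/type numbers chosen for illustration only.
KIND_A, KIND_B = 9001, 9002
TYPE_INT, TYPE_FLOAT = 3, 4

buf = (
    make_tag(KIND_A, TYPE_INT, struct.pack(">i", 42))
    + make_tag(KIND_B, TYPE_FLOAT, struct.pack(">f", 1.5))
)
tags = list(iter_tags(buf))
for kind, dtype, payload in tags:
    print(kind, dtype, len(payload))
```

A real writer additionally has to nest tags into blocks and keep a directory consistent, which is where most of the difficulty lies.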

cbrnr commented 6 years ago

Did you consider GDF? It is basically EDF with most restrictions removed. I think it has even been standardized (at least by some national standards committee).

jona-sassenhagen commented 6 years ago

I assume lifting the 2GB restriction on fif is out of the question?

> EDF with most restrictions removed

That makes it worse, doesn't it?

cbrnr commented 6 years ago

I don't think so. It's not that GDF is less strictly specified. What I meant is that it has more supported data types (integers with 8/16/24/32/64 bits, floats with 32/64/128 bits), an event table, support for MEG orientations, and so on. I'm not saying that it is the ideal data format, but maybe it is worth considering GDF over EDF as it is a natural extension.

https://arxiv.org/abs/cs/0608052

palday commented 6 years ago

The VHDR metadata portion of the BV format is actually even simpler and yet more flexible than people realize ... it's essentially an INI file (which can be parsed with the Python standard library), with a few standardized sections and something between Markdown and YAML in the [Comment] section, where you could actually store MEG sensor metadata. That's not how we parse it in MNE for historical reasons (i.e. don't rewrite a working, tested codebase), but that's one of the things I'm working on in my standalone writer (and soon reader) ...
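As a rough illustration of the INI point, here is a minimal sketch using `configparser` from the standard library. The header text is a hypothetical hand-written example, and the sketch assumes the first line is a version banner that must be dropped before parsing and that `SamplingInterval` is given in microseconds. It deliberately leaves out a [Comment] section, whose free-form content would need separate handling.

```python
import configparser

# Hypothetical header, written by hand for illustration.
SAMPLE_VHDR = """\
Brain Vision Data Exchange Header File Version 1.0
; hypothetical header written by hand for illustration

[Common Infos]
DataFile=example.eeg
DataFormat=BINARY
NumberOfChannels=2
SamplingInterval=1000

[Binary Infos]
BinaryFormat=INT_16

[Channel Infos]
Ch1=Fp1,,0.1
Ch2=Fp2,,0.1
"""


def parse_vhdr(text):
    """Parse VHDR-style text into a ConfigParser (sketch, no [Comment] support)."""
    # The first line is a version banner, not INI syntax, so drop it.
    _banner, _, body = text.partition("\n")
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case such as "Ch1"
    parser.read_string(body)
    return parser


hdr = parse_vhdr(SAMPLE_VHDR)
common = hdr["Common Infos"]
sfreq = 1e6 / float(common["SamplingInterval"])  # assumed unit: microseconds
names = [value.split(",")[0] for value in hdr["Channel Infos"].values()]
print(sfreq, names)
```

Lines starting with `;` are treated as comments by `configparser` by default, which is why the second line of the sample needs no special handling.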

Speaking of historical reasons, that's always been my assumption about the role of FIFF in MNE. The original MNE tools were written primarily for MEG and not EEG by someone working on an Elekta system.

There are other issues with FIFF for the EEG world:

- it really is not as widely supported in existing EEG toolkits as [EB]DF and BV or even NeuroScan formats.
- it also doesn't allow capturing EEG-specific metadata easily, such as
  - different filters or sampling rates for different channels (which forced us to decide what to do about heterogeneous filters)
  - different references for different channels (you can work around this a bit by treating every channel as bipolar with its reference)
  - by-channel impedance measurements

It also only allows for numeric triggers/events, which I have mixed feelings about.

agramfort commented 6 years ago

closing discussion.

mne-tools / mne-python

Is there a "why we use FIF" document somewhere? #5302