Include extra metadata with "info" objects

choldgraf commented 9 years ago

Hey all - I was trying to figure out if there's a way to store "extra" metadata with MNE objects, but it didn't seem like this is possible right now. For example, I tried adding some extra fields to the .info attribute dictionary-style, but these weren't written to disk when the object was saved.

In my case, I was doing this to try and store an extra "electrode information" object that had the x/y position for each electrode (for plotting with the topographic layout from #1762). I know there's been some talk of adding ECoG functionality, and I think one challenge will be figuring out how to include the extra metadata that is often necessary for ECoG (e.g., not only bad channels, but also bad timepoints, general location on the brain, whether an electrode was epileptic or not, etc).

Maybe the answer is just that people need to store this metadata separately and build objects around the MNE base classes, but I was curious what people thought about this before doing this myself.

larsoner commented 9 years ago

It wouldn't be a good idea to add new fields to the FIFF info struct because it will break compatibility with other tools that rely on the existing specification.

That being said, you do have some decent options.

1. Find the best place already in the info struct to store your data. For example, there is already an info['chs'][idx]['pos'] field that stores a position in 3-space in head coordinates. For the position use case you mention, you could store the coordinates and then map them down to 2D space for plotting in whatever way is most appropriate.
2. Store arbitrary extra information in one of the existing string fields in the info dict. For example, the info['description'] field can accept any string (ASCII-encoded, I think), which means you could dump just about any data you really needed in there. It's not elegant, but it works. I already do something like this for one of my datasets where I needed to reorder the channels because they were digitized out of order: I store the reordering status and the reordering indices in a string.
3. Roll your own arbitrary "extra information" format, as you describe, but use mne-python to do the annoying heavy lifting of data IO. We already have read_hdf5 and write_hdf5 functions that can write a pretty wide variety of Python objects. Since HDF5 is a widely used standard, this should even be future compatible, and you could write your own MATLAB readers if you were so inclined. For example, you could do:
```
import numpy as np
write_hdf5('data.hdf5', dict(brain_locs=['STS'] * 10 + ['PCG'] * 10, bad_times=np.array([0.1, 10.5])))
```
And then reading the data is as easy as data_dict = read_hdf5('data.hdf5').
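Option 2 can be done without pickling by JSON-encoding the extra metadata into the free-text description string; `json.dumps` escapes non-ASCII characters by default, which fits the ASCII caveat above. In this minimal sketch a plain dict stands in for `info`, and the helpers `pack_extra` / `unpack_extra` and the `EXTRA_TAG` marker are hypothetical. Whether a given MNE version round-trips arbitrary text in `info['description']` unchanged is an assumption worth checking before relying on it.

```python
import json

# Hypothetical marker separating the human-written description from the
# machine-readable JSON payload appended after it.
EXTRA_TAG = "\n[extra-json]\n"


def pack_extra(description, extra):
    """Append JSON-encoded extra metadata, replacing any earlier payload."""
    base = (description or "").split(EXTRA_TAG)[0]
    return base + EXTRA_TAG + json.dumps(extra, sort_keys=True)


def unpack_extra(description):
    """Return (plain description, extra metadata dict) from a packed string."""
    if not description or EXTRA_TAG not in description:
        return description, {}
    base, payload = description.split(EXTRA_TAG, 1)
    return base, json.loads(payload)


# A plain dict standing in for an MNE info structure (assumption).
info = {"description": "Subject 01, run 2"}
info["description"] = pack_extra(
    info["description"],
    {"reordered": True, "order": [2, 0, 1, 3]},
)
base, extra = unpack_extra(info["description"])
print(base)            # Subject 01, run 2
print(extra["order"])  # [2, 0, 1, 3]
```

JSON rather than pickle keeps the payload readable by other tools and avoids the forward-compatibility problems of pickled objects.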

Option 1) is generally better than options 2) or 3) when it's available. For the other specific cases you've brought up:

- Bad timepoints -- this is really a pretty general epochs-info issue. The mne-python way of thinking about and storing timing-related data is through events files, so I'd start there. You can store e.g. epileptic onset times as events, and then epoch using those. If you want to reject trials on that basis (instead of epoching based on the events), we should come up with an API for this use case. There are other use cases, e.g. a subject observed to be moving, where it would be nice to be able to mark "bad times" manually and include them in epoch rejection.
- General location on the brain. You mean like chs 1-10 are over STS, 11-20 are over the central gyrus, etc.? I'd have to think harder about this. I'm not sure if there is a way of adding a comment to each channel; I'd have to dig into the code a bit. Same with whether or not an electrode was epileptic. I'm inclined to think the *_hdf5 functions are best suited for this, since it seems pretty specific to ECoG. Alternatively, we could write a function that took a subject's structural MRI, the channel positions, and a head<->MRI trans, mapped each electrode position down onto the brain, and told you which label it was over...? This wouldn't help your "epileptic electrode" marking use case, though.
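For the bad-timepoints case, the rejection step could work roughly as sketched below. It takes events in MNE's three-column layout (sample index, previous trigger value, event id) and a list of bad intervals in seconds, and flags every epoch whose time window overlaps a bad interval. The function `find_bad_epochs` is hypothetical, not an MNE API. `sfreq`, `tmin` and `tmax` play the same roles as they do when epoching.

```python
def find_bad_epochs(events, sfreq, tmin, tmax, bad_intervals):
    """Return indices of epochs whose [onset + tmin, onset + tmax] window
    overlaps any (start, stop) bad interval; all times are in seconds.

    events: sequence of (sample, previous_value, event_id) rows, following
    the MNE events-array layout; only the sample column is used here.
    """
    bad = []
    for idx, (sample, _prev, _event_id) in enumerate(events):
        onset = sample / sfreq
        start, stop = onset + tmin, onset + tmax
        # Two intervals overlap unless one ends before the other begins.
        if any(start < b_stop and b_start < stop
               for b_start, b_stop in bad_intervals):
            bad.append(idx)
    return bad


# Hypothetical data: 1000 Hz sampling, events at 0.5 s, 5.0 s and 10.6 s.
events = [(500, 0, 1), (5000, 0, 1), (10600, 0, 2)]
bad_times = [(0.1, 0.4), (10.5, 10.7)]
print(find_bad_epochs(events, sfreq=1000.0, tmin=-0.2, tmax=0.5,
                      bad_intervals=bad_times))  # [0, 2]
```

Keeping bad times as intervals separate from the events means the same recording can be re-epoched with different windows without re-marking anything.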

choldgraf commented 9 years ago

Wow - lots of great ideas in that reply, thanks very much :)

OK, regarding 1: that's a good idea. I hadn't pried into the info object too much and didn't find a ton of documentation on it. It looks like I was falsely assuming that it was just a glorified dictionary of arrays/lists. I'll look into this more.

Regarding electrode location stuff - I wonder if it would be possible to just create 2-D images from MRI data and 3-D electrode location data (aka, just project the electrodes onto the brain, then take a snapshot and return the 2-d locations of the electrodes). It might be easier than storing separate 2-D data and pictures. Regarding localizing electrodes to anatomic locations, the reason I want custom labels is because I almost never use standardized brains for my plots (I personally don't trust the ECoG electrode locations already, so I'm even less inclined to force them onto an MNI brain or something). This would make it quite difficult to automatically determine the anatomy of electrodes, no?

I like the ideas behind 2 and 3, though you're right, they both seem a little bit hacky. I can try serializing everything and putting it in the description field, though I'm worried that this will become a black hole for me (I just had a premonition of a future where I'm storing hundreds of sklearn objects as a pickle string in there >:) ). If enough people are using this, why not just create an extra field of the info structure, and require that whatever is inside be pickleable?

> Roll your own arbitrary "extra information" format, as you describe, but use mne-python to do the annoying heavy lifting of data IO

I agree that the I/O heavy lifting is a PITA. Right now I have my own classes that export to HDF5 and store data parameters as attributes, but it would be much better to have a more general system that uses MNE under the hood. I will look into the read_hdf5 functions and see how they work. Thus far I've kept data formats in my own sub-optimal codebase setup, but perhaps it's time for a rewrite.

> General location on the brain. You mean like chs 1-10 are over STS, 11-20 are over the central gyrus, etc.? I'd have to think harder about this.

What I'm doing right now is basically storing a CSV file with each dataset. Rows of the CSV are electrodes, and columns are electrode properties. I store arbitrary information in this CSV: bad electrodes, epileptic electrodes, xy positions, etc. It is useful for dealing with the quirky and unique nature of many ECoG datasets, but can also be cumbersome if you want to standardize things across lots of datasets.
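A per-electrode CSV like the one described can be loaded with the standard library and turned into the lists that analysis code usually wants. In this sketch the column names (`name`, `bad`, `epileptic`, `x`, `y`, `region`) are made up for illustration, and an in-memory string stands in for the file on disk.

```python
import csv
import io

# Hypothetical per-electrode table: one row per electrode, one column per property.
CSV_TEXT = """name,bad,epileptic,x,y,region
G1,0,0,0.10,0.20,STS
G2,1,0,0.15,0.20,STS
G3,0,1,0.20,0.25,PCG
G4,0,0,0.25,0.30,PCG
"""


def load_electrode_table(fileobj):
    """Parse the CSV into a dict keyed by electrode name."""
    table = {}
    for row in csv.DictReader(fileobj):
        table[row["name"]] = {
            "bad": row["bad"] == "1",
            "epileptic": row["epileptic"] == "1",
            "xy": (float(row["x"]), float(row["y"])),
            "region": row["region"],
        }
    return table


elecs = load_electrode_table(io.StringIO(CSV_TEXT))
bads = [name for name, props in elecs.items() if props["bad"]]
epileptic = [name for name, props in elecs.items() if props["epileptic"]]
print(bads)       # ['G2']
print(epileptic)  # ['G3']
```

The `bads` list has the same shape as MNE's `info['bads']` (a list of channel names), so that one column could map onto an existing field as in option 1. The remaining columns would still need a side-car file.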

larsoner commented 9 years ago

I think a 3D-electrode data + MRI -> 2D image mapping function would be pretty cool / generally useful. +1 for (you) adding that :)

I'm -1 on adding an extra field to info, again for backward-compatibility reasons -- it's not really in the FIFF spec, and I wouldn't want to force other packages to have to know about it.

Pickling objects and expecting forward compatibility is a bit dangerous, too, so I'd try to avoid that if you can. Maybe you can get away with using the read_hdf5 and write_hdf5 functions for your use case. That way you get to specify exactly what goes in there for your use case, and you get compression, future compatibility, etc. for free. Standardizing across datasets then becomes a matter of defining exactly which fields you need and populating them. Accessing the result is as easy as accessing a list, tuple, dict, numpy array, list of tuples of dicts of numpy arrays, etc.
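Standardizing across datasets could start with an explicit list of required fields and types, checked before the dict is handed to write_hdf5 (or any other writer). The schema, field names and `validate_metadata` helper below are hypothetical.

```python
# Hypothetical schema: field name -> expected Python type.
REQUIRED_FIELDS = {
    "brain_locs": list,
    "bad_times": list,
    "epileptic_channels": list,
}


def validate_metadata(meta, schema=REQUIRED_FIELDS):
    """Return a list of problems; an empty list means the dict conforms."""
    problems = []
    for field, expected in schema.items():
        if field not in meta:
            problems.append(f"missing field: {field}")
        elif not isinstance(meta[field], expected):
            problems.append(
                f"{field}: expected {expected.__name__}, "
                f"got {type(meta[field]).__name__}"
            )
    return problems


meta = {
    "brain_locs": ["STS"] * 10 + ["PCG"] * 10,
    "bad_times": [0.1, 10.5],
}
print(validate_metadata(meta))  # ['missing field: epileptic_channels']
```

Returning a list of problems instead of raising on the first one makes it easy to audit many datasets in one pass.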

choldgraf commented 9 years ago

Fair enough - I've run into forward-compatibility problems with pickling too (now I actually store model coefficients as dataframes so I can keep some metadata in the index / column values). I'm basically doing what you suggest, but using the pandas to_hdf function.

Regarding 2-D projections and such: the tricky thing is that MR reconstructions are often of highly variable quality. Sometimes they're decent, sometimes they're horrible. An algorithm along those lines would be quite tricky, no?

On the other hand, if you've already got 3-D positions of the electrodes and you literally just want a function that says "orient the brain this way, then return a PNG along with the x/y position of all electrodes in it", then that might be more doable. Do you think this would be straightforward with whatever neuroimaging packages are out there? I've been in touch with the guy behind pycortex, as well as (hopefully) some of the nipy people, but haven't gotten an answer yet.
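Once 3-D electrode positions exist, the "orient the brain this way, then return the x/y position of each electrode" step reduces to a camera projection. This minimal sketch assumes an orthographic (parallel) view defined by azimuth and elevation angles and does not render the PNG itself. The function name and angle conventions are assumptions, not those of PySurfer or pycortex. Converting the result to pixel coordinates would also need the renderer's camera parameters.

```python
import math


def project_orthographic(points, azimuth_deg, elevation_deg):
    """Project 3-D points (x, y, z) onto the image plane of a viewer looking
    at the origin from the given direction. Azimuth is measured in the x-y
    plane from +x towards +y; elevation is measured up from that plane.

    Returns (u, v) pairs: u points to the viewer's right, v points up.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    # Unit vector from the origin towards the viewer.
    view = (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))
    # Screen "right" axis: horizontal and perpendicular to the view direction.
    right = (-math.sin(az), math.cos(az), 0.0)
    # Screen "up" axis: view x right, completing a right-handed frame.
    up = (
        view[1] * right[2] - view[2] * right[1],
        view[2] * right[0] - view[0] * right[2],
        view[0] * right[1] - view[1] * right[0],
    )

    def dot(a, b):
        return sum(ai * bi for ai, bi in zip(a, b))

    return [(dot(p, right), dot(p, up)) for p in points]


# Hypothetical electrode positions; viewing from +x, so y maps to u and z to v.
elec_xyz = [(10.0, 20.0, 30.0), (-5.0, 0.0, 2.0)]
print(project_orthographic(elec_xyz, azimuth_deg=0, elevation_deg=0))
# [(20.0, 30.0), (0.0, 2.0)]
```

An orthographic view keeps the math simple; matching a perspective screenshot exactly would require the renderer's projection matrix instead.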

(In reply to Eric Larson's comment of Tue, Feb 17, 2015, 4:13 PM.)

larsoner commented 9 years ago

You could get PySurfer to give you that image pretty easily, but the 2D locations would be harder. I'm not sure what the best solution would be.

choldgraf commented 9 years ago

Yeah - our lab manager and I are gonna see what kind of functionality pysurfer has.

(In reply to Eric Larson's comment of Wed, Feb 18, 2015, 11:47 AM.)

choldgraf commented 9 years ago

I was looking into this further and noticed that nipy has a (relatively outdated) project called "pbrain". Have any of the MNE folks played around with it before? http://nipy.org/pbrain/Loc3D_README.htm

larsoner commented 9 years ago

Nope, never tried it. It reminds me a bit of Slicer; I wonder how it's different.

larsoner commented 9 years ago

@choldgraf I'm going to close this, feel free to re-open if you have new ideas for implementations at the mne-python end.

choldgraf commented 9 years ago

Fair enough - it doesn't seem like there's a clear path to improvement right now anyway. For now I can just store extra metadata through my own objects and if it seems like there's a clear way to integrate, I'll bring it up again.

(In reply to Eric Larson's comment of Sun, Apr 12, 2015, 6:08 AM.)

mne-tools / mne-python

Include extra metadata with "info" objects #1804