Closed · dengemann closed this issue 9 years ago
Should be even more straightforward than I thought:
http://pandas.pydata.org/pandas-docs/dev/io.html#io-hdf5
Especially the section on hierarchical keys.
Drafting ....
Here's the minimum required to save the main contents of an evoked file:
Example:
In [137]: store = to_hdf(evoked, 'sample', 'mne.h5')
In [138]: store
Out[138]:
<class 'pandas.io.pytables.HDFStore'>
File path: mne.h5
/sample/acq_pars frame (shape->[1,1])
/sample/acq_stim frame (shape->[1,1])
/sample/bads frame (shape->[1,1])
/sample/buffer_size_sec frame (shape->[1,1])
/sample/ch_names frame (shape->[1,1])
/sample/chs frame (shape->[59,13])
/sample/comps frame (shape->[1,1])
/sample/ctf_head_t frame (shape->[1,1])
/sample/data frame (shape->[106,59])
/sample/description frame (shape->[1,1])
/sample/dev_ctf_t frame (shape->[1,1])
/sample/dev_head_t frame (shape->[16,3])
/sample/dig frame (shape->[146,4])
/sample/experimenter frame (shape->[1,1])
/sample/file_id frame (shape->[2,4])
/sample/filename frame (shape->[1,1])
/sample/filenames frame (shape->[1,1])
/sample/highpass frame (shape->[1,1])
/sample/lowpass frame (shape->[1,1])
/sample/meas_date frame (shape->[2,1])
/sample/meas_id frame (shape->[2,4])
/sample/nchan frame (shape->[1,1])
/sample/orig_blocks frame (shape->[5,1])
/sample/orig_fid_str frame (shape->[1,1])
/sample/proj_id frame (shape->[1,1])
/sample/proj_name frame (shape->[1,1])
/sample/projs frame (shape->[4,4])
/sample/sfreq frame (shape->[1,1])
Nice :) but I somehow think if we want to support hdf5 we should write hdf5 directly (using PyTables) without relying on pandas. Also, what would be the advantages over fiff?
@mluessi two aspects:
https://github.com/pydata/pandas/blob/master/pandas/io/pytables.py
already comprises over 4k lines of code and does a great job. I don't think we can do better in a short time with reasonable effort.
The main asset here is that HDF5 is cross-language, cross-platform, cross-whatever, and easier to customize. Writing custom structures is straightforward and does not involve making a PR and modifying readers / writers. Also it does not have any file size limitations. I would not see it as a replacement for fiff but as an additional tool for storing custom data and analyses, and as an exchange format. I'm not informed about benchmarks at this point, but I would not be surprised to see pytables + pandas outperforming naive fiff access.
Does that make sense?
@dengemann I see that the flexibility and efficiency of HDF5 is a potential advantage. However, I think if we officially support it, we would need to properly document what is stored and how, such that other packages (which may not be written in Python) can rely on it. If we rely on pandas to handle this, then we can't really guarantee consistency in the format, i.e., if pandas decides to change the way things are stored in HDF5, then it would break the file format for all tools that don't use pandas to read the files.
Btw: Did anything come out of this workshop on electrophysiology file formats that you attended in December?
@mluessi I understand your concern. If we really want to add support for it, we want to get it as right as we can. Probably the way outlined here is rather an extension of our Pandas io, which is also interesting, especially given the few lines needed. True HDF5 support would mean that the files can be accessed e.g. from Matlab or any other tool. Yes, the meeting was nice. Let's discuss details during a hangout soon ;-) We should probably open an issue for a hangout and meet with everyone who's interested.
it's neat but I am not super convinced by the motivation. Matlab can read fif files thanks to mne-matlab. Now if you tell me that you read/write faster, that files are smaller, etc., then I think we should consider it. Reading pandas HDF5 files from Matlab will require some work, don't you think?
Honestly, my main motivation is to have a more flexible persistence system. One file to save all stcs or all evokeds in: I don't want 20 times 8 fifs for one particular analysis. Adding an HDF5 io would be a nice addition for situations where flexibility counts. E.g. imagine you make a figure for a paper where you compare different preprocessing techniques over several runs and you want to share stuff with colleagues, ideally via email. And so forth. Matlab is really a secondary aspect to me; actually it's not important in this context, at least to me.
ok I like animated discussions on github :)
once we have stcs in fif files we could store stcs from many conditions in one file :)
would you store the forward + inverse + evoked + stc in one large file?
Yeah, that's the spirit. Maybe one day I want to store everything, another day only some of the things mentioned :-) Certainly I'd like to store data from multiple subjects, e.g. after doing first-level analyses. This proposal is not to replace fiffs but to support fast custom io beyond numpy arrays, where you can basically save and reload mne objects as needed, e.g. save 5 ICA runs without creating a mess and without extending fiff support.
let's discuss this during the next hangout
Is there a way to read an h5 file into MNE python? Or to convert it to .FIF using python for subsequent analyses?
HDF5 is a pretty general format, so there isn't a clear-cut way to support reading arbitrary HDF5 files. h5py, pandas, and pytables all facilitate reading HDF5-formatted files (depending on how they were created). If you can get your data into a matrix using these packages, then you can use RawArray (or related classes) to operate on your data in mne-python, including saving the file to FIFF format, sure. But you'd want to be careful about what the channel types etc. were being set to.
we've had an issue + demo code somewhere ... but indeed it's not easy to provide something general. RawArray and EpochsArray are your friends
Great, thanks. Where is the documentation for RawArray? The mne.io page for it isn't available.
You'll probably need to update to latest master; the location of RawArray has shifted in the last couple weeks in the dev version.
So, I just updated to the latest version on git, but I'm not seeing the mne.io module:
AttributeError: 'module' object has no attribute 'io'
how did you install it after updating?
You should now find create_info in the mne namespace. To read the doc: `mne.create_info?`
I was able to find a cached version of it, but thanks!
I'm having a lot of issues trying to get my h5 file to look as functionally close to the .fif files as possible. All I want to do is simple epoch analyses. I managed to get my data and trigger info into two pandas dataframes, but I can't figure out how to organize these so that all the MNE scripts read it as per usual.
The data is 68 chs x number of time points. The trigger data contains a column with the trigger codes and a column with their timing.
Any advice on how to just get this into MNE? I saw you had a similar issue a couple years ago with another type of 4d file...
> I was able to find a cached version of it, but thanks!
>
> I'm having a lot of issues trying to get my h5 file to look as functionally close to the .fif files as possible. All I want to do is simple epoch analyses.

it does not have to be like fiff. That's why we created RawArray. The idea is that you pass data, channel names, and corresponding channel types. MNE will do the rest. Mocking a fiff is unrealistic; it would take forever ...

> I managed to get my data and trigger info into two pandas dataframes, but I can't figure out how to organize these so that all the MNE scripts read it as per usual.
>
> The data is 68 chs x number of time points. The trigger data contains a column with the trigger codes and a column with their timing.

Can you share them?

> Any advice on how to just get this into MNE? I saw you had a similar issue a couple years ago with another type of 4d file...

What kind of system are your data from?
The RawArray read in the info and data fine, but that's it -- the usual MNE functions then did not plot the channels and did not let me sort them into epochs. I'm getting lots of deep errors, so I figured it was a formatting issue.
The data are from a Neuroscan system, originally in .cnt format. After some digging, I finally found a python script to get this into an open format, which gave me the .h5 file.
Here's the original .cnt file and the cnt2h5 script I used.
ok, you're not supposed to pass a dataframe. RawArray is designed for plain 2d channels x times numpy arrays. Good that you bring up this case; we will make it throw errors in the future once something other than a numpy array is passed. I can take a look at the data / script this afternoon. Can you try once more with a numpy array? Also, were you able to construct the info?
RawArray did read the pandas dataframe. I tried it now with numpy arrays, and it read the data. But I'm getting index out of bounds errors that I don't understand when I try to plot it or epoch it.
The only possible reason is that your info object does not match the data. The input to create_info should be a list of names of length n_channels, and the same for channel types.
e.g.

```
ch_names = ['EEG{}'.format(i + 1) for i in range(68)]
ch_types = ['eeg'] * 68
```

The data input should be rows = channels and columns = times. This is btw. not the default for dataframes, which is why it cannot end well if you just pass a df.
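To make the orientation point concrete, here is a small runnable sketch; the `mne` calls at the end are left as comments since they assume the `create_info` / `RawArray` API discussed above:

```python
import numpy as np
import pandas as pd

n_channels, n_times = 68, 1000
ch_names = ['EEG{}'.format(i + 1) for i in range(n_channels)]

# DataFrames are typically built as rows = samples, columns = channels ...
df = pd.DataFrame(np.random.randn(n_times, n_channels), columns=ch_names)

# ... but RawArray wants a plain ndarray with rows = channels, columns = times,
# so transpose before handing it over:
data = df.values.T
assert data.shape == (n_channels, n_times)

# The mne calls would then look roughly like this (not run here):
# info = mne.create_info(ch_names, sfreq=1000., ch_types=['eeg'] * n_channels)
# raw = mne.io.RawArray(data, info)
```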
we need an example "Creating MNE data objects from arrays" to point people to. I am opening an issue.
It just occurred to me that it should be possible to add a general bidirectional HDF5 io without writing too many lines. We already have the pandas io, and pandas supports pytables/hdf5. I was always thinking the info might be a problem, but it's not:
`epochs.info['chs'] == pd.DataFrame(epochs.info['chs']).T.to_dict().values()`
-> `True`
This also works with 'projs', 'dig', 'comps', etc
So the only thing to really do would be to call our pandas io for the data, do something like the above for channels, projs, dig, comps and the rest of the non-nested flat values, and we're good. On reading, it should be easy to recreate e.g. projs objects from dictionaries. Internally, everything would go in an HDF5 file with a key that includes a common prefix, such that entries from all fields could be gathered on reading.
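A quick check of that round trip on a toy stand-in for `info['chs']` (the field names here are illustrative, not the full channel dict; `list(...)` is added so the comparison also works on Python 3):

```python
import pandas as pd

# Toy stand-in for info['chs']: one dict per channel
chs = [{'ch_name': 'EEG1', 'kind': 2, 'range': 1.0},
       {'ch_name': 'EEG2', 'kind': 2, 'range': 1.0}]

# Store as a frame (rows = fields, columns = channels), then rebuild the dicts
frame = pd.DataFrame(chs).T
restored = list(frame.to_dict().values())
assert restored == chs  # values compare equal after the round trip
```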
This would certainly be a useful io, allowing one to basically store data from an entire analysis in one file ...