mne-tools / mne-python

MNE: Magnetoencephalography (MEG) and Electroencephalography (EEG) in Python
https://mne.tools
BSD 3-Clause "New" or "Revised" License
2.67k stars 1.31k forks source link

ENH: single trial TFR API #2024

Closed dengemann closed 8 years ago

dengemann commented 9 years ago

Continues discussion from #1780.

choldgraf commented 9 years ago

Is this with the goal of being able to compute TFR and retain individual trial traces? Very much so +1 to this :) for ECoG anyway, it'd be nice to be able to just calculate a full TFR and save it do disk for multiple uses later.

agramfort commented 9 years ago

tfrs = [tfr_XXX(epochs[i]) for i in range(len(epochs))] then you need a write tfr that can write multiple ones...

but maybe it's not so easy...

choldgraf commented 9 years ago

I agree that it's doable, though seems strange to get single trial tfr functionality by using the AverageTFR in a non-intuitive way. Either way, for me a question is how to handle the representation of the result. E.g., one way to save would be to combine the results into a big dataframe and save it to HDF5, but maybe there are better ways of handling the output of the data (it seems like using fif format wouldn't make sense anymore)

On Fri, May 8, 2015 at 1:32 AM, Alexandre Gramfort <notifications@github.com

wrote:

tfrs = [tfr_XXX(epochs[i]) for i in range(len(epochs))] then you need a write tfr that can write multiple ones...

but maybe it's not so easy...

— Reply to this email directly or view it on GitHub https://github.com/mne-tools/mne-python/issues/2024#issuecomment-100152526 .

dengemann commented 9 years ago

I was thinking about something as boring as a designated function returning a designated object, some kind of EpochsTFR with .average method.

larsoner commented 9 years ago

Saving the data to HDF5 and loading it will be almost trivial once the object is defined. You just dump the object properties to a dict and save, and repopulate on load.

choldgraf commented 9 years ago

Out of curiosity, why wasn't this the original implementation instead of creating an AverageTFR class. It seems like it would have been more in line with MNE structures (e.g., create an Epochs object, and then give an average function to create an Evoked object). Are there issues with memory or something?

On Fri, May 8, 2015 at 9:09 AM, Eric Larson notifications@github.com wrote:

Saving the data to HDF5 and loading it will be almost trivial once the object is defined. You just dump the object properties to a dict and save, and repopulate on load.

— Reply to this email directly or view it on GitHub https://github.com/mne-tools/mne-python/issues/2024#issuecomment-100282966 .

agramfort commented 9 years ago

yes it's a memory issue. In TF you blow up the dimension of the data

choldgraf commented 9 years ago

yeah that's the same problem I've ran into before. I guess the upside is that you can often decimate TFR representations though, no? (I've never used these methods for things like coherence or phase estimate though...maybe that'd mess up?)

On Fri, May 8, 2015 at 10:48 AM, Alexandre Gramfort < notifications@github.com> wrote:

yes it's a memory issue. In TF you blow up the dimension of the data

— Reply to this email directly or view it on GitHub https://github.com/mne-tools/mne-python/issues/2024#issuecomment-100310142 .

agramfort commented 9 years ago

decimation helps and most people need the average that's why we chose this design. But we can always improve it....

choldgraf commented 9 years ago

yeah that definitely makes sense. all the current behavior could remain the same if need be anyway, no? You could just use "average_epochs" as a flag to the various tfr_XXX functions and it could return an AverageTFR object in that case. Otherwise, it would return an EpochsTFR. Or maybe it would make more sense to call AverageTFR something like EvokedTFR to keep it in line with the current epochs naming.

On Fri, May 8, 2015 at 12:59 PM, Alexandre Gramfort < notifications@github.com> wrote:

decimation helps and most people need the average that's why we chose this design. But we can always improve it....

— Reply to this email directly or view it on GitHub https://github.com/mne-tools/mne-python/issues/2024#issuecomment-100341570 .

agramfort commented 9 years ago

an average param in the tfr_XXX function would do it indeed. EpochsTFR sounds good

dengemann commented 9 years ago

EpochsTFR

That would be important to do deconding and stats across trials.

choldgraf commented 9 years ago

Yeah, I first thought about this when I was looking into the time generalization stuff as well as from this example [1] where single_trial_power is used to get the full TFR for one electrode.

It seems that the code in single_trial_power is close to what would be needed anyway, just would need to let it use more than morlet wavelets, and to output the EpochsTFR class, no?

[1] http://martinos.org/mne/stable/auto_examples/stats/plot_cluster_1samp_test_time_frequency.html

dengemann commented 9 years ago

just would need to let it use more than morlet wavelets, and to output the EpochsTFR class, no?

yes, basically that's it.

choldgraf commented 9 years ago

Hey all - just a quick list of some important stuff to consider from a chat with me and @dengemann:

Important stuff to consider

Potential use cases

choldgraf commented 9 years ago

Random update - running CWT_wavelet on individual epochs and storing them as a complex128 array is taking up ~8GB of disk space for me. This is with roughly 200 trials, 60 channels, 50 freqs, and 600 timepoints. I just wrote a simple class to handle I/O for me right now, which basically just does this:

class EpochTFR(object):
    def __init__(self, data, freqs, events, times, info):
        self.data = data
        self.freqs = freqs
        self.events = events
        self.times = times
        self.info = info

    def write_hdf(self, fname):
        print('Writing TFR data to {0}'.format(fname))
        write_dict = {'data': self.data, 'freqs': self.freqs,
                      'events': self.events, 'times': self.times,
                      'info': self.info}
        mne._hdf5.write_hdf5(fname, write_dict, overwrite=True)

    @staticmethod
    def from_hdf(fname):
        out = mne._hdf5.read_hdf5(fname)
        etfr = self.__init__(out['data'], out['freqs'], out['events'],
                             out['times'], out['info'])
        return etfr

This is after downsampling the signal from 1000Hz to 100Hz (post-TFR decomp). I haven't really found any good information about how much it's "ok" to downsample after you've got the TFR though.

agramfort commented 9 years ago

looks useful to me

choldgraf commented 9 years ago

I should say, the morlet wavelet CWT finished in about 10 minutes, so that's not a trivial amount of time, though not a huge amount of time either.

Personally it would be useful to be able to just load full epoched TFR information into memory so I can do branching-off analyses from there. Certainly saves time as you have more and more datasets (especially since you currently can't use joblib for any of this due to memory constraints).

What other functionality should be in a class like this? I can imagine:

  1. A method to return the mean of a subset of frequencies as an AverageTFR object
  2. A method to return a subset of frequencies as an Epochs object (e.g., to do time generalization)
  3. An iterator method that would return electrodes if you iterate through this (so that you can do something like permutation test 2-d clustering
  4. Methods for selecting sub-parts of data (e.g., crop, pick_channels, etc)
  5. Methods for only reading in a subset of data (e.g., "from the hdf file, return only channels 2-4" or "from the hdf file, average frequency power between 4-8 and return as an epochs object")

I was thinking it would be useful to just have this as a way of representing the data, and then use derivation classes to do more computational stuff (e.g., expect the user to turn this into an Epochs object, then run analysis, rather than having analysis code operate directly on EpochTFR). What do people think?

choldgraf commented 9 years ago

Here's what I have now...worth opening a PR?

class EpochTFR(object):
    """
    Create an EpochTFR object for single-trial TFR representation.

    This allows you to store TFR information without averaging across
    trials. Be careful with memory usage, as it can get quite large.
    Consider decimating before this step.

    Parameters
    ----------
    data : ndarray, shape(n_epochs, n_chans, n_freqs, n_times)
        The TFR data you wish to construct.
    freqs : ndarray, shape(n_freqs)
        The frequency labels during TFR construction
    times : ndarray, shape(n_times)
        The time labels during epochs creation
    events : ndarray, shape (n_epochs, 3)
        An MNE events array, see documentation for details
    event_id : dict
        A mapping from event names to integer event IDs
    info : mne Info object
        The info object from the original Epochs instance.
        May be unnecessary but keep it just in case.
    method : string
        Information describing how this TFR data was created.

    Attributes
    ----------
    ch_names : list
        The names of the channels
    """
    def __init__(self, data, freqs, times, events, event_id, info, method=None):
        if data.ndim != 4:
            raise ValueError('data should be 4d. Got %d.' % data.ndim)
        n_epochs, n_channels, n_freqs, n_times = data.shape
        if n_channels != len(info['chs']):
            raise ValueError("Number of channels and data size don't match"
                             " (%d != %d)." % (n_channels, len(info['chs'])))
        if n_freqs != len(freqs):
            raise ValueError("Number of frequencies and data size don't match"
                             " (%d != %d)." % (n_freqs, len(freqs)))
        if n_times != len(times):
            raise ValueError("Number of times and data size don't match"
                             " (%d != %d)." % (n_times, len(times)))
        if n_epochs != events.shape[0]:
            raise ValueError("Number of epochs and data size don't match"
                             " (%d != %d)." % (n_epochs, events.shape[0]))
        self.data = data
        self.freqs = freqs
        self.events = events
        self.event_id = event_id
        self.times = times
        self.info = info
        self.method = method

    @property
    def ch_names(self):
        return self.info['ch_names']

    def write_hdf(self, fname, **kwargs):
        """
        Write the object to an HDF file.

        This bottles up all object attributes into a dictionary,
        and sends it to an HDF file via the h5py library.

        Parameters
        ----------
        fname : str
            The file name of the hdf5 file.
        **kwargs : dict
            Arguments are passed to write_hdf5
        """
        print('Writing TFR data to {0}'.format(fname))
        write_dict = {'data': self.data, 'freqs': self.freqs,
                      'events': self.events, 'times': self.times,
                      'info': self.info, 'event_id': self.event_id}
        mne._hdf5.write_hdf5(fname, write_dict, **kwargs)

    def subset_freq(self, fmin, fmax):
        """
        Return the mean of a subset of frequencies as an Epochs object.

        Parameters
        ----------
        fmin : float
            The minimum frequency to keep.

        fmax : float
            The maximum frequency to keep.
        """
        mask_freq = (self.freqs > fmin) * (self.freqs < fmax)
        data_freq = self.data[:, :, mask_freq, :].mean(2)
        epoch_info = self.info.copy()
        epoch_info['description'] = "{'kind': 'TFRFreqSubset', 'freqs': '{0}-{1}'.format(fmin, fmax)}"
        return mne.EpochsArray(data_freq, epoch_info, self.events,
                                 tmin=self.times.min(), event_id=self.event_id)

    def subset_epoch(self, epoch_ixs):
        """
        Return the mean of a subset of epochs.

        Parameters
        ----------
        epoch_ixs : list, ndarray
            The epoch indices to keep

        Returns
        ----------
        obj : AverageTFR
            Instance of AverageTFR with epochs averaged.
        """

        data_epoch = self.data[epoch_ixs, ...].mean(0)
        epoch_info = self.info.copy()
        return mne.time_frequency.AverageTFR(epoch_info, data_epoch, self.times,
                                             self.freqs, len(epoch_ixs), comment='TFREpochSubset',
                                             method=self.method)

    def crop(self, tmin=None, tmax=None, copy=False):
        """Crop data to a given time interval

        Parameters
        ----------
        tmin : float | None
            Start time of selection in seconds.
        tmax : float | None
            End time of selection in seconds.
        copy : bool
            If False epochs is cropped in place.
        """
        inst = self if not copy else self.copy()
        mask_time = mne.utils._time_mask(self.times, tmin, tmax)
        inst.times = inst.times[mask_time]
        inst.data = inst.data[..., mask_time]
        return inst

    def __repr__(self):
        s = "time : [%f, %f]" % (self.times[0], self.times[-1])
        s += ', epochs: {0}'.format(self.events.shape[0])
        s += ", freq : [%f, %f]" % (self.freqs[0], self.freqs[-1])
        s += ', channels : %d' % self.data.shape[0]
        return "<EpochTFR  |  %s>" % s

    @staticmethod
    def from_hdf(fname):
        """
        Read an EpochsTFR from an HDF5 file.

        This expects an HDF5 file that was created with
        write_hdf5. It auto-populates a new EpochsTFR object.
        Be careful with memory consumption!

        Parameters
        ----------
        fname : str
            The path to the HDF5 file you wish to import.

        Returns
        -------
        etfr : EpochsTFR
            The EpochsTFR object contained in the HDF5 file.
        """
        params = mne._hdf5.read_hdf5(fname)
        etfr = EpochTFR(**params)
        return etfr
agramfort commented 9 years ago

what is needed is a way to generate such objects. The API of this object is next.

choldgraf commented 9 years ago

Cool - should this kind of code be roped into a PR that includes functions that return single-trial TFR? Seems like it could get confusing to have them separate. I don't have time to work on changing the TFR functions in earnest right now...maybe something I can work on while in France?

Otherwise, somebody else is more than welcome to take the code above and roll it into their own PR. I'm just going to use this locally for data I/O and creating TFR by iterating through epochs[i] with cwt_morlet, which is kinda hacky but still works.

agramfort commented 9 years ago

let's keep this for the sprint in Paris this summer

choldgraf commented 9 years ago

That still happening? Last I heard you guys were waiting on funding. Either way I've booked my flights: into paris on the 6th, out on the 28th.

On Wed, Jun 3, 2015 at 12:55 PM, Alexandre Gramfort < notifications@github.com> wrote:

let's keep this for the sprint in Paris this summer

— Reply to this email directly or view it on GitHub https://github.com/mne-tools/mne-python/issues/2024#issuecomment-108596062 .

agramfort commented 9 years ago

we have the funding now :)

choldgraf commented 9 years ago

Wohoo! :moneybag:

matt-erhart commented 9 years ago

Is it possible to get single trial phase as well? Does something like 'tfrs = [tfr_XXX(epochs[i]) for i in range(len(epochs))]' exists for phase that I could use in the sort term? Could this kind of feature be added to the single trial TFR code if nothing exists?

choldgraf commented 9 years ago

Right now the way I'm using it on my own is to store the values as complex 64 bit numbers, that way you can calculate either the phase or amplitude. That said, right now there isn't any kind of tangible PR so that's just my own anecdotal use :)

On Thu, Jul 23, 2015 at 11:28 PM, Matt Erhart notifications@github.com wrote:

Is it possible to get single trial phase as well? Does something like 'tfrs = [tfr_XXX(epochs[i]) for i in range(len(epochs))]' exists for phase that I could use in the sort term? Could this kind of feature be added to the single trial TFR code if nothing exists?

— Reply to this email directly or view it on GitHub https://github.com/mne-tools/mne-python/issues/2024#issuecomment-124244819 .

matt-erhart commented 9 years ago

Are you going into the various TF functions and changing a line like 'TFR_abs = np.abs(TFR)' so you can do abs or angle later? I could manage that for now if that's all. 'Complex', 'Angle', or 'Phase' might make for a useful flag. Might be relevant to the CFC branch as well.

choldgraf commented 9 years ago

Right now I'm just pulling stuff from the data attribute, which is a matrix of shape(n_epochs, n_channels, n_freqs, n_times). I'll just do something like:

amp = np.abs(tfr.data[:, 10, ...])

or

phase = np.angle(tfr.data[:, 10, ...])

That said, I think that if this were implemented it'd warrant some discussion about how to handle memory usage etc.

On Fri, Jul 24, 2015 at 1:04 AM, Matt Erhart notifications@github.com wrote:

Are you going into the various TF functions and changing a line like 'TFR_abs = np.abs(TFR)' so you can do abs or angle later? I could manage that for now if that's all. 'Complex', 'Angle', or 'Phase' might make for a useful flag. Might be relevant to the CFC branch as well.

— Reply to this email directly or view it on GitHub https://github.com/mne-tools/mne-python/issues/2024#issuecomment-124262070 .

matt-erhart commented 9 years ago

Thanks for your help. The tfr you're referring to comes from your EpochTFR class above and not something currently in mnepy?

choldgraf commented 9 years ago

Yup, it's just my own code for personal uses right now. You can find it all here...it's relatively straightforward but has a lot of improvements to be made if it were to be an official addition to MNE.

https://github.com/choldgraf/ecogtools/blob/master/tfr.py

choldgraf commented 8 years ago

Hey all - reviving this because I'm working through my own EpochTFR code again and thought it might be worth revisiting.

Would this still be useful for MNE? Right now, I have a function that extracts the TFR of each trial and puts it into an EpochTFR object that keeps it all together and reads/writes from HDF5.

def tfr_epochs(epoch, freqs_tfr, n_decim=10, n_cycles=4):
    """Extract a TFR representation from epochs w/o averaging.

    This will extract the TFR for the given frequencies, returning
    an EpochsTFR object with data shape (n_epochs, n_chans, n_freqs, n _times).
    TFR is performed with a morlet wavelet.

    Parameters
    ----------
    epoch : mne.Epochs instance
        The epochs for which to extract the TFR
    freqs_tfr : array-like
        The frequencies to extract w/ wavelets
    n_decim : int
        The factor to decimate the time-dimension. 1=no decimation
    n_cycles : int
        The number of cycles for the wavelets
    """
    # Preallocate space
    new_times = epoch.times[::n_decim]
    new_info = epoch.info.copy()
    new_info['sfreq'] /= n_decim
    new_epoch = epoch.copy()
    n_epoch = len(new_epoch)
    out_shape = (n_epoch, len(epoch.ch_names),
                 freqs_tfr.shape[0], len(new_times))
    tfr = np.zeros(out_shape, dtype=np.float32)
    print('Preallocating space\nshape: {0}\nmbytes:{1}'.format(
        out_shape, tfr.nbytes/1000000))

    # Do TFR decomp
    print('Extracting TFR information...')
    pbar = ProgressBar(n_epoch)
    for i in range(n_epoch):
        iepoch = new_epoch[i]
        idata = iepoch._data.squeeze()
        itfr = mne.time_frequency.cwt_morlet(idata, iepoch.info['sfreq'],
                                             freqs_tfr, n_cycles=n_cycles)
        itfr = itfr[:, :, ::n_decim]
        tfr[i] = np.abs(itfr)
        pbar.update_with_increment_value(1)

    etfr = EpochTFR(tfr, freqs_tfr, new_times, new_epoch.events,
                    new_epoch.event_id, new_info)
    return etfr

I'm currently throwing away phase information just because it takes up a lot of RAM / disk space, and IO was becoming a problem on our cluster. It'd be easy to add back in though.

choldgraf commented 8 years ago

Just pinging this again to see if there's still interest. I was looking at the time_frequency code and the three functions (stockwell, morlet, and multitaper) seem to be doing pretty much the same thing. Though for some reason stockwell does it differently. The other two call a function _time_frequency, but stockwell seems to be using its own version of this (maybe for good reason?).

If all these functions used the same underlying code, I don't know that it'd be too tough to modify things to create functions that return 4D arrays. Basically just modify _time_frequency to output either averages over epochs or the full 4D array. Something like:

def _time_frequency(X, Ws, use_fft, decim, average=True):
    """Aux of time_frequency for parallel computing over channels
    """
    n_epochs, n_times = X.shape
    n_times = n_times // decim + bool(n_times % decim)
    n_frequencies = len(Ws)

    mode = 'same'
    if use_fft:
        tfrs = _cwt_fft(X, Ws, mode)
    else:
        tfrs = _cwt_convolve(X, Ws, mode)
    tfrs = tfrs[..., ::decim]
    if average is True:
        psd = np.zeros((n_frequencies, n_times))  # PSD
        plf = np.zeros((n_frequencies, n_times), np.complex)  # phase lock

        for tfr in tfrs:
            tfr_abs = np.abs(tfr)
            psd += tfr_abs ** 2
            plf += tfr / tfr_abs
        psd /= n_epochs
        plf = np.abs(plf) / n_epochs
    else:
        tfrs = tfrs[..., ::decim]
        tfr = np.abs(tfrs) ** 2
        plf = tfrs / np.abs(tfrs)
    return psd, plf

Then, you could write some functions like _tfr_power_morlet, _tfr_power_multitaper etc, that would just compute the windows and pass the data to _time_frequency. It wouldn't be so different from what single_trial_power does, but it seems like it would set the stage for more extendable/modular functions, and would make it easier to build on top of it if a full EpochsTFR object were created. Anyone have thoughts on that?

jasmainak commented 8 years ago

@choldgraf just to summarize the discussion in two lines, you want to have an EpochsTFR class which gives you AverageTFR when you do tfr.average() ? And you want some way to unify single_trial_power with tfr_xxx function?

I think I will veto any progress here until there is better documentation for time_frequency ;) We need something that explains what we have currently. Also, common idioms such as tfrs = [tfr_XXX(epochs[i]) for i in range(len(epochs))] should go in there. It will give everyone a clearer picture of what we need.

choldgraf commented 8 years ago

that's fine with me, I need to spend time on other things than MNE developemnt :)

I totally agree with documentation, I think it's pretty confusing. What did you have in mind? Just docstrings, or examples? And just using the tfr_ functions or also things like cwt_morlet?

On Thu, Dec 31, 2015 at 3:58 AM, Mainak Jas notifications@github.com wrote:

@choldgraf https://github.com/choldgraf just to summarize the discussion in two lines, you want to have an EpochsTFR class which gives you AverageTFR when you do tfr.average() ? And you want some way to unify single_trial_power with tfr_xxx function?

I think I will veto any progress here until there is better documentation for time_frequency ;) We need something that explains what we have currently. Also, common idioms such as tfrs = [tfr_XXX(epochs[i]) for i in range(len(epochs))] should go in there. It will give everyone a clearer picture of what we need.

— Reply to this email directly or view it on GitHub https://github.com/mne-tools/mne-python/issues/2024#issuecomment-168159912 .

jasmainak commented 8 years ago

I think a manual page. That's what I had in mind. Take a look at this for example: http://scikit-learn.org/stable/modules/linear_model.html. We need more like these. Something that gives the user a little background on the method (with a few equations if applicable), some nice plots, code snippets on how to generate them in python along with warnings / notes + references. Basically, a complete overview. I think you have a pretty good understanding of the time-frequency stuff. So, it would be great if you can contribute.

choldgraf commented 8 years ago

Cool - thanks for the reference links. I'll put this on my to-do list though it'll be a little bit because that PSD PR kind of sucked away all of my non-thesis working time for a little while...

On Fri, Jan 1, 2016 at 2:52 AM, Mainak Jas notifications@github.com wrote:

I think a manual page. That's what I had in mind. Take a look at this for example: http://scikit-learn.org/stable/modules/linear_model.html. We need more like these. Something that gives the user a little background on the method (with a few equations if applicable), some nice plots, code snippets on how to generate them in python along with warnings / notes + references. Basically, a complete overview. I think you have a pretty good understanding of the time-frequency stuff. So, it would be great if you can contribute.

— Reply to this email directly or view it on GitHub https://github.com/mne-tools/mne-python/issues/2024#issuecomment-168294341 .

palday commented 8 years ago

Extending the @agramfort's efforts in #3476 : would it make sense to extend to slightly modify EpochsTFR to be more similar to Epochs?

  1. include the events infrastructure
    • include an events field
    • include the by-condition / events_id indexing mechanism
  2. add a get_data() method and ditch the data field?

For (2): This would make EpochsTFR differ from AverageTFR, but Epochs differs from Evoked in exactly the same way, so I'm not sure if that's really a problem.

kingjr commented 8 years ago

@palday +1 for both 1) and 2). Can you open a separate issue/PR?

@dengemann can you close this issue? I think we solved it with #3454 and #3476?

palday commented 8 years ago

Done.

choldgraf commented 8 years ago

+1 for both as well - and isn't there a PR w a more general get_data

function that could operate on lots of different classes?

On Thu, Sep 1, 2016 at 7:59 AM Phillip Alday notifications@github.com wrote:

Done.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/mne-tools/mne-python/issues/2024#issuecomment-244106623, or mute the thread https://github.com/notifications/unsubscribe-auth/ABwSHVkyl2mjSTo2C-xdFvSV33mZZPjQks5qluhlgaJpZM4EJHBu .

mmagnuski commented 8 years ago

+1 too the generic get_data is _get_inst_data in mne.utils - but it is not public.

larsoner commented 8 years ago

Looks like remaining issues have been moved to a new issue, closing