mne-tools / mne-python

MNE: Magnetoencephalography (MEG) and Electroencephalography (EEG) in Python
https://mne.tools
BSD 3-Clause "New" or "Revised" License
2.71k stars 1.32k forks source link

Improve epoching / robust_average #294

Closed larsoner closed 6 years ago

larsoner commented 11 years ago

I liked that mne_process_raw could save several averages to a single evoked FIF. It would be nice if we could do that with our datasets, too. I think this basically requires enhancing the evoked instances to deal with storing multiple datasets. This is also a natural extension now that epochs can hold multiple event types in an intuitive way. This is how I'm thinking it might work:

1) On creation of evoked from read_evoked, you can specify one or more datasets/setno's to load from the disk.

2) On creation of evoked from epochs, epochs.average() would maintain current behavior, that is, averaging over all events regardless of event number, for backward compatibility. Enhanced functionality would come from epochs.average('AudL'), or epochs.average(['AudL', 'AudR']), which would return an evoked with one or two datasets coming from averaging a subset of the epochs objects, respectively.

While I'm in there, I think there was a way to also access the saved evoked data based on it's name (probably from the comment field?) instead of setno. Since we now have dict() use in epochs and such, this should be fairly straightforward.

@agramfort, let me know what you think before I start to hack hack hack.

dengemann commented 11 years ago

On 19.12.2012, at 19:45, Eric89GXL notifications@github.com wrote:

I liked that mne_process_raw could save several averages to a single evoked FIF. It would be nice if we could do that with our datasets, too. I think this basically requires enhancing the evoked instances to deal with storing multiple datasets. This is also a natural extension now that epochs can hold multiple event types in an intuitive way. This is how I'm thinking it might work:

sounds nice indeed!!

1) On creation of evoked from read_evoked, you can specify one or more datasets/setno's to load from the disk.

like multiple raws from a list of fnames. 2) On creation of evoked from epochs, epochs.average() would maintain current behavior, that is, averaging over all events regardless of event number, for backward compatibility. Enhanced functionality would come from epochs.average('AudL'), or epochs.average(['AudL', 'AudR']), which would return an evoked with one or two datasets coming from averaging a subset of the epochs objects, respectively.

this is already supported right? we just chose the epochs['aud'].average() syntax. We should exploit this instead if supporting the other syntax as long as there is no strong reason to add it. Moreover, in this context we should think about the meaning of list arguments. Will they merge events or split / create seperate averages int the ensueing evoked object.

While I'm in there, I think there was a way to also access the saved evoked data based on it's name (probably from the comment field?) instead of setno.

It's there: comment Since we now have dict() use in epochs and such, this should be fairly straightforward.

it should. we should just keep it apart from event_id, bookkeeping wise. @agramfort, let me know what you think before I start to hack hack hack.

— Reply to this email directly or view it on GitHub.

dengemann commented 11 years ago

On 19.12.2012, at 19:45, Eric89GXL notifications@github.com wrote:

I liked that mne_process_raw could save several averages to a single evoked FIF. It would be nice if we could do that with our datasets, too. I think this basically requires enhancing the evoked instances to deal with storing multiple datasets. This is also a natural extension now that epochs can hold multiple event types in an intuitive way. This is how I'm thinking it might work:

1) On creation of evoked from read_evoked, you can specify one or more datasets/setno's to load from the disk.

2) On creation of evoked from epochs, epochs.average() would maintain current behavior, that is, averaging over all events regardless of event number, for backward compatibility. Enhanced functionality would come from epochs.average('AudL'), or epochs.average(['AudL', 'AudR']), which would return an evoked with one or two datasets coming from averaging a subset of the epochs objects, respectively.

While I'm in there, I think there was a way to also access the saved evoked data based on it's name (probably from the comment field?) instead of setno. Since we now have dict() use in epochs and such, this should be fairly straightforward.

I've got to add two more questions: What would you use it for? currently it operates as annotation on the object level. What would it make different from event id when going down to subsets of epochs?

@agramfort, let me know what you think before I start to hack hack hack.

— Reply to this email directly or view it on GitHub.

larsoner commented 11 years ago

I think all the merging, etc. should be separated from this functionality, thus the list represents lists of separate, well-defined things to average.

The problem with using epochs['AudL'].average() followed by epochs['AudR'].average, is it would require adding a merge_evoked() function of some sort, to make an evoked object with multiple datasets.

I'd like to use it because if I have an analysis with, say, 24 conditions that I want evoked data for, then having to have 24 different evoked instances to carry them around is frustrating. It's also annoying to have to write out those 24 things to disk if I want to visualize them with mne_analyze.

larsoner commented 11 years ago

I think it's okay to have epochs['AudL'].average() and epochs.average(['AudL']) accomplish the same thing (which they would, under my proposal), because they would likely be used in different contexts most of the time. epochs['AudL'] I see as being mostly useful for getting an epochs instance to manipulate, e.g. for doing connectivity analyses, as opposed to as a means of getting an evoked instance. epochs.average() I think of as being useful for getting an evoked instance, so extending the functionality there makes the most sense to me.

dengemann commented 11 years ago

On 19.12.2012, at 20:04, Eric89GXL notifications@github.com wrote:

I think all the merging, etc. should be separated from this functionality, thus the list represents lists of separate, well-defined things to average.

The problem with using epochs['AudL'].average() followed by epochs['AudR'].average, is it would require adding a merge_evoked() function of some sort, to make an epochs object with multiple datasets.

we could add support to write epochs[['aud', 'vis']].average() couldn't we? Would also be consistent with pandas style. my suggestion was to integrate what you propose with the current getitem method. I'd like to use it because if I have an analysis with, say, 24 conditions that I want evoked data for, then having to have 24 different evoked instances to carry them around is frustrating.

absolutely, i'm on your side. let's just implement it with care ;-) It's also annoying to have to write out those 24 things to disk if I want to visualize them with mne_analyze.

— Reply to this email directly or view it on GitHub.

larsoner commented 11 years ago

That might work.

So if one did epochs[['aud', 'vis']], presumably this would return epochs with two event types, aud and vis.

Then all we'd need to do is add a flag to average() to have it either average across all event types (current behavior) or make one evoked instance with multiple averages based on the contained events. Something like epochs.average(merge_events=True) by default, where =False would return the instance with multiple evoked datasets.

dengemann commented 11 years ago

that's what i had in mind when reading your proposal. If it feels right to you I would clearly prefer that over double getitem syntax / options. Also it would give rise to adding your merge event ids as instance method.

On 19.12.2012, at 20:21, Eric89GXL notifications@github.com wrote:

That might work.

So if one did epochs[['aud', 'vis']], presumably this would return epochs with two event types, aud and vis.

Then all we'd need to do is add a flag to average() to have it either average across all event types (current behavior) or make one epochs instance with multiple averages based on the contained events. Something like epochs.average(merge_events=True) by default, where =False would return the instance with multiple evoked datasets.

That I would like :-)

— Reply to this email directly or view it on GitHub.

larsoner commented 11 years ago

Sounds like that will work, then. Let's see what @agramfort thinks now...

dengemann commented 11 years ago

While we are at it. What would you guys think about epochs.apply(fun, merge=True) for custom epochs aggregation?

On 19.12.2012, at 20:21, Eric89GXL notifications@github.com wrote:

That might work.

So if one did epochs[['aud', 'vis']], presumably this would return epochs with two event types, aud and vis.

Then all we'd need to do is add a flag to average() to have it either average across all event types (current behavior) or make one epochs instance with multiple averages based on the contained events. Something like epochs.average(merge_events=True) by default, where =False would return the instance with multiple evoked datasets.

— Reply to this email directly or view it on GitHub.

agramfort commented 11 years ago

I read quickly but let me answer briefly to give me my opinion.

why not supporting write many evoked in one fif but I would not do it via the Evoked.save method but rather with something like

mne.fiff.write_evoked(fname, [evoked1, evoked2])

wrt the idea a read all the evoked from a file why allowing mne.fiff.read_evoked to return a list of evoked or a dict with the key if the condition.

wdyt? do this answer the use case?

agramfort commented 11 years ago

While we are at it. What would you guys think about epochs.apply(fun, merge=True) for custom epochs aggregation?

sorry if I did not follow but can you be more explicit?

dengemann commented 11 years ago

Sorry for my brief style.

Currently we allow users to generate epochs by calling .average / .std_error. To allow users custom aggregation functions I was thinking to add an instance method along the lines of raw.apply; pandas.DataFreme.apply / aggregate taking the same parameters as epoch.average / std_error + a custom function.

Makes sense?

D

On Wed, Dec 19, 2012 at 9:19 PM, Alexandre Gramfort < notifications@github.com> wrote:

While we are at it. What would you guys think about epochs.apply(fun, merge=True) for custom epochs aggregation?

sorry if I did not follow but can you be more explicit?

— Reply to this email directly or view it on GitHubhttps://github.com/mne-tools/mne-python/issues/294#issuecomment-11546602.

agramfort commented 11 years ago

can you write a snippet with a use case where it would make this simpler or my powerful?

dengemann commented 11 years ago

On Wed, Dec 19, 2012 at 8:58 PM, Alexandre Gramfort < notifications@github.com> wrote:

I read quickly but let me answer briefly to give me my opinion.

why not supporting write many evoked in one fif but I would not do it via the Evoked.save method but rather with something like

mne.fiff.write_evoked(fname, [evoked1, evoked2])

That's nice. But how do you create the epochs you want to save?

I think the idea of this porposal was to take advantage (and extend) the dict / multiple conditions infrastructure of our epochs, such that you can generate separate evoked instances from the different conditions specified in event_id.

calling[['aud', 'vis'] would give you as subset of the conditions requested and then calling .average / std_err / apply would either merge the epochs to one evoked of different conditions or create an evoked object including separate averages for each condition.

Makes sense? Would this also be possible with the solution you propose?

wrt the idea a read all the evoked from a file why allowing

mne.fiff.read_evoked to return a list of evoked or a dict with the key if the condition.

wdyt? do this answer the use case?

— Reply to this email directly or view it on GitHubhttps://github.com/mne-tools/mne-python/issues/294#issuecomment-11545608.

dengemann commented 11 years ago

events = mne.read_events(event_fname)[:10]

raw.info['bads'] = ['MEG 2443']
picks = mne.fiff.pick_types(raw.info, meg='grad', eeg=False, eog=True,
                            stim=False, exclude=raw.info['bads'])

tmin, tmax = -0.2, 0.5
baseline = (None, 0)
reject = dict(grad=4000e-13, eog=150e-6)

event_id = dict(auditory_l=1, auditory_r=2, visual_l=3, visual_r=4)

epochs = mne.Epochs(raw, events, event_id, tmin, tmax, proj=True,
picks=picks,
                    baseline=baseline, preload=True, reject=reject)

# here it comes! @agramfort;

evoked_median = epochs.apply(np.median)  # instead of mean median along the
epochs dimension

More clear now?

On Wed, Dec 19, 2012 at 9:47 PM, Alexandre Gramfort < notifications@github.com> wrote:

can you write a snippet with a use case where it would make this simpler or my powerful?

? Reply to this email directly or view it on GitHubhttps://github.com/mne-tools/mne-python/issues/294#issuecomment-11547687.

agramfort commented 11 years ago

np.mean, np.std, np.median are called reduction methods so why not but the semantic is pretty different as apply for Raw returns a Raw and operates in place.

for the rest I would just do:

mne.fiff.write_evoked(fname, [epochs['AudL'].average(), epochs['VisL'].average()])

larsoner commented 11 years ago

@agramfort that does cover the use case I need it for, so I'm okay with it. I'll work on implementing that, shouldn't be too bad.

I assume you're also okay with epochs[['AudL', 'AudR']] returning epochs with just the events for 'AudL' and 'AudR', since it's a natural extension of epochs['AudL'] functionality...? If so, I'll implement that, too, while I'm digging around in there.

Maybe @agramfort and @dengemann what would work for the other parts is basically a to_evoked() function, which could do "mean"/average, "std", "median", etc. but would basically operate along and collapse the first dimension of epochs to obtain an evoked using an arbitrary function.

dengemann commented 11 years ago

Right. So, semantically, the options would be epochs.aggregate / epochs.reduce


or

evokeds = [epochs[c]].average() c for c in epochs.event_id] mne.fiff.write_evoked(fname, evokeds)

On Wed, Dec 19, 2012 at 10:01 PM, Alexandre Gramfort < notifications@github.com> wrote:

np.mean, np.std, np.median are called reduction methods so why not but the semantic is pretty different as apply for Raw returns a Raw and operates in place.

for the rest I would just do:

mne.fiff.write_evoked(fname, [epochs['AudL'].average(), epochs['VisL'].average()])

— Reply to this email directly or view it on GitHubhttps://github.com/mne-tools/mne-python/issues/294#issuecomment-11548325.

agramfort commented 11 years ago

evokeds = [epochs[c]].average() c for c in epochs.event_id] mne.fiff.write_evoked(fname, evokeds)

yes simple as that.

larsoner commented 11 years ago

Hah! Someone already implemented multiple-evoked writing in .FIF in write_evoked, and multiple-evoked reading in read_evoked. I wonder if that was me... thought I was too young to have memory loss...

In any case, I can still change read_evoked() to allow for using strings instead of ints to index the evoked datasets, since I think that's safer and more intuitive than having to remember which setno was which. The user will probably also have to specify the type (stderr or average), since that's coded in the FIF in FIFF.FIFF_ASPECT_KIND.

Speaking of which, @dengemann @agramfort, if we do go to using arbitrary collapsing functions, we'll need to come up with a different FIFF.FIFF_ASPECT_KIND to describe it. Currently there is FIFF.FIFFV_ASPECT_AVERAGE and FIFF.FIFFV_ASPECT_STD_ERR, we could just make a catch-all FIFF.FIFFV_ASPECT_CUSTOM for any others a user wants to do.

agramfort commented 11 years ago

@agramfort that does cover the use case I need it for, so I'm okay with it. I'll work on implementing that, shouldn't be too bad.

I assume you're also okay with epochs[['AudL', 'AudR']] returning epochs with just the events for 'AudL' and 'AudR', since it's a natural extension of epochs['AudL'] functionality...? If so, I'll implement that, too, while I'm digging around in there.

yes I am ok with that.

Maybe @agramfort and @dengemann what would work for the other parts is basically a to_evoked() function, which could do "mean"/average, "std", "median", etc. but would basically operate along and collapse the first dimension of epochs to obtain an evoked using an arbitrary function.

why not although I am not fully convinced about the use case. A new reduction that could be relevant is the use of robust mean with outlier rejection. statsmodel has something for this.

agramfort commented 11 years ago

Hah! Someone already implemented multiple-evoked writing in .FIF in write_evoked, and multiple-evoked reading in read_evoked. I wonder if that was me... thought I was too young to have memory loss...

git blame is your friend in this case :)

In any case, I can still change read_evoked() to allow for using strings instead of ints to index the evoked datasets, since I think that's safer and more intuitive than having to remember which setno was which. The user will probably also have to specify the type (stderr or average), since that's coded in the FIF in FIFF.FIFF_ASPECT_KIND.

+1

larsoner commented 11 years ago

I thought you were making a joke. Apparently "git blame" is an actual thing.

Yes, it was me, but it was all the way back in early October. Clearly I can't be expected to remember that far back.

dengemann commented 11 years ago

Great we were able to clarify this, I was somewhat confused because I thought -- didn't we implement sth. like that, didn't I even read it in whats_new.rst?

... time for 0.5 if PRs start going circular.

On Wed, Dec 19, 2012 at 10:29 PM, Eric89GXL notifications@github.comwrote:

I thought you were making a joke. Apparently "git blame" is an actual thing.

Yes, it was me, but it was all the way back in early October. Clearly I can't be expected to remember that far back.

— Reply to this email directly or view it on GitHubhttps://github.com/mne-tools/mne-python/issues/294#issuecomment-11549447.

dengemann commented 11 years ago

Robust mean sounds pretty exciting. What implementation would you suggest, .average(method='robust'); .robust_average(); to_evoked(fun=robust_average)?

agramfort commented 11 years ago

maybe .robust_average() as you'll need to write a bit of code for it.

dengemann commented 11 years ago

you probably had in mind a procedure like this ":

http://scikit-learn.org/dev/auto_examples/applications/plot_outlier_detection_housing.html

On 20.12.2012, at 08:44, Alexandres Gramfort notifications@github.com wrote:

maybe .robust_average() as you'll need to write a bit of code for it. — Reply to this email directly or view it on GitHub.

agramfort commented 11 years ago

you probably had in mind a procedure like this ":

http://scikit-learn.org/dev/auto_examples/applications/plot_outlier_detection_housing.html

no. Something more along these lines:

http://statsmodels.sourceforge.net/stable/examples/generated/example_rlm.html http://statsmodels.sourceforge.net/stable/generated/statsmodels.robust.norms.LeastSquares.html

larsoner commented 11 years ago

@dengemann, if you're planning on pursuing the other type of evoked (robust_average or whatnot), feel free to re-title this issue. Otherwise, I'll go ahead and close it down.

dengemann commented 11 years ago

let me retitle it.

On Thu, Dec 20, 2012 at 8:50 PM, Eric89GXL notifications@github.com wrote:

@dengemann https://github.com/dengemann, if you're planning on pursuing the other type of evoked (robust_average or whatnot), feel free to re-title this issue. Otherwise, I'll go ahead and close it down.

— Reply to this email directly or view it on GitHubhttps://github.com/mne-tools/mne-python/issues/294#issuecomment-11588048.

dengemann commented 11 years ago

@agramfort I'm back here, again thinking about this one. Looking at your statsmodels pointers I'm not sure how they inform this one (as those are basically regression examples). Did you have in mind something like a self.robust_XXX api where users could pass their own weight functions or e.g. the ones available in statsmodels: http://statsmodels.sourceforge.net/devel/rlm_techn1.html One then would apply mean / std reduction functions but apply weights to each of the instances as specified by a weight function targeting at each instances distribution features.

agramfort commented 11 years ago

I have clean code that uses a huber loss. I think we don't need to support all statsmodels options. I need permission to publish this code. What is unclear is what is nave in this case...

dengemann commented 11 years ago

On Fri, Mar 29, 2013 at 8:59 AM, Alexandre Gramfort < notifications@github.com> wrote:

I have clean code that uses a huber loss.

Excellent.

I think we don't need to support all statsmodels options.

I fully agree.

I need permission to publish this code.

Ok, let us know how it goes.

What is unclear is what is nave in this case...

Reply to this email directly or view it on GitHubhttps://github.com/mne-tools/mne-python/issues/294#issuecomment-15631978 .

dengemann commented 11 years ago

On Fri, Mar 29, 2013 at 8:59 AM, Alexandre Gramfort < notifications@github.com> wrote:

I have clean code that uses a huber loss. I think we don't need to support all statsmodels options. I need permission to publish this code. What is unclear is what is nave in this case...

probably nave will need some extra handling then ... let's get back to it as we have code that does robust XXX.

— Reply to this email directly or view it on GitHubhttps://github.com/mne-tools/mne-python/issues/294#issuecomment-15631978 .

larsoner commented 10 years ago

Some of these things we've implemented already. Of the ones we haven't we've lived without them for over a year. Okay to close this @dengemann? :)

jona-sassenhagen commented 6 years ago

Hmmm ... couldn't we at least have something like:

epochs.average(method="median")
epochs.average(method=np.median)
epochs.average(method=lambda d: (d * d).mean(0))

?

This would actually simplify a bunch of internal code too, because we're doing stuff like plotting GFPs quite a lot.

larsoner commented 6 years ago

Sounds like there is finally a volunteer to do this :)

jona-sassenhagen commented 6 years ago

Well, if Alex and you agree it's useful now. See rejection in #3044

larsoner commented 6 years ago

Closing as dup of #3044 :)

I can live with it now, especially if we can use it to simplify our internal functions.