Closed choldgraf closed 8 years ago
personally I am not sure how common this use case is
You may be right. I've just found that experiments that use natural stimuli often have several different parameter or feature values for any given timepoint, and aren't always ideal for a traditional "time-locked trial onset" style of analysis. Maybe it's uncommon enough to not be worth implementing.
(Email reply to Alexandre Gramfort's comment of Tue, Apr 14, 2015 at 1:35 PM.)
To fully exploit rERP analysis, EEGLAB has begun implementing a more detailed event code system with hierarchical event codes. I'm also working with complex stimuli, e.g. natural language, and doing that with current MNE triggers would be a bit of a bitch.
Can't speak to how common it is, but for rERP and/or natural stimuli, it'd definitely be important.
Interesting. I think there is room for improvement. To be discussed carefully.
Can you be more explicit about the EEGLAB approach? Maybe give possible usage snippets?
For example, imagine we want to analyze word processing via multiple, multiply nested predictors of various types. We could estimate a regression model with factors such as word length in letters, word length in syllables, word frequency according to subcorpus 1 of corpus 1, word frequency according to subcorpus 2 of corpus 1, word frequency according to corpus 2, and word class (categorical). It'd be very convenient if each event code could carry the relevant information, e.g.
events = {}  # hypothetical: maps an event's sample index to its metadata
events[103827] = {
    'length': {'syllables': 2, 'characters': 7},
    'frequency': {'google': {'2009': 42, '2013': 43}, 'celex': 13},
    'category': {'content_word': True, 'type': 'adverb'},
    'presentation_duration': 254,
    'response': {'rt_(msec)': 340, 'correct': True},
}
Or something like that. Maybe this only becomes useful as a Pandas data frame.
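A minimal sketch of turning one such nested record into flat columns (e.g. length.syllables), which is the shape a data frame or design-matrix builder expects. The flatten_metadata helper and the dotted column names are assumptions for illustration; for a list of records, pandas' pd.json_normalize does something similar.

```python
from typing import Any, Dict


def flatten_metadata(record: Dict[str, Any], sep: str = ".") -> Dict[str, Any]:
    """Flatten a nested metadata dict into one level.

    Nested keys are joined with `sep`, so {'length': {'syllables': 2}}
    becomes {'length.syllables': 2}. Non-dict values are kept as they are.
    """
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        if isinstance(value, dict):
            for sub_key, sub_value in flatten_metadata(value, sep).items():
                flat[f"{key}{sep}{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat


# Hypothetical metadata for one word event, keyed by its sample index
events = {
    103827: {
        "length": {"syllables": 2, "characters": 7},
        "frequency": {"google": {"2009": 42, "2013": 43}, "celex": 13},
        "category": {"content_word": True, "type": "adverb"},
        "presentation_duration": 254,
        "response": {"rt_(msec)": 340, "correct": True},
    }
}

# One flat row per event; the keys become the columns of a table
rows = {sample: flatten_metadata(meta) for sample, meta in events.items()}
print(rows[103827]["frequency.google.2009"])  # 42
```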
Ideally, it should be easily parseable for transformation via patsy formulas into design matrices for GLM estimation for rERP, and also for decoding/GAT cross-condition generalisation/scikit-learn. The idea is that it should be easy to, say, estimate a regression model for celex word frequency + length + response (correct), then another one for google-2009 frequency + google-2013 frequency + length in syllables + length in characters... and then the same one, but with all interactions up to level 2.
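The kind of design matrix described above can be sketched in plain NumPy. This swaps in explicit column products for patsy, where the same model would be written with a formula such as (a + b + c)**2. design_matrix is a hypothetical helper that handles numeric predictors only, so a categorical predictor like word class would first need dummy coding.

```python
from itertools import combinations

import numpy as np


def design_matrix(columns, names, max_order=2):
    """Intercept, main effects, and interactions up to max_order.

    columns: dict mapping predictor name -> sequence of numeric values,
        one per event.
    names: predictors to include, in column order.
    Interaction columns are element-wise products of the main effects.
    """
    n_events = len(columns[names[0]])
    cols, labels = [np.ones(n_events)], ["intercept"]
    for order in range(1, max_order + 1):
        for combo in combinations(names, order):
            term = np.ones(n_events)
            for name in combo:
                term = term * np.asarray(columns[name], dtype=float)
            cols.append(term)
            labels.append(":".join(combo))
    return np.column_stack(cols), labels


# Hypothetical per-event predictors (flattened metadata columns)
columns = {
    "frequency.celex": [13, 40, 7, 22],
    "length.characters": [7, 3, 9, 5],
    "response.correct": [1, 1, 0, 1],
}
X, labels = design_matrix(columns, list(columns))
print(X.shape)  # (4, 7)
```

Continuous predictors are often centered before forming products, since raw interaction columns tend to correlate strongly with their main effects.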
Or we wish to select all words of the category adverb with correct responses, and compare them to all words of category noun with incorrect responses (for some reason).
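With flat per-event records, that kind of selection reduces to matching key/value pairs. A sketch with hypothetical records; the returned indices would index the corresponding epochs.

```python
def select(rows, criteria):
    """Indices of events whose flat metadata match every key/value in criteria."""
    return [i for i, row in enumerate(rows)
            if all(row.get(key) == value for key, value in criteria.items())]


# Hypothetical flattened metadata, one dict per event, in event order
rows = [
    {"category.type": "adverb", "response.correct": True},
    {"category.type": "noun", "response.correct": False},
    {"category.type": "adverb", "response.correct": False},
    {"category.type": "noun", "response.correct": False},
    {"category.type": "adverb", "response.correct": True},
]

adverbs_correct = select(rows, {"category.type": "adverb", "response.correct": True})
nouns_incorrect = select(rows, {"category.type": "noun", "response.correct": False})
print(adverbs_correct, nouns_incorrect)  # [0, 4] [1, 3]
```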
I've used this with some idiosyncratic self-written EEGLAB routines and consider it a very powerful analysis scheme for any form of naturalistic or rich, multi-predictor paradigm.
All of this would be very inconvenient with classical one-integer-only triggers. Yes, you could do it, e.g. by doing a mock integer that's actually to be read as a string coding a single design matrix - or via an external dictionary you look up in. But it's not convenient.
The EEGLAB approach looks like this:
Stimulus/Feedback,
Stimulus/Visual/Color/Red,
Stimulus/Visual/Shape/Ellipse/Circle/Height/2-deg,
Stimulus/Visual/Shape/Ellipse/Circle/Width/2-deg,
Stimulus/Visual/Background/Uniform Color/Black
With this, you could select all circles and contrast them to all squares, or select all red circles and contrast them to all other circles, then contrast them to all red squares, and so on.
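One way to query such hierarchical tags is by path prefix: a query matches any tag that equals it or lies below it in the "/" hierarchy. A sketch with a hypothetical has_tag/select pair; the square tags are invented for the example.

```python
def has_tag(event_tags, query):
    """True if some tag equals query or sits below it in the '/' hierarchy."""
    return any(tag == query or tag.startswith(query + "/") for tag in event_tags)


def select(events, *queries):
    """Indices of events that match every query."""
    return [i for i, tags in enumerate(events)
            if all(has_tag(tags, q) for q in queries)]


# Hypothetical events, each carrying a set of hierarchical tags
events = [
    {"Stimulus/Visual/Shape/Ellipse/Circle/Height/2-deg", "Stimulus/Visual/Color/Red"},
    {"Stimulus/Visual/Shape/Rectangle/Square/Height/2-deg", "Stimulus/Visual/Color/Red"},
    {"Stimulus/Visual/Shape/Ellipse/Circle/Height/2-deg", "Stimulus/Visual/Color/Black"},
    {"Stimulus/Visual/Shape/Rectangle/Square/Height/2-deg", "Stimulus/Visual/Color/Black"},
    {"Stimulus/Feedback"},
]

CIRCLE = "Stimulus/Visual/Shape/Ellipse/Circle"
SQUARE = "Stimulus/Visual/Shape/Rectangle/Square"
RED = "Stimulus/Visual/Color/Red"

circles = select(events, CIRCLE)            # [0, 2]
squares = select(events, SQUARE)            # [1, 3]
red_circles = select(events, CIRCLE, RED)   # [0]
red_squares = select(events, SQUARE, RED)   # [1]
```

Appending "/" to the query keeps e.g. "Color/Red" from matching a hypothetical "Color/Reddish".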
They have some very ambitious goals with this, such as machine readable search engine friendly data bases of EEG experiments. See Link
I'm not saying I'm volunteering :)
This functionality already exists in mne-python. It came from a feature request @haribharadwaj made, which I think @teonlamont implemented; see #1583. It may be useful to have a more intuitive wrapper (for cases where people haven't set up their trigger codes to use the feature in their STI channel); however, I use this in all of my research designs (it has always been a part of MNE).
HTH D
See also #1562, where I described a simple example of doing this, and there was a lot of discussion about possible APIs. Personally, I think the best option would be an API where you feed it a dict whose values are tuples of event codes; it then synthesizes a new trigger channel using the binary trigger feature and runs the standard mne.find_events with trigger masks (not volunteering ;) ).
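The core of that proposal can be sketched without MNE: give each feature its own bit, OR the bits together for every original code, and select events by masking. synthesize_codes and the feature names are hypothetical; the events array uses MNE's (sample, previous value, event code) row layout.

```python
import numpy as np


def synthesize_codes(feature_map):
    """Assign one bit per feature and combine them into a new code per event.

    feature_map: dict feature name -> tuple of original event codes having it.
    Returns (code_map, bits): code_map[old_code] = new_code, bits[name] = bit.
    """
    bits = {name: 1 << i for i, name in enumerate(feature_map)}
    code_map = {}
    for name, codes in feature_map.items():
        for code in codes:
            code_map[code] = code_map.get(code, 0) | bits[name]
    return code_map, bits


# Hypothetical features: which original trigger codes share each feature
feature_map = {"visual": (1, 2), "left": (1, 3), "target": (2, 3)}
code_map, bits = synthesize_codes(feature_map)   # {1: 3, 2: 5, 3: 6}

# Events in MNE's layout: (sample, previous value, event code)
events = np.array([[100, 0, 1], [250, 0, 3], [400, 0, 2], [550, 0, 1]])
new_events = events.copy()
new_events[:, 2] = [code_map[c] for c in events[:, 2]]

# Mask on one bit: all "left" events, whatever their other features
left = new_events[(new_events[:, 2] & bits["left"]) != 0]
print(left[:, 0])  # [100 250 550]
```

This only covers categorical features. In this sketch an original code that carries no feature is missing from code_map, so the lookup raises a KeyError rather than silently producing code 0.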
@dgwakeman The bit mask approach, as @kingjr mentions, does not easily generalize to noncategorical/continuous predictors, with float values, correct?
I think the challenge with non-categorical predictors is that the concept of an "event" generally entails a starting point and an event length. With something like regression there isn't any kind of starting point necessarily...you've just got varying values of some predictor variable. This would also be very useful for me since most of my work is with encoding / decoding models, but I'm not sure how it could fit into the current events / epochs code of MNE...
As an example: I might be interested in predicting electrode activity with linguistic features, so I play 100 sentences to a person over the course of 20 minutes. Each timepoint could now be parameterized with several features, such as spectrotemporal features, phonemes, syllables, word probability, etc. To that extent, it makes sense to store information about which features belong to each timepoint. It's hard to do this just using "events" because it's unclear what an "event" is (e.g., is it a sentence onset? syllable onset?). That said, as others have noted, it seems like shoving this functionality into events / epochs may not be the best idea since that's not really what the code was designed for, no?
Why do you need this in the epochs object? You can have all this info next to it.
I guess it depends on how much precision the float values have, but I could certainly see it becoming cumbersome. For extremely complex float regressors, I would set a MISC channel to the float value (in volts), and perhaps add the option to pass MISC channels to the regression analyses.
The way I've gotten around this is basically to use my own code for regressions and such, and MNE for the basic data representation. I have a dataframe with "start" and "stop" columns (which I described at the beginning of this issue), and a function that takes such a dataframe and generates a big matrix of stim channels. Then I can keep this matrix (either as an MNE object of "misc" channels or as a separate dataframe) alongside my Raw object of ECoG data, and pull out the channels I want when running regressions.
@dengemann my problem with Epochs objects isn't the fact that I can't keep metadata with them, it's that my epochs are variable length. E.g., if each "trial" is a sentence, then each one has a length somewhere between 2-4 seconds. I can always define an arbitrary cutoff if I only care about the evoked response to sentence onset, but it's a bit clunky.
@choldgraf I'm not sure why you need to epoch in the case you describe? Following @dgwakeman, can't you work directly on the continuous raw data using a misc for each continuous regressor you create?
Yup, that's usually what I do and it works pretty well for me already. If I also want to look at the event-related evoked activity then I'll do epoching and just define a single trial length for everything. I do think it'd be useful to have some functionality to generate arbitrary stim channels from a collection of time metadata, but perhaps it's outside of the scope of this.
Actually now that I think about it, one way in which epochs would be useful is in the size of data storage. If you've got a really long session then it can be helpful to remove all the "non-event" chunks first and then save the data, but this isn't such a big deal for most data sets :)
That's a bit of an inelegant trick, but if your data is full of non-events, you could perhaps save memory by setting the corresponding raw.data to 0 and transforming it into a sparse matrix.
That's how I do it in EEGLAB - well, the other way around, basically
data = data[predictor != 0,:]
I would actually use nan instead of 0 personally (it is possible to get a bunch of zeros in data, but probably not nan), and then save as .fif.gz. I would expect it to take less space than even a sparse matrix.
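A sketch of the NaN variant on synthetic data: unlike data[predictor != 0, :], it keeps the time axis intact, and NaN-aware reductions skip the blanked samples. The channels x samples layout and the predictor values are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((3, 1000))   # synthetic: channels x samples
predictor = np.zeros(1000)
predictor[200:300] = 1.0                # samples inside events
predictor[600:650] = 2.5

# Blank out non-event samples; the boolean mask broadcasts over channels
masked = np.where(predictor != 0, data, np.nan)

# NaN-aware reductions skip the blanked samples
event_mean = np.nanmean(masked, axis=1)
```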
My codebase has a long and storied history of inelegant tricks ;)
Truthfully the memory issue isn't a big deal for me right now, I was just trying to think of a use case where it'd be useful to have variable length epochs and such. While I do think it'd be useful to have more ways of adding "time metadata" to MNE objects other than using an events object, it doesn't seem crucial to implement now or anything...
My typical solution too when I need more metadata is to write inelegant custom scripts to make new events arrays by processing the existing one(s).
The following article may be relevant to this issue, especially with regard to language studies: http://kutaslab.ucsd.edu/people/kutas/pdfs/2015.P.157.pdf
Yeah, but I don't get this debate. You can put together a design matrix as desired and use our linear_regression function to produce rERP/F evoked objects. It's ready.
(Email reply to J-R King's comment of 2015-05-01 16:01 GMT+02:00.)
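For epoched data, the core computation behind a design-matrix regression is an ordinary least-squares fit at every channel and time point, which NumPy can do in one call. This is a sketch of the idea, not MNE's linear_regression implementation; epochs_regression is a hypothetical helper, and the data are synthetic and noiseless, so the fit recovers the true coefficients exactly. Each betas[i] (channels x times) is the rERP for predictor i.

```python
import numpy as np


def epochs_regression(data, design):
    """Least-squares fit of `design` at every channel and time point.

    data: (n_epochs, n_channels, n_times); design: (n_epochs, n_predictors).
    Returns betas of shape (n_predictors, n_channels, n_times).
    """
    n_epochs, n_channels, n_times = data.shape
    y = data.reshape(n_epochs, -1)                       # one column per channel/time
    betas, *_ = np.linalg.lstsq(design, y, rcond=None)
    return betas.reshape(design.shape[1], n_channels, n_times)


rng = np.random.default_rng(1)
n_epochs, n_channels, n_times = 40, 2, 5
design = np.column_stack([np.ones(n_epochs),              # intercept
                          rng.standard_normal(n_epochs)])  # e.g. word frequency
true = rng.standard_normal((2, n_channels, n_times))
data = np.einsum("ep,pct->ect", design, true)             # noiseless synthetic epochs
betas = epochs_regression(data, design)
print(np.allclose(betas, true))  # True
```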
I haven't finished reading the blabla, but I thought they proposed a way to handle partially overlapping events (not just nested ones). Is there an example somewhere using a design matrix?
@kingjr the trick is to run the regression on the full EEG time course. Predictors for partially overlapping events that occur with some temporal jitter relative to each other are then not fully collinear and can be separated via ordinary least squares (OLS). This is not covered by MNE currently, but I'd like to implement it.
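A minimal sketch of that idea on synthetic data: each event type gets one lagged predictor per sample after onset (a time-expanded, FIR-style design), and a single least-squares fit over the continuous signal separates two overlapping responses because their onsets are jittered. The onsets, kernels, and the time_expanded_design helper are made up for illustration; real data would add noise and usually many more events.

```python
import numpy as np


def time_expanded_design(onsets, n_samples, n_lags):
    """Lagged predictors for one event type.

    Column k is 1 at every sample lying k samples after an onset, so its
    coefficient is the response amplitude k samples after the event.
    """
    X = np.zeros((n_samples, n_lags))
    for onset in onsets:
        for k in range(n_lags):
            if onset + k < n_samples:
                X[onset + k, k] = 1.0
    return X


n_samples, n_lags = 300, 20
onsets_a = np.array([10, 60, 110, 160, 210, 260])
onsets_b = onsets_a + np.array([5, 12, 8, 15, 3, 10])  # jittered, overlapping

kernel_a = np.sin(np.linspace(0, np.pi, n_lags))  # synthetic "true" responses
kernel_b = -0.5 * np.hanning(n_lags)

X = np.hstack([time_expanded_design(onsets_a, n_samples, n_lags),
               time_expanded_design(onsets_b, n_samples, n_lags)])
y = X @ np.concatenate([kernel_a, kernel_b])  # responses sum where they overlap

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
est_a, est_b = coef[:n_lags], coef[n_lags:]

# Naive epoch averaging for comparison: contaminated by the overlapping B responses
naive_a = np.mean([y[o:o + n_lags] for o in onsets_a], axis=0)
```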
@dengemann Consider an argument Nathaniel Smith makes: comparing means (evoked potentials per condition) can be understood as a type of regression. But there is a special MNE interface for evoked potentials, even though you could do it via linear_regression and a design matrix ... because it's more convenient.
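The equivalence is easy to check: with one indicator column per condition and no intercept, the least-squares coefficients are exactly the condition means. A sketch on synthetic single-sensor amplitudes; the labels and values are made up.

```python
import numpy as np

rng = np.random.default_rng(2)
condition = np.array([0, 0, 1, 1, 1, 2, 2])       # condition label per epoch
amplitude = rng.standard_normal(condition.size)   # e.g. amplitude at one sensor/time

# Cell-means coding: one indicator column per condition, no intercept
X = (condition[:, None] == np.arange(3)[None, :]).astype(float)
betas, *_ = np.linalg.lstsq(X, amplitude, rcond=None)

means = np.array([amplitude[condition == c].mean() for c in range(3)])
print(np.allclose(betas, means))  # True
```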
But there is a special MNE interface for evoked potentials, even though you could do it via linear_regression and a design matrix ... because it's more convenient.
What do you mean?
file:///Users/dengemann/github/mne-python/doc/build/html/auto_examples/stats/plot_sensor_regression.html#example-stats-plot-sensor-regression-py
It's maybe not the best example, but it exposes the API. I think with this you can do everything which goes in that direction, not?
I think with this you can do everything which goes in that direction, not?
No :) Unless you really abuse it.
But the hard part for Smith's approach is constructing the design matrix in the first place. The rest is a single line to fit a linear regression, and another few to plot specific coefficients.
But the hard part for Smith's approach is constructing the design matrix in the first place.
Ok.
Jumping back in this conversation with another thought - what if there were just added functionality to include a unique event ID with the "events" array? AKA, rather than implementing a bunch of extra metadata functionality, we just allow the user to supply an event ID for each row of "events". That way, people can use whatever metadata system they want, but they have at least one number that ties a specific MNE event to a particular entry in their system.
@choldgraf I actually do this in practice. Note that the epochs __repr__ then becomes horribly long... and you have to deal with epoch concatenation manually.
Couldn't you currently do this with the "description" field in "info"? If it's not worth implementing as a part of MNE, I was thinking of just including a dictionary for the description that contains an entry "event_unique_id" with a list of the same length as n_events.
can you give an example of the objective?
I think someone mentioned a similar use case above, but the basic idea for me is that I have a lot of metadata about any given trial. For example, I have trials in blocks of 3. I'd like to keep information about which block each stimulus presentation belongs to, but other than hard-coding this into event_id, I'm not sure how best to do it. Since I also have a lot of other metadata, I thought it would be easier to include an ID that I can look up in my own trial metadata structure.
An example of metadata columns is as follows. Each trial is a spoken sentence, but the sentences differ in length, and for regression purposes I'd like to know exactly how long each sentence took to play. Moreover, some sentences are filtered in particular ways depending on the block type. Here is the information I have about each sentence:
- Sentence ID (aka file name)
- Filter type (or not filtered)
- Block filter type (are other sentences in the block filtered)
- Position within block
- Length of sentence
- Speaker ID of sentence
- Words spoken in sentence
- etc etc
The problem is that right now when I have an epochs object, there is a mapping onto integer numbers. I could make a complicated system that encodes all this information into a really long integer, but it seems easier/clearer to just have a single number associated with each stimulus presentation, which I can use in my own lookup table of stimulus metadata. I suppose I could use the stimulus event code functionality only for this (aka make it an integer that only serves as a lookup key and doesn't tell me anything about the stimulus). What do you think?
Playing the devil's advocate:
Why make a more complex function at the MNE level? Build your metadata records of the desired complexity and link them to your epochs based on the epochs.selection vector, which indicates the original position of each epoch (after dropping).
I really don't think we need a more complex events API inside MNE-Python.
(Email reply to Chris Holdgraf's comment of 2015-06-06 19:52 GMT+02:00.)
TIL about the epochs selection attribute haha. That does look useful, and I think it could basically serve the functionality that I'd want. Can you overwrite an epoch's selection attribute? E.g., so if I wanted to create an EpochsArray object with a subset of epochs, then I could modify the "selection" attribute in place?
(Email reply to Denis A. Engemann's comment of Sat, Jun 6, 2015 at 5:46 PM.)
I really don't think we need a more complex events API inside MNE-Python.
I agree. Still, as many users (will) encounter this complex design issue, I reckon there should be an introductory example that covers this topic and guides the user.
I agree. Still, as many users (will) encounter this complex design issue, I reckon there should be an introductory example that covers this topic and guides the user.
Amen.
;-)
Can you overwrite an epoch's selection attribute? E.g., so if I wanted to create an EpochsArray object with a subset of epochs, then I could modify the "selection" attribute in place?
Not sure I get it, but you can overwrite it. Its semantics are as follows:
+100 for using the selection attribute to track epochs metadata. We wrote this exactly for this use case. Example and doc are always very welcome.
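A minimal sketch of that pattern with MNE left out: keep one metadata record per row of the original events array and index it with the selection vector. Here selection is a stand-in array for epochs.selection (the original positions of the surviving epochs), and the metadata fields and values are hypothetical.

```python
import numpy as np

# Hypothetical metadata: one record per row of the ORIGINAL events array
metadata = [
    {"sentence_id": "s01.wav", "block": 0, "position": 0, "speaker": "A"},
    {"sentence_id": "s02.wav", "block": 0, "position": 1, "speaker": "B"},
    {"sentence_id": "s03.wav", "block": 0, "position": 2, "speaker": "A"},
    {"sentence_id": "s04.wav", "block": 1, "position": 0, "speaker": "B"},
]

# Stand-in for epochs.selection after epoch 1 was dropped: original
# positions of the surviving epochs, in order
selection = np.array([0, 2, 3])


def metadata_for(selection, metadata):
    """Metadata records aligned with the surviving epochs."""
    return [metadata[i] for i in selection]


kept = metadata_for(selection, metadata)
print([m["sentence_id"] for m in kept])  # ['s01.wav', 's03.wav', 's04.wav']
```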
I could write a short blog-style example of the way I end up doing this. I heard rumblings about a website redesign that will make documentation easier to expand and upload. That still happening?
(Email reply to Alexandre Gramfort's comment of Sun, Jun 7, 2015 at 1:17 PM.)
Yeah, it is easier for us to update now. Changes made to the mne/doc directory are easier to propagate to the dev build website.
+1 for blogpost too
The doc is a topic of the sprint this summer
(Email reply to Eric Larson's comment of 8 June 2015, 06:53.)
Cool - I can def help w/ docs too.
(Email reply to Alexandre Gramfort's comment of Sun, Jun 7, 2015 at 11:18 PM.)
So I've been playing around a little with using the selection parameter to maintain trial identity through derivations of epochs, and I'm running into a few snags. In my situation, I often re-create epochs objects after doing different kinds of processing. E.g., I'll create an epochs object of raw data, then filter it for HG, then use EpochsArray to re-create the same object. Alternatively, with the full EpochsTFR class that I use, I subset frequency bands and create epochs.
What I've found is that while the selection parameter does a decent job of keeping track of trials, this becomes more difficult once you create a new epochs object.
E.g.: if I take an epochs object, then take a subset of trials and then create a new epochs object from this:
epochtmp = epoch[['mid', 'bef']]
epocha = mne.EpochsArray(epochtmp._data, epochtmp.info, epochtmp.events, tmin=epochtmp.tmin, event_id=epochtmp.event_id)
Now the selection parameter is reset to arange(n_epochs). I can hard-reset it with epocha.selection = epochtmp.selection ... however, subsetting the data now throws an error because drop_log is too short for the indices in selection:
epocha.selection = epochtmp.selection
epocha['mid']
/home/knight/holdgraf/src/mne/mne/epochs.pyc in __getitem__(self, key)
1319 key_selection = epochs.selection[select]
1320 for k in np.setdiff1d(epochs.selection, key_selection):
-> 1321 epochs.drop_log[k] = ['IGNORED']
1322 epochs.selection = key_selection
1323 epochs.events = np.atleast_2d(epochs.events[select])
IndexError: list assignment index out of range
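The traceback suggests why this fails: drop_log is indexed by original event position, so once EpochsArray builds a fresh, shorter drop_log, the copied-over selection points past its end. The function below only mirrors the loop visible in the traceback (it is not MNE's code) to reproduce the IndexError and show that, in this sketch, a drop_log covering all original positions avoids it.

```python
import numpy as np


def subset(selection, drop_log, keep):
    """Mirror of the subsetting loop in the traceback (a sketch, not MNE code).

    selection: original event positions of the current epochs.
    drop_log: one entry per ORIGINAL event, indexed by values in selection.
    keep: boolean mask over the current epochs.
    """
    key_selection = selection[keep]
    drop_log = list(drop_log)  # work on a copy
    for k in np.setdiff1d(selection, key_selection):
        drop_log[k] = ["IGNORED"]
    return key_selection, drop_log


selection = np.array([1, 3, 4, 5])        # copied over from the original epochs
keep = np.array([True, False, True, False])

# Rebuilt object: drop_log has one entry per *current* epoch only
try:
    subset(selection, [[] for _ in range(4)], keep)
    short_log_failed = False
except IndexError as err:
    short_log_failed = True
    print("IndexError:", err)  # list assignment index out of range

# A drop_log covering all 6 original events: no error
new_selection, new_log = subset(selection, [[] for _ in range(6)], keep)
print(new_selection)  # [1 4]
```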
So, to summarize, two primary confusions:
- Creating a new epochs object from an old one that has a subset of trials removes information about the trial IDs in the original events object. (This could be fixed by making it clear which attributes need to be overwritten to retain this information.)
- When you have subsets of trials, it's still a bit confusing how they correspond to the original events object. (This could be fixed by including an "events_orig" attribute or something?)
I don't think either of these is a dealbreaker as long as the user remembers to copy over attributes from the original epochs object; I'm just giving my experience.
One other thought that I am hesitant to give since you guys said you don't want more API complexity (understandably :)):
It might be useful to have an "EpochsInfo" class in the same way that MNE has an "Info" class for signals. This would be a single object storing all the epochs information that is currently kept as attributes of Epochs, so there would be one place for this information and it would perhaps be easier to keep track of. The user-side attributes could stay the same, with attribute access behaving like ch_names, i.e. it just pulls the value from the EpochsInfo class. Just a thought!
Sounds like it would be sufficient to make .selection work after subsetting. Your report indicates that there are some broken parts.
(Email reply to Chris Holdgraf's comment of 2015-06-18 19:57 GMT+02:00.)
Just to clarify: .selection works fine as long as it's the same epochs object; it just messes up when you use EpochsArray. I'm thinking it might be easier for me to avoid EpochsArray as much as possible and instead use Epochs.copy() and overwrite the _data attribute.
Sounds like this is also a relevant dimension of the Epochs refactoring -- see #2211
(Email reply to Chris Holdgraf's comment of 2015-06-18 20:08 GMT+02:00.)
Yes @choldgraf can you try to produce the error using some minimal script based on the testing dataset? That way I can work it into #2211 and make sure it's resolved.
Maybe a stupid question, but why do you have to use an EpochsArray if you have a proper Epochs object in the first place? If you need a copy of an Epochs instance, just use the .copy method.
Now that the "extra info" issue has been closed I'm thinking about ways that I could incorporate my metadata with the metadata in MNE. This brought up an idea that might be worth implementing if it'd be useful.
The tricky thing about my data is that my events are not defined by triggers because I am playing sentences of variable length in a continuous fashion. To that extent, I can't just store the starting time of each trial...I also need to know the length.
In order to keep this flexible when I change sampling rates of the signal, I've been storing time information as a pandas dataframe that has one row for each trial. For any row, there are always two columns: 'start', 'stop'. These are stored in seconds to the highest possible precision.
Then, I append columns to each row to add information about that trial. For example, one row might look like this:
The values in start and stop always correspond to 0 being the start of the data files. That way, as long as I have the sampling rate, I can use a function like this:
stim_time_series = time_series_from_time_info(time_info, sample_rate, total_length)
and it will generate a time series where each row is a timepoint, then fill the row/column values with the values from the other columns of time_info. Total length is the total length of the brain data that we've got. The result is an array of stimulus information that is kind of like the stim channels MNE uses to read in event information.
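A sketch of what such a helper might look like. The call signature follows the one above, but the body, the optional columns argument, and the use of a list of dicts are assumptions (a pandas DataFrame could be converted with df.to_dict("records")). Start/stop seconds are converted to sample indices at whatever rate is passed in, so the same time_info works after resampling.

```python
import numpy as np


def time_series_from_time_info(time_info, sample_rate, total_length, columns=None):
    """Expand per-trial metadata into a (total_length, n_columns) time series.

    time_info: list of dicts, each with 'start' and 'stop' in seconds
        (0 = start of the data file) plus numeric metadata values.
    sample_rate: sampling rate (Hz) of the brain data.
    total_length: number of samples in the brain data.
    columns: metadata keys to expand; defaults to every key of the first
        row except 'start' and 'stop'.
    Samples outside every trial stay 0.
    """
    if columns is None:
        columns = [k for k in time_info[0] if k not in ("start", "stop")]
    out = np.zeros((total_length, len(columns)))
    for row in time_info:
        start = int(round(row["start"] * sample_rate))
        stop = min(int(round(row["stop"] * sample_rate)), total_length)
        for j, name in enumerate(columns):
            out[start:stop, j] = row[name]
    return out


time_info = [
    {"start": 0.5, "stop": 2.0, "block": 1, "position": 0},
    {"start": 3.0, "stop": 6.25, "block": 1, "position": 1},
]
stim = time_series_from_time_info(time_info, sample_rate=100.0, total_length=700)
print(stim.shape)  # (700, 2)
```

In this sketch a value of 0 inside a trial (position 0 here) looks the same as "no trial"; an extra indicator column would remove that ambiguity.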
Would it be useful to contribute these kinds of functions to MNE? I could work on this after finishing up the layout stuff. I could imagine it being useful for anyone that doesn't store their event information as vectors along with the data.