mne-tools / mne-python

MNE: Magnetoencephalography (MEG) and Electroencephalography (EEG) in Python
https://mne.tools
BSD 3-Clause "New" or "Revised" License
2.66k stars 1.31k forks source link

Compute t/f representation with variable-length epochs? #5612

Open cbrnr opened 5 years ago

cbrnr commented 5 years ago

In many studies we conduct, people solve simple arithmetic problems. I am interested in EEG activity during the problem solving phase, which differs in length (1) from epoch to epoch and (2) from person to person.

These variable-length epochs are problematic, because I cannot simply compute a t/f representation with standard MNE functionality. Since I am mainly looking at oscillatory activity in specific frequency bands, my current workaround is to average band power over time within each epoch so that only one value per epoch remains. Of course, this means that I completely lose any time information, which is why this is a really crude approach.

EEGLAB has an option to "timewarp" individual epochs (https://github.com/eeglabdevelopers/eeglab/blob/develop/functions/timefreqfunc/newtimef.m#L144). I haven't really looked at the code so I'm not exactly sure how this works, but it seems to adequately address my problem. Does MNE have something similar? What do others do with such kind of data? I haven't seen any studies using the same variable-length epochs structure out there, and it seems like people just use some (arbitrary) fixed length cutoff. However, this is not ideal because I would like to make sure that only task-relevant brain activity gets into the calculation, so I don't want to cut off epochs at a fixed latency. Instead, I want my epochs to only contain EEG activity related to the problem solving process (i.e. from problem presentation up to a response).

Any ideas or sugggestions?

BTW, this is not about creating an object to hold variable-length epochs (this has been discussed in #3533).

larsoner commented 5 years ago

These variable-length epochs are problematic, because I cannot simply compute a t/f representation with standard MNE functionality.

You can use the _array functions and either pass them one at a time or (what I would try first) pad them all with 0 or NaN to be the length of the longest epoch.

... it seems like people just use some (arbitrary) fixed length cutoff. However, this is not ideal because I would like to make sure that only task-relevant brain activity gets into the calculation, so I don't want to cut off epochs at a fixed latency. Instead, I want my epochs to only contain EEG activity related to the problem solving process (i.e. from problem presentation up to a response).

Even if you obtain TF representations of length 10 s when some trials only have 5 sec of usable information, you can use NumPy indexing and other simple operations to decide how to process the data in the TF plane. And for viz with things like imshow or pcolor you can mask the invalid region.

my current workaround is to average band power over time within each epoch so that only one value per epoch remains. Of course, this means that I completely lose any time information, which is why this is a really crude approach.

You could instead compute the envelope of the Hilbert transform of the band-passed data. This would give you time-varying power in a band. You could even do this step on Raw and then epoch it directly.

agramfort commented 5 years ago

I've seen people doing binning of epochs by length.

cbrnr commented 5 years ago

Hm. I think the problem is that the t/f map is computed by averaging over epochs. Therefore, filling with NaNs to make all epochs the same length would probably work, but this would be different from what I'd like to achieve. Ideally, each epoch should contain 100% of the process of interest. If I fill the shorter epochs with NaNs, averaging will not average activities of the processes correctly. I think the right way to do this would be to first scale/stretch all epochs to the same length (100%) and then do the averaging.

@agramfort can you point me to a publication where people binned epochs by length?

agramfort commented 5 years ago

what do you mean stretch / scale?

no sorry I don't have published work in mind, but do you understand what I mean?

cbrnr commented 5 years ago

what do you mean stretch / scale?

To compute a t/f map, I need to average over epochs of the same length. If I have variable-length epochs, I need to scale them so that they are the same length. In the audio domain, such algorithms are known as time stretching (see e.g. here). I tried applying such an algorithm to EEG epochs, but it doesn't work since this approach seems to be optimized for audio data. In EEGLAB, it is possible to stretch single epoch t/f representations before averaging, which they call time warping. I'm not sure this is the same as dynamic time warping, which is popular in bioinformatics, but less so in neuroscience.

Again, the idea is that I want to capture the calculation process, which in general varies in length from epoch to epoch. This paper uses the EEGLAB functionality I mentioned to align the heelstrikes to create constant-length epochs.

no sorry I don't have published work in mind, but do you understand what I mean?

Sort of. However, I'm not sure how binning epochs help in computing a t/f map.

agramfort commented 5 years ago

if you cannot tell when things happens as it varies I would use a CSP, average the (log)power in CSP "source space" and see if you have a statistical effect in this "feature space"

cbrnr commented 5 years ago

Hm, not sure I understand how that would work. I'll try playing around a bit with dynamic time warping (e.g. using fastdtw). I think DTW could be applied to each single-epoch spectrum to make them the same length, after which they could be averaged just like normal. I think that's what EEGLAB is doing. If I come up with something useful I'll report back, maybe this is something MNE users would also like.

jona-sassenhagen commented 5 years ago

FWIW I'd love support for TF decomposition of raws, and support for operating on continuous TF data.

But that's a big project that probably few will benefit from.

agramfort commented 5 years ago

you mean viz or computation? what is the use case?

jona-sassenhagen commented 5 years ago

A few examples:

agramfort commented 5 years ago

provide me a nice dataset one could use for demo / education and we'll see how to make this happen.

You convinced me of the necessity of metadata with the kiloword dataset ...

cbrnr commented 5 years ago

Actually, it was much simpler than I thought. Basically, I resampled all individual TFRs (for each variable-length epoch) to a common length. Then I average these new epochs to create the final TFR plot. Below is a comparison between EEGLAB and MNE for real EEG data. The upper row shows the normal TFRs computed on 5s epochs (after the cue) - 5s was the length of the longest epoch, most epochs are way shorter than that. The bottom row shows the "warped" result, where I only use data from the actual variable-length epochs.

out

Beside the scale of the data (which isn't the same obviously) and the ugly colormap (which I set because that's what EEGLAB uses) I think the results look pretty good.

cbrnr commented 5 years ago

Plus, the x axis labels are incorrect, because in the warped results (bottom row), the data from 0s to 5s are actually the warped time points from cue until the response, so it should really be labeled 0% to 100% or something.

agramfort commented 5 years ago

I prefer the mne-python results :)

cbrnr commented 5 years ago

😛 yes but this was not the point. I think I'll prepare a short example, but I need to find some open dataset that features variable length epochs. Any ideas?

agramfort commented 5 years ago

no idea on my side

mmagnuski commented 5 years ago

@cbrnr You could pick any dataset that has information on reaction times - 'epoching' from -x seconds wrt the stimulus onset up to x seconds wrt the reaction time will give you variable-length epochs.

jona-sassenhagen commented 5 years ago

E.g. the EEGLAB data.

JohnAtl commented 5 years ago

What was the final outcome of this? I have subjects performing a reach and grasp task, EEG and kinematics collected, and would like to align the EEG per reach and grasp phase for TF analysis.

cbrnr commented 5 years ago

I think we decided not to integrate this into MNE because it's probably not interesting for most people. However, I have been working on writing up a blog post on how to create these time-warped t/f maps, but I got stuck because I didn't really have good example data. The EEGLAB data does have variable length epochs, but they are very short, so the results do not look very interesting. I'd prefer if I found a better dataset (open access, freely available), but if not I'll probably just make do with what I have. I'll let you know in this issue once I've published the post (might still take a little while to polish everything).

mmagnuski commented 5 years ago

@cbrnr - are you sure about eeglab allowing variable length epochs? If so - this must have changed lately as eeglab always had a requirement of same-length epochs. I was using fieldtrip a bit lately (I'm staying at Donders for some time) and I must say I like how it deals with variable length epochs and how that allows partial rejection (removing only parts of your epochs when artifacts do not span the whole epoch time range - which is really useful for long epochs). I even started thinking back about allowing nans in mne epochs - that would be a big change but would allow both variable-length epochs (although not very memory efficient) and partial rejection.

cbrnr commented 5 years ago

The EEGLAB example data contains markers that would lead to variable length epochs. What they do in their tutorials is to make them fixed length. So yes, EEGLAB (and MNE) cannot deal with variable-length epochs, hence the need for a custom solution.

AaronWill-Git commented 2 years ago

Thank you for your donation in MNE! I am recently using MNE to analysing EEG data and attache it with gait events. We often normalize varied step time lenth as 0-100%, it would be great if the vareid time lenth EEG can also be normalized into 0-100%. Thanks, sir! May I know the link for your “blog on how to create these time-warped t/f maps”?

I tried to change epochs into numpy array, and then resample this into same length.

cbrnr commented 1 year ago

Yes, EEG data locked to gait cycles is a great example where this would be useful. I am currently restructuring my blog at https://cbrnr.github.io/ (I'm switching to Quarto), and after that I will be adding an article on how to compute t/f-maps for variable-length epochs.

It is basically resampling, but you have to resample in the frequency domain and not in the time domain. Resample the time data will change the frequency content, which you want to avoid.

JohnAtl commented 1 year ago

Rather than warping or other techniques, I wound up doing short-time Fourier transform with a window of 500ms around the times of interest for each movement. My code (Matlab) is on the Uni's archive too. Links in the article. https://link.springer.com/article/10.1007/s00221-022-06340-8

AaronWill-Git commented 1 year ago

Yes, EEG data locked to gait cycles is a great example where this would be useful. I am currently restructuring my blog at https://cbrnr.github.io/ (I'm switching to Quarto), and after that I will be adding an article on how to compute t/f-maps for variable-length epochs.

It is basically resampling, but you have to resample in the frequency domain and not in the time domain. Resample the time data will change the frequency content, which you want to avoid.

Thanks for reminding! Yes, resample should be in the frequency domain. I decided to use the line interpolate to warp the time into fix length epochs with varied time interval.

cbrnr commented 1 year ago

Then you are already pretty much doing what I'm doing 😄!

AaronWill-Git commented 1 year ago

Then you are already pretty much doing what I'm doing 😄!

Dear Clemens,

Hope you are doing well! Thanks for your time.

I just got stucked in the step to line interpolate the data. My plan to to divide each step into fixed epochs with a longer length than the normal step. Then I calculte the ERSP using the "mne.time_frequency.EpochsTFR". Next step I want to interplotate the results of EpochsTFR, but I got stucked in this step. I use tfr.average().data to obtain the nd. array of the data and then crop to the taget step length and line interpolate it to a longer fixed length. But I have no idea how to write the results back to the tfr.average().data.

May I ask if you have any comments or suggetions to my work? Maybe the reason is that I am a novice python user. It would be great if you could give me some advice?

Thank you for your help!

cbrnr commented 1 year ago

You can create a new TFREpochs object and fill it with the new data. If you have further questions, please ask them in our forum: https://mne.discourse.group/

AaronWill-Git commented 1 year ago

Thanks for your suggestion! Will do this in the forum.

changkun commented 1 year ago

@cbrnr Thanks for arising this interesting discussion. We are running an experiment with multiple participants on the same task, where they have to make a selection. However, participants perform differently. And if we cut the epochs from the presentation of an item and its selection, epochs will have variable lengths. What would be your suggestion for dealing with this situation when processing the data?

Is it sound to rescale epochs to a common length using methods, such as dynamic time wrapping, linear resampling, etc?

Could you maybe point me to some literature for analysis approaches that solved this problem?

cbrnr commented 1 year ago

I think it depends on what you want to do with your variable-length epochs. If you are using frequency-domain features such as ERD/ERS, you need to make sure that rescaling is performed in that domain (otherwise, if you resample in the time domain, you will change the frequency content).

There isn't really a whole lot of literature on that topic. I am still working on my article for my blog, which I hope to finish soon, and hopefully I will find some relevant prior work.

FrancescoChiossi commented 1 year ago

@cbrnr , @JohnAtl @changkun and to all the people interested in this topic, while we work on DTW for ERPs or rescaling for frequency analysis, I might have found an alternative for cognitive processing identification.

HsMM-MVpy relies on the assumption that periods with a constant EEG pattern corresponded to a processing stage. They use hidden semi-Markov models and multivariate pattern analysis for identifying different cognitive processing stages based on EEG data.

It would actually be interesting to study the differences on this approach as compared to the ones proposed by @cbrnr .

Here the reference to their main paper .

cbrnr commented 1 year ago

Thanks @FrancescoChiossi, this looks like a very fancy method! It will take me some time to understand what they are doing, but please let us know if you find anything in your analyses! And yes, I still need to summarize the time warping procedure, I hope I find time soonish (so many other things to do right now).

changkun commented 1 year ago

If you are using frequency-domain features such as ERD/ERS, you need to make sure that rescaling is performed in that domain (otherwise, if you resample in the time domain, you will change the frequency content).

@cbrnr Would you mind elaborating a bit more on this? We are currently trying to resample the epochs with variable lengths to the same size. The goal is to extract the ERPs per each epoch. Should we apply the resampling before or after cutting the continuous signal into epochs? Could the signal be distorted if we resample after epoching?

And second, if we have an original sampling rate of 500 Hz (device one), does linear resampling impact the sampling rate if we apply it in the frequency domain?

changkun commented 1 year ago

@cbrnr ping?

cbrnr commented 1 year ago

Sorry, I forgot about this thread.

I was always talking about resampling in the frequency domain so that frequency information does not change, but is only stretched/squashed in time. You don't need to do that if you are not going to work in the frequency domain.

I have zero experience with how resampling epochs to make them equal lengths would affect the ERP (and if that is even a valid thing to do). Having said that, I don't know how you would resample the continuous signal when your goal is to make your epochs equally long. Therefore, I'd resample the epochs, and to avoid edge artifacts it's probably a good idea to include some extra time segments before and after the true epochs.

I'm not sure I understand your question about the sampling frequency.

In any case, since most ERP components appear stimulus-locked, and also within the first few hundred milliseconds, I would not even bother trying to make your epochs equally long. What is the time window you are looking at? Is it not possible to use the first e.g. -200 to +800ms (thereby discarding everything that happens afterwards)?

lek-hong-godot commented 3 months ago

Actually, it was much simpler than I thought. Basically, I resampled all individual TFRs (for each variable-length epoch) to a common length. Then I average these new epochs to create the final TFR plot. Below is a comparison between EEGLAB and MNE for real EEG data. The upper row shows the normal TFRs computed on 5s epochs (after the cue) - 5s was the length of the longest epoch, most epochs are way shorter than that. The bottom row shows the "warped" result, where I only use data from the actual variable-length epochs.

out

Beside the scale of the data (which isn't the same obviously) and the ugly colormap (which I set because that's what EEGLAB uses) I think the results look pretty good.

@cbrnr would it be possible to share code snippets/samples in mne python? I'm very keen to explore warping epochs such that they are of the same duration.

cbrnr commented 3 months ago

Sure, I've made my repo public: https://github.com/cbrnr/timewarp

It's in a very early stage, but let me know if you have any questions.

Krithiks7 commented 3 months ago

I am facing a similar issue in my project as well. I also have to analyse features of epochs of varying lengths. Could you let me know if the time warping can be done in MATLAB and if yes how? Also could you provide an example of using this on EEG data in python itself?

cbrnr commented 3 months ago

@Krithiks7 you can find an EEG example in my timewarp repository. If you want to do that in MATLAB, there's a function for EEGLAB (linked in the first comment of this thread).