ENH: ICA on specific time segments within raw data

cbrnr commented 7 years ago

I'd like to run ICA only on specific time segments within my (BrainVision) raw data. I have these segments given in two np.arrays, one with the start samples and one with the stop samples, e.g.

start = np.array([10, 100, 500])
stop = np.array([35, 250, 570])

What is the preferred way to select a slice of the raw data containing all channels but only time samples between start and stop? I've tried to create an index array with np.r_ using

mask = np.concatenate([np.r_[a:b] for a, b in zip(start, stop)])

In this example, the mask selects 245 elements. In a regular np.array, slicing with this index array works as expected, but I get an error when I try to slice my raw object directly:

TypeError: only length-1 arrays can be converted to Python scalars

Maybe I'm doing this the completely wrong way and there's an obvious best practice that I don't see at the moment. Since my goal is to run ICA on just a subset of the original raw data, I thought that I'd call the fit method with the subsetted data - is this possible somehow or is there another way to tackle this use case with MNE?

mmagnuski commented 7 years ago

You could use raw.crop and mne.concatenate_raws to get a raw that consists only of segments you want. If start and stop are in seconds that would be something like:

selected_raw = mne.concatenate_raws([raw.copy().crop(tmin=t1, tmax=t2)
                                    for t1, t2 in zip(start, stop)])

but bear in mind that you need to have enough timepoints to fit ICA well, citing eeglab folks:

As a general rule, finding N stable components (from N-channel data) typically requires more than kN^2 data sample points (at each channel), where N^2 is the number of weights in the unmixing matrix that ICA is trying to learn and k is a multiplier. In our experience, the value of k increases as the number of channels increases. In our example using 32 channels, we have 30800 data points, giving 30800/32^2 = 30 pts/weight points. However, to find 256 components, it appears that even 30 points per weight is not enough data.

Another way to do that would be to first segment the data into short windows and reject those that you do not need (you can take a look at mne.epochs._segment_raw - it works well for me but it is not public, so it may change in the future without any warning).

cbrnr commented 7 years ago

Thanks, this almost works - my start and stop times are in samples, but crop wants seconds. I could convert my samples to seconds, but I'd prefer a solution that works directly on samples because of rounding errors.

I guess working directly on raw._data (i.e. cropping the internal array directly and putting that back into the raw object) is not a viable option?

mmagnuski commented 7 years ago

I think you could try cropping _data directly - you would probably have to crop .times to relevant length too.

larsoner commented 7 years ago

I guess working directly on raw._data (i.e. cropping the internal array directly and putting that back into the raw object) is not a viable option?

You could do this and the create raw_new = mne.io.RawArray(data_new, raw.info) and it should work okay.

@jaeilepp would Annotations allow skipping of time segments? Like maybe any not-to-be-used segments could be marked as such, and then ICA could be made Annotation-aware?

cbrnr commented 7 years ago

Thanks @mmagnuski and @Eric89GXL, I will try that.

Annotation-aware ICA would be super awesome (as long as everything remains transparent, i.e. annotated segments should not be removed by default when calling ICA).

larsoner commented 7 years ago

i.e. annotated segments should not be removed by default when calling ICA).

My thinking was that any annotation with the text "bad" in it should be removed/ignored by default in Annotation-aware functions, including (eventually) ICA. No?

larsoner commented 7 years ago

... in the meantime I'll close this in favor of #3731, since there is a suitable workaround (RawArray) and #3731 covers the cleaner solution

jaeilepp commented 7 years ago

maybe any not-to-be-used segments could be marked as such, and then ICA could be made Annotation-aware

Yes, I can take a look.

cbrnr commented 7 years ago

I'm not familiar with MNE's annotations, but I guess even if there are annotations with the word "bad" there will be an argument to turn automatic removal off, right? For epochs there's reject_by_annotation=False, and something similar should be available for continuous data. What I meant was that there should be an option to ignore bad annotations in case the user wants to use all data samples.

mne-tools / mne-python

ENH: ICA on specific time segments within raw data #3952