pyRiemann / pyRiemann

Machine learning for multivariate data through the Riemannian geometry of positive definite matrices in Python
https://pyriemann.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Getting NaN/Inf errors for the TangentSpace(metric='riemann') transform #14

Closed. YunshuLiu closed this issue 9 years ago.

YunshuLiu commented 9 years ago

Sorry to bother you; I have a question I don't know the answer to. First, I would like to say it is great to see you apply Information Geometry to real-world problems and get amazing results. I am also using Information Geometry in my research, but in the field of entropy.

I am trying a simple example with pyRiemann. First I pass the data through Covariances:

from pyriemann.estimation import Covariances
cov_data_ = Covariances().fit_transform(epochs_data_)

then compute the tangent space:

from pyriemann.tangentspace import TangentSpace
ts = TangentSpace(metric='riemann')
data_ = ts.fit_transform(cov_data_)

However, it gives me a ValueError saying the array must not contain infs or NaNs. I checked cov_data_ and epochs_data_, and neither contains Inf or NaN. What conditions must my data satisfy for this to work? Does this mean one or more of the covariance matrices have a zero determinant? If so, what is wrong with the original data? In other words, under what condition do the covariance matrices of my data have a zero determinant? For example, if my data is

array([[[1, 4],
        [2, 5],
        [3, 6]],

       [[3, 4],
        [1, 2],
        [3, 6]]])

it gives me the same inf/NaN error. Thanks in advance.

alexandrebarachant commented 9 years ago

Hi, this happens when your covariance matrices are not positive definite. This is a very common problem; it occurs when two channels are collinear, or when you have fewer time samples than channels.
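The two failure modes named above can be reproduced with a minimal NumPy sketch. It assumes a plain sample covariance X Xᵀ / n_times (pyRiemann's estimators may center or normalize differently, but the rank argument is the same), and the helper names are hypothetical. A 3-channel epoch with only 2 time samples, or a channel that is an exact multiple of another, gives a covariance matrix whose rank is below the number of channels, i.e. one with a zero eigenvalue and a zero determinant.

```python
import numpy as np


def sample_cov(X):
    """Covariance of one epoch X with shape (n_channels, n_times).

    Plain X @ X.T / n_times without mean removal: an illustrative
    assumption, not necessarily what pyRiemann's estimators compute.
    """
    return X @ X.T / X.shape[1]


def cov_rank(C, rel_tol=1e-10):
    """Count eigenvalues clearly above zero (relative to the largest one)."""
    eigvals = np.linalg.eigvalsh(C)
    return int(np.sum(eigvals > rel_tol * eigvals.max()))


# Case 1: fewer time samples (2) than channels (3), like the toy epoch above.
few_samples = np.array([[1, 4],
                        [2, 5],
                        [3, 6]], dtype=float)

# Case 2: collinear channels (channel 2 is exactly twice channel 0).
rng = np.random.default_rng(0)
collinear = rng.standard_normal((3, 1000))
collinear[2] = 2.0 * collinear[0]

# Reference case: many samples and no exact linear dependence.
healthy = rng.standard_normal((3, 1000))

for name, X in [("few_samples", few_samples),
                ("collinear", collinear),
                ("healthy", healthy)]:
    C = sample_cov(X)
    print(f"{name}: rank {cov_rank(C)} of {X.shape[0]}, "
          f"smallest eigenvalue {np.linalg.eigvalsh(C).min():.1e}")
```

Only the reference case is full rank; the other two have an eigenvalue that is zero up to rounding, so they are not SPD.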

Riemannian geometry requires your covariance matrices to be positive definite. If they are not, you should use regularization. You can add regularization in the covariance estimator, like this:

from pyriemann.estimation import Covariances
cov_data_ = Covariances(estimator='lwf').fit_transform(epochs_data_)

It should work with any data, but be aware that regularization can degrade performance because it modifies the covariance structure.
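To see why a singular matrix breaks a log-based tangent-space mapping and how shrinkage repairs it, here is a hedged NumPy sketch. It assumes the mapping needs a matrix logarithm, computed here by eigendecomposition, so a zero eigenvalue turns into log(0) = -inf. The repair shrinks the matrix toward a scaled identity with a fixed coefficient `alpha`; Ledoit-Wolf uses the same shrinkage form but estimates the coefficient from the data (scikit-learn exposes it as `sklearn.covariance.ledoit_wolf`). The helpers `shrink` and `logm_spd` are hypothetical.

```python
import numpy as np


def shrink(C, alpha):
    """Return (1 - alpha) * C + alpha * mu * I with mu = trace(C) / n.

    Every eigenvalue lam of C becomes (1 - alpha) * lam + alpha * mu, so for
    0 < alpha <= 1 and mu > 0 the result is SPD even if C is singular.
    Ledoit-Wolf uses this form but estimates alpha from the data; here
    alpha is a fixed, hypothetical choice.
    """
    n = C.shape[0]
    mu = np.trace(C) / n
    return (1.0 - alpha) * C + alpha * mu * np.eye(n)


def logm_spd(C):
    """Matrix logarithm of an SPD matrix via eigendecomposition.

    log() is applied to each eigenvalue, so an eigenvalue of 0 would give
    -inf (and a slightly negative rounding error would give NaN).
    """
    eigvals, eigvecs = np.linalg.eigh(C)
    return (eigvecs * np.log(eigvals)) @ eigvecs.T


# Rank-deficient covariance: 3 channels but only 2 time samples.
X = np.array([[1, 4], [2, 5], [3, 6]], dtype=float)
C = X @ X.T / X.shape[1]

alpha = 0.1
C_reg = shrink(C, alpha)

print("smallest eigenvalue before:", np.linalg.eigvalsh(C).min())
print("smallest eigenvalue after: ", np.linalg.eigvalsh(C_reg).min())
print("log finite after shrinkage:", np.isfinite(logm_spd(C_reg)).all())
```

Shrinkage keeps the trace unchanged but pulls every eigenvalue toward their mean, which is the kind of change to the covariance structure the comment above warns about.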

YunshuLiu commented 9 years ago

Hi Alex, thanks for the reply :) I added estimator='lwf' for the original data and it works! However, the small 2 × 3 × 5 random example below still does not work:

epoch_data_ = np.array([[[ 1,  4,  7, 10,  4],
        [ 2,  5,  8,  8,  8],
        [ 3,  6,  9,  2,  2]],

       [[ 1,  2,  3,  6,  2],
        [ 4,  5,  8,  8,  1],
        [ 3,  5,  5,  2,  6]]])

cov_data_ = Covariances(estimator='lwf').fit_transform(epochs_data_)

still gives me a bad covariance matrix. I can't figure out what's wrong with this fake data. Is it possible to regularize it so that the covariance matrices are positive definite?

Best, Yunshu

alexandrebarachant commented 9 years ago

Can you share some data or code? The Ledoit-Wolf estimator is supposed to always give SPD matrices.

YunshuLiu commented 9 years ago

When I ran the same code on the small example this morning, it worked. Yesterday I must have used epochs_data_ instead of epoch_data_; my mistake.

The original data is from the grasp-n-lift competition on Kaggle. I posted a question on the forum about your "Beat the Benchmark. 0.708" script, but I guess you haven't visited the forum today. My question is: why is the EEG epoched using a window of 2 seconds before and after the event? I thought we were not allowed to use any future data, or have I misunderstood what "future data" means?

alexandrebarachant commented 9 years ago

Yep, the grasp-n-lift dataset needs regularization. I will answer your question on the forum :)

dmalt commented 7 years ago

Hi! I'm getting the same problem with CospCovariances. Is there a way to regularize them? I'm trying to run classification on resting-state MEG data, so some of the channels are definitely highly correlated.

alexandrebarachant commented 7 years ago

Hi. CospCovariances has a higher risk of producing non-SPD matrices (which is why you get NaN), because the covariances are estimated in the frequency domain using a sliding window. Try to increase the overlap as much as possible. If that does not work, you can manually force the matrices to be SPD with shrinkage.
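Forcing already-estimated matrices to be SPD with shrinkage can be sketched in NumPy, vectorized over a stack with shape (..., n_channels, n_channels). The function names, the toy data, and the fixed coefficient are assumptions for illustration. Recent pyRiemann versions also provide a Shrinkage transformer in pyriemann.estimation, which may be preferable; check the documentation of your installed version. If your cospectra store the frequency axis last, move it in front of the two channel axes first (for example with np.moveaxis(cosp, -1, 1)).

```python
import numpy as np


def shrink_stack(mats, alpha=0.05):
    """Shrink each matrix in a stack of shape (..., n, n) toward a scaled identity.

    For every matrix C: (1 - alpha) * C + alpha * (trace(C) / n) * I.
    Hypothetical helper for illustration, not a pyRiemann function.
    """
    n = mats.shape[-1]
    # Symmetrize to remove small asymmetries left by numerical estimation.
    mats = 0.5 * (mats + np.swapaxes(mats, -1, -2))
    mu = np.trace(mats, axis1=-2, axis2=-1) / n  # one scale per matrix
    return (1.0 - alpha) * mats + alpha * mu[..., None, None] * np.eye(n)


def min_rel_eig(mats):
    """Smallest eigenvalue divided by the largest, for each matrix."""
    eigvals = np.linalg.eigvalsh(mats)  # ascending along the last axis
    return eigvals[..., 0] / eigvals[..., -1]


# Hypothetical stack: 4 matrices over 5 channels, each estimated from only
# 3 samples, so every matrix is rank-deficient (rank 3 out of 5).
rng = np.random.default_rng(42)
X = rng.standard_normal((4, 5, 3))
mats = X @ np.swapaxes(X, -1, -2) / X.shape[-1]

mats_reg = shrink_stack(mats, alpha=0.05)
print("min/max eigenvalue ratio before:", min_rel_eig(mats))
print("min/max eigenvalue ratio after: ", min_rel_eig(mats_reg))
```

A larger alpha gives better-conditioned matrices but distorts the covariance structure more, so it is worth treating alpha as a hyperparameter rather than a fixed constant.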

dmalt commented 7 years ago

Ok, I'll try that. Thanks a lot!