ivanfratric / libsubspace

C++ library for pattern recognition in subspaces

Negative eigenvalues problem #1

Closed by GoogleCodeExporter 9 years ago

GoogleCodeExporter commented 9 years ago
In subspace.cpp, around line 310:
double s = 1.00 / sqrt(eigenvalues[i][0]);

If the eigenvalue is zero or negative, the result will not be correct. 
I am doing PCA for sample type img, so I have no idea whether I should expect a 
negative eigenvalue.

Original issue reported on code.google.com by weijianh...@gmail.com on 24 Jun 2011 at 8:52

GoogleCodeExporter commented 9 years ago
Hi and thanks for the first report!
As the covariance matrix is positive semi-definite, the eigenvalues can't be 
negative, so no worries there.
See: http://en.wikipedia.org/wiki/Covariance_matrix

However, they can be zero, so I'll add appropriate checks in order to avoid 
possible division by zero.

In the meantime, note that the eigenvalues are reordered in descending order, 
so if you are not using the full subspace dimensionality (which is usually the 
point of doing PCA), chances are that you are omitting all the zero eigenvalues.

Original comment by ivan.fra...@gmail.com on 24 Jun 2011 at 10:03

GoogleCodeExporter commented 9 years ago
I've changed subspace.cpp.
The following lines have been deleted:

for(i=0; i<N; i++) {
   double s = 1.00 / sqrt(eigenvalues[i][0]);
   for(j=0;j<n;j++) {
      eigenvectors[i][j] *= s;
   }
}

The purpose of these lines was subspace axis normalization. Instead, 
normalization is handled later by this line:

subspace->Normalize();

I've also made appropriate changes to the Subspace::Normalize() method to avoid 
possible problems with zero-norm vectors.
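
For illustration only, the zero-norm guard could look something like the sketch 
below. This is a minimal example of the idea, not the actual 
Subspace::Normalize() code; the function and parameter names here are made up:

#include <math.h>

// Normalize each axis to unit length, skipping axes whose norm is
// (near) zero so we never divide by zero.
void NormalizeAxes(double **axes, long numAxes, long dim) {
   for(long i = 0; i < numAxes; i++) {
      double norm = 0;
      for(long j = 0; j < dim; j++) norm += axes[i][j] * axes[i][j];
      norm = sqrt(norm);
      if(norm < 1e-12) continue;   // zero-norm axis: leave it as-is
      for(long j = 0; j < dim; j++) axes[i][j] /= norm;
   }
}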

Original comment by ivan.fra...@gmail.com on 24 Jun 2011 at 11:00

GoogleCodeExporter commented 9 years ago
Thanks for the instant response!

For some reason, I am getting a very small negative value for the first 
eigenvalue, -0.01e-09, which is causing me an error. I think it should be zero 
with some floating point error. Would it be necessary to use something like 
if(fabs(norm) < 0.0001), or does it not matter at all?

As for the dimensionality you mentioned, it seems to me that there is no way 
in the code to set the dimensionality when creating the subspace. The 
dimensionality only comes in when you are testing the subspace. Am I wrong in 
saying that? I am not very familiar with PCA and I apologize if I show a lack 
of understanding.

But overall, I would like to say that the library works great for me actually! 
Thanks a lot! 

Original comment by weijianh...@gmail.com on 24 Jun 2011 at 2:21

GoogleCodeExporter commented 9 years ago
Is -0.01e-09 your largest eigenvalue, or are there other eigenvalues with 
larger (positive) values? Getting -0.01e-09 as the largest eigenvalue would 
indicate an error somewhere.
Right now there is no way to limit the subspace dimensionality when learning 
it; maybe I'll add this functionality later.
The idea is to learn the subspace that contains all the axes, and later, in 
testing, choose the dimensionality that gives the best performance.
You could also determine the desired dimensionality of the subspace by keeping 
all eigenvalues larger than some threshold, for example:

long GetDimensionality(Subspace *subspace, double minEV) {
   long dimensionality;
   double *eigenvalues = subspace->GetAxesCriterionFn();
   // count the leading axes whose eigenvalue is above the threshold;
   // the eigenvalues are assumed sorted in descending order
   for(dimensionality = 0; dimensionality < subspace->GetSubspaceDim(); dimensionality++) {
      if(eigenvalues[dimensionality] < minEV) break;
   }
   return dimensionality;
}
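
A hypothetical call, assuming subspace points to a learned subspace (the 
threshold value here is arbitrary):

long dim = GetDimensionality(subspace, 1e-6);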

Original comment by ivan.fra...@gmail.com on 24 Jun 2011 at 3:44

GoogleCodeExporter commented 9 years ago
Thanks for your reply. 

I checked the eigenvalues after this line:
if(!eigen(XTX.GetData(), tmpv.GetData(), eigenvalues.GetData(), XTX.GetNumRows(), verbose)) return 0;

eigenvalues[0][0] = -1.0391931828945822e-009
eigenvalues[1][0] = 382965.15429347486
eigenvalues[2][0] = 443996.38986379583
...
eigenvalues[39][0] = 4325389.9459962919

Seems to me that they are in ascending order at this stage. 
Are they eventually reordered at this line:
subspace->ReorderAbsDescending(); ?

There is also a huge jump from the first eigenvalue to the second, which I do 
not understand. Since I presume the first eigenvalue is zero anyway, does that 
mean the axis is not important at all?

As for the dimensionality, I have 40 sample images, each with 92 * 112 pixels. 
It seems that there are only 40 eigenvalues here, since 
eigenvalues[40][0] = -2.5301711256524607e-098, which is actually undefined. 
Based on my understanding, the dimensionality should be 92 * 112 = 10304, so I 
should have 10304 eigenvalues instead?

I am sorry for my unclear understanding of PCA and thank you for answering my 
questions. 

I have attached my images, the learn files and the commands I use, in case you 
need to reproduce. Thanks!

Original comment by weijianh...@gmail.com on 25 Jun 2011 at 3:14

GoogleCodeExporter commented 9 years ago
The values you are getting seem OK, actually.
If you are learning a PCA subspace with N samples, you can only get N-1 
meaningful (non-zero) eigenvalues, as that will be the rank of the covariance 
matrix. In your case, you are using 40 learning samples and getting 39 
eigenvalues. The remaining one should be 0, but it probably ends up as a small 
negative number due to limited precision. So you could only get dimensionality 
10304 if you had 10305 learning samples.
Your case (having far fewer learning samples than the sample dimensionality) 
is common in image processing, and especially in biometrics (which seems to be 
your application).

You may also note that libsubspace behaves differently depending on whether 
the dimensionality or the number of samples is larger. Your case is actually 
much faster, as it requires eigenanalysis of a smaller (40x40) matrix instead 
of eigenanalysis of the full covariance matrix (which would be 10304x10304). 
The end result in both cases would be the same. If you are interested in the 
details, see for example
http://en.wikipedia.org/wiki/Eigenface, section "Computing the eigenvectors".
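
In brief, the identity behind this trick works as follows (a sketch of the 
standard argument, not the exact code path in libsubspace). If X is the d x N 
matrix of centered samples (here d = 10304, N = 40), the covariance 
eigenproblem involves the d x d matrix X*X^T. But if v is an eigenvector of 
the small N x N matrix X^T*X (the XTX matrix in the line you quoted), i.e.

(X^T*X)v = lambda*v,

then multiplying both sides by X gives

(X*X^T)(Xv) = lambda*(Xv),

so u = Xv is an eigenvector of the full covariance matrix with the same 
eigenvalue lambda. Only the 40x40 eigenproblem needs solving; the full-size 
eigenvectors are then recovered as u = Xv and normalized.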

Before the subspace is returned, the eigenvectors (and eigenvalues) are 
reordered in descending order of the corresponding eigenvalues, so the 
eigenvector with the "problematic" eigenvalue will end up last (40th).

In your tests, it only makes sense to use dimensionality 39 or lower (as you 
have only 40 learning samples), so the 40th eigenvalue will never be used.

I should modify the library to trim the dimensionality of the subspace to N-1 
to avoid this confusion in the future. Similarly, in LDA there would be only 
C-1 meaningful eigenvalues (where C is the number of classes).

Original comment by ivan.fra...@gmail.com on 25 Jun 2011 at 10:39

GoogleCodeExporter commented 9 years ago
I modified the Subspace class, adding a Trim(long dim) method that trims the 
subspace to the desired dimensionality.

Also, the PCA subspace is now automatically trimmed to N-1 dimensions and the 
LDA subspace to C-1 dimensions.
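
For anyone reading later, a hypothetical usage sketch (only the Trim(long dim) 
signature is confirmed above; the surrounding context is assumed):

// after learning a PCA subspace from 40 samples,
// keep only the 39 meaningful axes
subspace->Trim(39);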

Original comment by ivan.fra...@gmail.com on 25 Jun 2011 at 11:20

GoogleCodeExporter commented 9 years ago
Thanks very much for the explanation!

Original comment by weijianh...@gmail.com on 27 Jun 2011 at 3:03