ramhiser / sparsediscrim

Sparse and Regularized Discriminant Analysis in R

Fix failing HDRDA test for Windows i386 #39

Open ramhiser opened 7 years ago

ramhiser commented 7 years ago

I removed a failing test in commit 701a882eb5cedb97b829b1ec527670cf3370c489 so that the R package could be pushed to CRAN. I'd like to restore the test and have it pass on Windows i386.

The failing test checked that HDRDA's calculations are correct for the special case of (lambda, gamma) = (0, 0) on the iris data set.

** running tests for arch 'i386' ... [12s] ERROR
  Running 'testthat.r' [10s]
Running the tests in 'tests/testthat.r' failed.
Last 13 lines of output:
  [3]  -0.00192 -  0.00192 == -0.00385
  [4]   0.00376 - -0.00376 ==  0.00753
  [5]  -0.07789 -  0.07789 == -0.15578
  [9]  -0.00192 -  0.00192 == -0.00385
  [13]  0.00376 - -0.00376 ==  0.00753

  testthat results ================================================================
  OK: 160 SKIPPED: 0 FAILED: 3
  1. Failure: HDRDA's calculations are correct for (lambda, gamma) = (0, 0) (@test-hdrda.r#60) 
  2. Failure: HDRDA's calculations are correct for (lambda, gamma) = (0, 0) (@test-hdrda.r#61) 
  3. Failure: HDRDA's calculations are correct for (lambda, gamma) = (0, 0) (@test-hdrda.r#62) 

  Error: testthat unit tests failed
  Execution halted
** running tests for arch 'x64' ... [11s] OK
  Running 'testthat.r' [11s]

In https://github.com/ramhiser/sparsediscrim/issues/38#issuecomment-322020385 @DarioS suggested the issue was:

It turns out that the signs of some of the off-diagonal numbers are the opposite of what is expected. It's likely that the eigenvectors used in the calculation of W_k have the opposite sign on a 32-bit computer.

DarioS commented 7 years ago

I narrowed it down to the eigen function and its use as Sigma_eigen <- eigen(Sigma, symmetric=TRUE) in the test's R code.

On 32-bit Windows 7:

> Sigma_eigen
eigen() decomposition
$values
[1] 0.43469460 0.08445964 0.05424531 0.02191645

$vectors
          [,1]        [,2]       [,3]       [,4]
[1,] 0.7377526  0.05608598  0.6323782  0.2295067
[2,] 0.3205660 -0.87323191 -0.1805701 -0.3195276
[3,] 0.5728512  0.45883202 -0.5818222 -0.3504249
[4,] 0.1574803 -0.15425166 -0.4785136  0.8499595

On 32-bit Debian 8:

> Sigma_eigen
$values
[1] 0.43469460 0.08445964 0.05424531 0.02191645

$vectors
           [,1]        [,2]       [,3]       [,4]
[1,] -0.7377526  0.05608598  0.6323782  0.2295067
[2,] -0.3205660 -0.87323191 -0.1805701 -0.3195276
[3,] -0.5728512  0.45883202 -0.5818222 -0.3504249
[4,] -0.1574803 -0.15425166 -0.4785136  0.8499595

The first eigenvector calculated on Windows is -1 times the one calculated on Linux; the eigenvalues and the other three eigenvectors are identical. Since multiplying an eigenvector by -1 yields another valid eigenvector, this is not a bug in eigen but a case for sparsediscrim to handle. Nor is it a matter of 32-bit vs. 64-bit: the difference comes from the different libraries R uses on Windows and Linux for eigenvector calculation.
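
One possible way to handle this, offered only as a rough sketch: fix a sign convention for the eigenvectors before they are compared or used. The helper normalize_signs below is hypothetical and not part of sparsediscrim, and the convention it applies (make the largest-magnitude entry of each column positive) is just one reasonable choice. The two matrices are the $vectors outputs pasted above.

```r
# Hypothetical helper (not in sparsediscrim): flip each eigenvector (column)
# so that its largest-magnitude entry is positive.
normalize_signs <- function(V) {
  # Row index of the largest-magnitude entry in each column
  idx <- apply(abs(V), 2, which.max)
  # Sign of that entry, one value per column
  s <- sign(V[cbind(idx, seq_len(ncol(V)))])
  s[s == 0] <- 1  # guard against an all-zero column
  # Multiply column j by s[j]
  sweep(V, 2, s, `*`)
}

# Eigenvectors as printed on 32-bit Windows 7
V_windows <- matrix(c(
  0.7377526,  0.05608598,  0.6323782,  0.2295067,
  0.3205660, -0.87323191, -0.1805701, -0.3195276,
  0.5728512,  0.45883202, -0.5818222, -0.3504249,
  0.1574803, -0.15425166, -0.4785136,  0.8499595
), nrow = 4, byrow = TRUE)

# Eigenvectors as printed on 32-bit Debian 8: only the first column differs
V_debian <- V_windows
V_debian[, 1] <- -V_debian[, 1]

# The raw matrices differ, but agree once the sign convention is applied
stopifnot(!isTRUE(all.equal(V_windows, V_debian)))
stopifnot(isTRUE(all.equal(normalize_signs(V_windows),
                           normalize_signs(V_debian))))
```

If something like this were adopted, the same convention would have to be applied both in the package code and in the test's reference calculation. Alternatively, the test could compare only quantities that do not depend on the eigenvector signs.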