matthias-da / robCompositions

Robust Methods for Compositional Data
11 stars 8 forks source link

Inconsistent results for pcaCoDa with solve='svd' and 'eigen' #14

Closed DrewGoodfellow closed 3 years ago

DrewGoodfellow commented 3 years ago

Hello, I have noticed that with the latest CRAN version of robCompositions, the solve= 'eigen' and 'svd' options give inconsistent results when method="classical" is chosen for pcaCoDa.

Here is my session info:

sessionInfo()
R version 4.0.5 (2021-03-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8     LC_MONETARY=en_GB.UTF-8   
 [6] LC_MESSAGES=en_GB.UTF-8    LC_PAPER=en_GB.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] robCompositions_2.3.0 data.table_1.14.0     pls_2.7-3             ggplot2_3.3.3        

loaded via a namespace (and not attached):
 [1] tidyr_1.1.3         splines_4.0.5       carData_3.0-4       rainbow_3.6         hdrcde_3.4          assertthat_0.2.1   
 [7] sp_1.4-5            stats4_4.0.5        cellranger_1.1.0    robustbase_0.93-7   pillar_1.6.0        VIM_6.1.0          
[13] lattice_0.20-44     glue_1.4.2          RColorBrewer_1.1-2  colorspace_2.0-1    Matrix_1.3-3        plyr_1.8.6         
[19] pcaPP_1.9-74        pkgconfig_2.0.3     haven_2.4.1         fda_5.1.9           purrr_0.3.4         mvtnorm_1.1-1      
[25] scales_1.1.1        ranger_0.12.1       openxlsx_4.2.3      rio_0.5.26          tibble_3.1.1        proxy_0.4-25       
[31] generics_0.1.0      car_3.0-10          ellipsis_0.3.2      withr_2.4.2         nnet_7.3-16         survival_3.2-11    
[37] magrittr_2.0.1      crayon_1.4.1        readxl_1.3.1        mclust_5.4.7        fds_1.8             ks_1.12.0          
[43] GGally_2.1.1        laeken_0.5.1        fansi_0.4.2         MASS_7.3-54         truncnorm_1.0-8     forcats_0.5.1      
[49] foreign_0.8-81      class_7.3-19        tools_4.0.5         hms_1.0.0           lifecycle_1.0.0     kernlab_0.9-29     
[55] munsell_0.5.0       cluster_2.1.2       zip_2.1.1           fpc_2.2-9           compiler_4.0.5      e1071_1.7-6        
[61] vcd_1.4-8           rlang_0.4.11        grid_4.0.5          RCurl_1.98-1.3      cvTools_0.3.2       bitops_1.0-7       
[67] boot_1.3-28         gtable_0.3.0        abind_1.4-5         flexmix_2.3-17      DBI_1.1.1           reshape_0.8.8      
[73] curl_4.3.1          rrcov_1.5-5         R6_2.5.0            zoo_1.8-9           prabclus_2.3-2      dplyr_1.0.6        
[79] sROC_0.1-2          utf8_1.2.1          KernSmooth_2.23-20  modeltools_0.2-23   stringi_1.6.1       NADA_1.6-1.1       
[85] parallel_4.0.5      Rcpp_1.0.6          zCompositions_1.3.4 vctrs_0.3.8         diptest_0.76-0      DEoptimR_1.0-8     
[91] tidyselect_1.1.1    lmtest_0.9-38      

Here's a script to illustrate the differences:


library(robCompositions)

data(arcticLake)

res1 <-  pcaCoDa(arcticLake, method = "classical", solve = "eigen")

res2 <-  pcaCoDa(arcticLake, method = "classical", solve = "svd")

res1 and res2 are different:

> summary(res1)

Importance of components:
                          Comp.1     Comp.2
Standard deviation     1.5225626 0.29619446
Proportion of Variance 0.9635354 0.03646458
Cumulative Proportion  0.9635354 1.00000000

> summary(res2)

Importance of components:
                         PC1    PC2
Standard deviation     1.588 0.8113
Proportion of Variance 0.793 0.2070
Cumulative Proportion  0.793 1.0000

> res1$loadings

          Comp.1     Comp.2
sand  0.73632495  0.3528346
silt -0.06259876 -0.8140934
clay -0.67372618  0.4612588

> res2$loadings

            PC1        PC2
sand -0.7778966  0.2480798
silt  0.1741048 -0.7977181
clay  0.6037917  0.5496383

Is this intentional? The results of the two solve methods can be brought closer together by changing line 114 of pcaCoDa.R (excepting the signs of the loadings and scores) to:


pcaIlr <- prcomp(xilr, scale = FALSE, center = TRUE)
> res1$loadings
          Comp.1     Comp.2
sand  0.73632495  0.3528346
silt -0.06259876 -0.8140934
clay -0.67372618  0.4612588

> res2$loadings
             PC1        PC2
sand -0.73632495 -0.3528346
silt  0.06259876  0.8140934
clay  0.67372618 -0.4612588

Thank you for reading this post and for contributing the useful R Package.

Best wishes,

Andrew

matthias-da commented 3 years ago

Dear Andrew, after careful revision: you are of course right. We changed the code and thank you very much. The github version includes the change, the CRAN version needs a bit to be uploaded. Best Matthias