Closed slowkow closed 2 years ago
Sorry for the long message. Here is a shorter example with the same result:
mat <- as.matrix(iris[,1:4])
mat_scaled <- scale(mat)
irlba::irlba(
A = mat_scaled,
nv = 3,
center = FALSE,
scale = FALSE
)$d
#> Warning in irlba::irlba(A = mat_scaled, nv = 3, center = FALSE, scale = FALSE):
#> You're computing too large a percentage of total singular values, use a standard
#> svd instead.
#> [1] 20.853205 11.670070 4.676192
irlba::irlba(
A = mat,
nv = 3,
center = colMeans(mat),
scale = apply(mat, 2, sd)
)$d
#> Warning in irlba::irlba(A = mat, nv = 3, center = colMeans(mat), scale =
#> apply(mat, : You're computing too large a percentage of total singular values,
#> use a standard svd instead.
#> [1] 20.853205 11.670070 4.676192
RSpectra::svds(
A = mat_scaled,
k = 3,
opts = list(center = FALSE, scale = FALSE)
)$d
#> [1] 20.853205 11.670070 4.676192
RSpectra::svds(
A = mat,
k = 3,
opts = list(center = TRUE, scale = TRUE)
)$d
#> [1] 1.7083611 0.9560494 0.3830886
Hi @slowkow, this is because the definition of scale
is a bit different in RSpectra. Below is the documentation from svds()
:
scale
Either a logical value (TRUE/FALSE), or a numeric vector of length n. If a vector s is supplied, then SVD is computed on the matrix (A - 1 c')S, where c is the centering vector and S = diag(1/s). If scale = TRUE, then the vector s is computed as the column norm of A - 1 c'. Default is FALSE.
So if you specify scale = TRUE
, svds()
scales the matrix to have unit norm, not standard deviation. You have two ways to adjust the singular values:
irlba()
, you can specify a vector instead of a TRUE/FALSE
value for scale
.
mat_svds_scaled <- RSpectra::svds(
A = mat,
k = n_pcs,
opts = list(center = TRUE, scale = apply(mat, 2, sd))
)
or
mat_svds_scaled <- RSpectra::svds(
A = mat,
k = n_pcs,
opts = list(center = colMeans(mat), scale = apply(mat, 2, sd))
)
sqrt(n - 1)
, and the ratio also applies to the singular values.
mat_svds_scaled <- RSpectra::svds(
A = mat,
k = n_pcs,
opts = list(center = TRUE, scale = TRUE)
)
mat_svds_scaled$d * sqrt(nrow(mat) - 1)
This blog post tells you how to get the same results in prcomp()
using svds()
.
@yixuan Thank you so much for the detailed reply! I really appreciate it. I admit that I do not understand the documentation, so your reply and blog post helped a lot.
Thank you! I'll bookmark and link.
Summary
When we use
RSpectra::svds()
withopts = list(center = TRUE, scale = TRUE)
, then the eigenvalues do not match the values returned bybase::svd()
orirlba::irlba()
.Please see the full example below for details.
For the sake of reproducibility, let’s use the iris data.
Columns are 4 features and rows are 150 flowers.
Example 1
Let’s run SVD without scaling the features.
All methods return the same values for u
All methods return the same values for v
All methods return the same values for d
Example 2
Let’s try again, but this time let’s center and scale each feature.
After scaling, the means should be equal to zero:
And the standard deviation should be equal to one:
Ok, let’s try SVD again:
All methods return the same values for u
All methods return the same values for v
Uh oh! It looks like
svds()
is not returning the correct values for dExample 3
Let’s try RSpectra again, but we’ll do the scaling manually.
Conclusions
Manual scaling eliminates the error in the
svds()
output. This implies that there might be a bug inside thesvds()
function.The wrong values are off by a multiplicative constant factor, but the value of the factor is dependent on the input data.
For the iris data, the factor seems to be 12.20656, which is close to the number of matrix multiplication operations (11)
Created on 2022-01-25 by the reprex package (v2.0.1)
Session info
``` r sessioninfo::session_info() #> ─ Session info ─────────────────────────────────────────────────────────────── #> setting value #> version R version 4.0.3 (2020-10-10) #> os macOS Catalina 10.15.7 #> system x86_64, darwin17.0 #> ui X11 #> language (EN) #> collate en_US.UTF-8 #> ctype en_US.UTF-8 #> tz America/New_York #> date 2022-01-25 #> #> ─ Packages ─────────────────────────────────────────────────────────────────── #> package * version date lib source #> backports 1.2.1 2020-12-09 [1] CRAN (R 4.0.2) #> cli 3.0.1 2021-07-17 [1] CRAN (R 4.0.2) #> crayon 1.4.1 2021-02-08 [1] CRAN (R 4.0.2) #> digest 0.6.28 2021-09-23 [1] CRAN (R 4.0.2) #> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.0.2) #> evaluate 0.14 2019-05-28 [2] CRAN (R 4.0.1) #> fansi 0.5.0 2021-05-25 [1] CRAN (R 4.0.2) #> fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.0.2) #> fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.2) #> glue 1.4.2 2020-08-27 [2] CRAN (R 4.0.2) #> highr 0.9 2021-04-16 [1] CRAN (R 4.0.2) #> htmltools 0.5.2 2021-08-25 [1] CRAN (R 4.0.2) #> irlba * 2.3.3 2019-02-05 [2] CRAN (R 4.0.2) #> knitr 1.36 2021-09-29 [1] CRAN (R 4.0.2) #> lattice 0.20-45 2021-09-22 [1] CRAN (R 4.0.2) #> lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.0.2) #> magrittr 2.0.1.9000 2020-12-15 [1] Github (tidyverse/magrittr@bb1c86a) #> Matrix * 1.3-4 2021-06-01 [1] CRAN (R 4.0.2) #> pillar 1.6.3 2021-09-26 [1] CRAN (R 4.0.2) #> pkgconfig 2.0.3 2019-09-22 [2] CRAN (R 4.0.2) #> purrr 0.3.4 2020-04-17 [2] CRAN (R 4.0.2) #> R.cache 0.15.0 2021-04-30 [1] CRAN (R 4.0.2) #> R.methodsS3 1.8.1 2020-08-26 [1] CRAN (R 4.0.2) #> R.oo 1.24.0 2020-08-26 [1] CRAN (R 4.0.2) #> R.utils 2.11.0 2021-09-26 [1] CRAN (R 4.0.2) #> Rcpp 1.0.7 2021-07-07 [1] CRAN (R 4.0.2) #> reprex 2.0.1 2021-08-05 [1] CRAN (R 4.0.2) #> rlang 0.4.11 2021-04-30 [1] CRAN (R 4.0.2) #> rmarkdown 2.11 2021-09-14 [1] CRAN (R 4.0.2) #> RSpectra * 0.16-0 2019-12-01 [2] CRAN (R 4.0.2) #> rstudioapi 0.13 2020-11-12 [2] CRAN (R 4.0.2) #> sessioninfo 1.1.1 2018-11-05 [2] CRAN (R 4.0.2) #> stringi 1.7.5 2021-10-04 [1] CRAN (R 4.0.2) #> stringr 1.4.0 2019-02-10 [2] CRAN (R 4.0.2) #> styler 1.6.2 2021-09-23 [1] CRAN (R 4.0.2) #> tibble 3.1.5 2021-09-30 [1] CRAN (R 4.0.2) #> utf8 1.2.2 2021-07-24 [1] CRAN (R 4.0.2) #> vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.0.2) #> withr 2.4.2 2021-04-18 [1] CRAN (R 4.0.2) #> xfun 0.26 2021-09-14 [1] CRAN (R 4.0.2) #> yaml 2.2.1 2020-02-01 [2] CRAN (R 4.0.2) #> #> [1] /Users/kamil/Library/R/4.0/library #> [2] /Library/Frameworks/R.framework/Versions/4.0/Resources/library ```