kevinblighe / PCAtools

PCAtools: everything Principal Components Analysis

Biplot variance really high when center=FALSE for pca #31

rpolicastro closed this issue 4 years ago

rpolicastro commented 4 years ago

Setting center=FALSE in the pca function results in an (impossibly) high explained variance in the biplot. This is with the latest version on Bioconductor (2.0.0).

I originally noticed this with rlog counts from DESeq2, but it seems to occur with any data, such as the random data I generate here.

example <- replicate(6, rnorm(1000, 10, 1))

> head(example)
          [,1]      [,2]      [,3]      [,4]      [,5]      [,6]
[1,] 10.479310 10.080932 11.295209 10.957110 10.780359 10.491758
[2,]  9.884298  9.064395  9.784546  9.628452 10.811150  9.340487
[3,]  9.302682  9.344533  9.814822  9.818255  9.766677 10.531491
[4,] 10.593042  9.959568 10.244158  9.347088  9.387294 10.535189
[5,] 10.235262  9.995187 10.245733 10.816074 10.155518  9.805453
[6,] 10.377442 10.016121  8.671987  8.486263  9.919841  9.906467

It works when you leave the default center=TRUE.

library("PCAtools")
library("magrittr")

example %>%
  pca %>%
  biplot

[Image: biplot with center=TRUE]

However, setting center=FALSE produces the impossibly high explained variance.

example %>%
  pca(center=FALSE) %>%
  biplot

[Image: biplot with center=FALSE]
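
For what it's worth, here is a minimal sketch of one way percentages like this could arise, using base R's prcomp rather than PCAtools itself. It rests on an assumption I have not verified against the PCAtools source: that the per-component variance from an uncentered fit ends up divided by a total variance computed around the feature means. The set.seed call and all variable names other than example are mine, for illustration only.

```r
set.seed(1)  # only so the sketch is reproducible
example <- replicate(6, rnorm(1000, 10, 1))

# Treat the 6 columns as samples and the 1000 rows as features,
# so transpose before handing the matrix to prcomp (rows = observations).
x <- t(example)

fit_centered   <- prcomp(x, center = TRUE,  scale. = FALSE)
fit_uncentered <- prcomp(x, center = FALSE, scale. = FALSE)

# Each fit measured against its own total: both sets of percentages sum to 100.
pct_centered   <- 100 * fit_centered$sdev^2   / sum(fit_centered$sdev^2)
pct_uncentered <- 100 * fit_uncentered$sdev^2 / sum(fit_uncentered$sdev^2)

# ASSUMPTION (hypothetical mismatch): uncentered component variances divided
# by the total variance taken around the per-feature means.
total_var_about_mean <- sum(apply(x, 2, var))
pct_mismatch <- 100 * fit_uncentered$sdev^2 / total_var_about_mean

round(pct_centered, 1)
round(pct_uncentered, 1)
round(pct_mismatch, 1)  # first entry is far above 100
```

Without centering, the first component mostly tracks the offset of the data from the origin (a mean of about 10 here), so its squared standard deviation is much larger than the spread around the mean. Any percentage that mixes the two kinds of variance can therefore exceed 100%.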