sinhrks / ggfortify

Define fortify and autoplot functions to allow ggplot2 to handle some popular R packages.

PCA score results are different #221

Closed YonghuiDong closed 2 years ago

YonghuiDong commented 2 years ago

With autoplot

library(ggfortify)
df <- iris[1:4]
pca_res <- prcomp(df, scale. = TRUE)
autoplot(pca_res, data = iris, colour = 'Species')

[Image: PCA score plot produced by autoplot]

Self calculated

percentage <-round(pca_res$sdev / sum(pca_res$sdev) * 100, 2)
percentage
53.53 29.96 12.00  4.51

The variance explained by PC1 and PC2 comes out as 53.53% and 29.96%, respectively, which is very different from the percentages shown in the figure above.

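Score plot using ggplot2

scores <- data.frame(pca_res$x, Species = iris$Species)  # PC scores plus the grouping column
ggplot(scores, aes(x = PC1, y = PC2, color = Species)) +
    geom_point() +
    theme_bw() +
    xlab(paste0("PC1 (", percentage[1], "%)")) +
    ylab(paste0("PC2 (", percentage[2], "%)"))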

[Image: PCA score plot produced with ggplot2]

terrytangyuan commented 2 years ago

Is this no longer an issue?

YonghuiDong commented 2 years ago

Hi @terrytangyuan,

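The problem is solved. It was a mistake on my part: I divided the standard deviations by their sum instead of using the variances.

The percentage of variance explained should be calculated from the squared standard deviations:

percentage <- round(pca_res$sdev^2 / sum(pca_res$sdev^2) * 100, 2)

With this I get exactly the same result as ggfortify.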
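
For anyone landing here with the same question, the corrected calculation can be cross-checked against base R. This is only a minimal sketch using the iris example from this thread: the names `var_explained` and `reported` are introduced here for illustration, and it assumes the "Proportion of Variance" row of `summary(pca_res)$importance` is an acceptable reference value.

```r
# Base R only; the iris data set ships with R.
pca_res <- prcomp(iris[1:4], scale. = TRUE)

# prcomp() returns standard deviations in $sdev, so square them to get variances.
var_explained <- pca_res$sdev^2 / sum(pca_res$sdev^2)
percentage <- round(var_explained * 100, 2)

# Cross-check against the proportions that summary() reports.
reported <- summary(pca_res)$importance["Proportion of Variance", ]
matches <- isTRUE(all.equal(unname(round(reported * 100, 2)), percentage))

print(percentage)
print(matches)
```

If `matches` is `TRUE`, the hand-computed percentages agree with what `summary()` reports, and they should also agree with the axis labels that autoplot draws.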

Best regards,

Dong

terrytangyuan commented 2 years ago

Great