kevinblighe / PCAtools

PCAtools: everything Principal Components Analysis
329 stars 67 forks source link

Ellipse function returning weird results #37

Closed luizalmeida93 closed 3 years ago

luizalmeida93 commented 3 years ago

Hi all,

I am trying to use PCAtools to draw a biplot with ellipses, but the results are quite weird. Apparently, I am getting two ellipses for the same group, while an ellipse is missing for one of the groups (with 3 replicates only).

Here is the code:

#Loading the data
quant_matrix <- as.matrix(read.table(file = "Data-test_PCA.txt", header = T, row.names = 1, sep = "\t", stringsAsFactors = FALSE))

### PCA ####

meta_data <- matrix(c(rep("X",4),rep("Y",4),rep("Z",3)),ncol = 1) # categorial information, must match the number of columns from the matrix
colnames(meta_data) <- "Group"
rownames(meta_data) <- colnames(quant_matrix)

pca_test <- pca(quant_matrix, metadata = meta_data) #function for the PCA

biplot_test <- biplot(pca_test, shape = "Group",
                             colby = "Group", colkey = c("X" = 'forestgreen', "Y" = 'red', "Z" = 'blue'),
                             ellipse = TRUE,
                             ellipseConf = 0.95,
                             ellipseFill = TRUE,
                             ellipseAlpha = 1/4,
                             ellipseLineSize = 1.0,
                             xlim = c(-45,45), ylim = c(-25, 20),
                             #hline = 0, vline = c(-20, 0, 20),
                             legendPosition = 'top', legendLabSize = 10, legendIconSize = 4.0);biplot_test #drawing the biplot

ggsave(filename = "PCA_test1.pdf",plot = biplot_test,device = "pdf",width = 9,height = 10) #saving the results as a PDF file

To compare with another tool, I loaded the same dataset in MetaboAnalyst, which is a website for the analysis of metabolomics data. The result is quite different and makes much more sense. I used a 95% confidence regions for both approaches.

I am uploading the dataset and the output from each methodology. Any ideas of what I might be doing wrong?

Data-test_PCA.txt PCAtools_test1.pdf Metaboanalyst_pca_score2d.pdf

Cheers, Luiz

kevinblighe commented 3 years ago

Hey Luiz, yes, the ellipse functionality needs some modification. In this case, can you extend the x-axis limits to see if that results in the ellipses appearing properly?

luizalmeida93 commented 3 years ago

Hey! Thank you for the suggestion. Indeed, extending the x-axis does make an ellipse appear, but it doesn't match the one from MetaboAnalyst. Also, it seems that PCAtools require at least 4 data points, is that correct? Since MetaboAnalyst can plot an ellipse with only 3 points, and their ellipse looks different than yours, does that mean that their mathematical approach is different? PCAtools_test2.pdf

kevinblighe commented 3 years ago

I am not familliar with the MetaboAnalyst PCA functions. In PCAtools, we simply re-use stat_ellipse() from ggplot2:

Where are you running MetaboAnalyst?

luizalmeida93 commented 3 years ago

Awesome, thanks for the clarification, I appreciate it.

I am running MetaboAnalyst on their website, but they do have an R version as well.

I was searching their codes and it seems to be a bit different than yours. They use the prcomp package to obtain the PCA and then select the points for the ellipse. You can check the function here: https://github.com/xia-lab/MetaboAnalystR/blob/master/R/stats_chemometrics.R#L183

Do you think it is more robust? Maybe it could be an option for PCAtools

kevinblighe commented 3 years ago

Thanks, that is quite interesting. They are using the ellipse package, and not ggplot2's ellipse functionality. Based on your evidence, it seems to perform better, and ggplot2's function has been making me uncomfortable for some time.

I will have to explore this, but my time is very limited, and there is no funding for this work (never was).

I note that they, nevertheless, seem to be performing PCA in the same way?

Thanks for passing back this information.

kevinblighe commented 3 years ago

Hi again, it seems that an ellipse is never just an ellipse, and that there are many ways to calculate this. For PCAtools, I had just been using the default as per ggplot2::stat_ellipse.

I have just updated the code to permit that the user selects between a 't', 'norm', and 'euclid' ellipse - see https://ggplot2.tidyverse.org/reference/stat_ellipse.html

I have also made the package more intelligently determine values for xlim and ylim so that, in most cases, the ellipse will render correctly.

These changes are on GitHub, and will appear on Bioconductor release branch when next Bioconductor version is released.

Thanks