kevinblighe / PCAtools

PCAtools: everything Principal Components Analysis

Using stat_ellipse to indicate clustering results #60

Closed SebastianHesse closed 1 year ago

SebastianHesse commented 2 years ago

Dear Kevin,

Can you think of a way to add ellipses around groups defined by a different metadata variable in the PCA? For example, we have a PCA where the disease genotypes are indicated by colby = "genotype". Beforehand, we ran a k-means analysis and would like to show the cluster assignments in the PCA as ellipses. Unfortunately, "encircle" only encircles the genotype groups that are already mapped to colour. Is there a way to use, e.g., + stat_ellipse with its own aes() to encircle the points according to a different metadata variable?

Thanks a lot! Sebastian

kevinblighe commented 1 year ago

Hi Sebastian, could you add the k-means output to the metadata that you import into PCAtools, and then generate the ellipses from that variable? There is currently no way to draw ellipses for metadata variables that are not part of the main PCA object that PCAtools creates.
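A minimal sketch of that route, assuming PCAtools (Bioconductor) is installed. The simulated matrix, the `genotype` labels and the `cluster` column name are hypothetical. Because `kmeans()` clusters rows, the samples-as-columns matrix is transposed first, and the named cluster vector is matched to the metadata by sample name before calling `pca()`:

```r
library(PCAtools)

set.seed(42)
# Hypothetical data: 200 genes (rows) x 12 samples (columns)
mat <- matrix(rnorm(200 * 12), nrow = 200,
              dimnames = list(paste0("gene", 1:200), paste0("S", 1:12)))
# Give half of the samples a distinct profile so there is something to cluster
mat[1:50, 7:12] <- mat[1:50, 7:12] + 5

# Hypothetical sample metadata; pca() requires rownames == colnames(mat)
metadata <- data.frame(genotype = rep(c("WT", "KO"), times = 6),
                       row.names = colnames(mat))

# k-means on samples: kmeans() clusters rows, hence t(mat)
km <- kmeans(t(mat), centers = 2, nstart = 10)
# km$cluster is named by sample, so index it by the metadata rownames
metadata$cluster <- factor(km$cluster[rownames(metadata)])

p <- pca(mat, metadata = metadata)

# biplot() draws one ellipse per level of colby, so colour and ellipses
# necessarily show the same grouping here
bp <- biplot(p, colby = "cluster", ellipse = TRUE, legendPosition = "right")
print(bp)
```

This is exactly the redundancy raised next in the thread: with the built-in `ellipse = TRUE`, the ellipse grouping is tied to `colby`.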

SebastianHesse commented 1 year ago

Hi Kevin, indeed I added the info to the metadata, thanks.

But my question is whether it would be possible to use different variables for the colouring and for the ellipses: currently they are the same, which is somewhat redundant.

It would be great if I could colour the samples by, e.g., genotype and draw ellipses around, e.g., the cluster groups. That way I could show, for example, that different genotypes cluster together, which would add another layer of information.

It would probably require another input option, e.g.: col_by = "genotype", ellipse_by = "cluster".

That would be absolutely amazing, thanks a lot! Sebastian

kevinblighe commented 1 year ago

Hi Sebastian, I see what you mean. Unfortunately, that is not yet possible and it would require a major modification to the code.

You could try to achieve it via manual 'add-on' functions to PCAtools::biplot. For example, the encircle option comes from ggalt::geom_encircle(...) (https://github.com/kevinblighe/PCAtools/blob/master/R/biplot.R#L612-L658), while the ellipse option comes from ggplot2::stat_ellipse(...) (https://github.com/kevinblighe/PCAtools/blob/master/R/biplot.R#L671-L744).

So, for example:

biplot(...) + ggalt::geom_encircle(...)
biplot(...) + ggplot2::stat_ellipse(...)
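A fuller sketch of the stat_ellipse add-on, assuming PCAtools and ggplot2 are installed; the data, `genotype` and `cluster` are hypothetical. Points are coloured through biplot()'s own `colby = "genotype"`, while the extra ggplot2::stat_ellipse() layer is grouped by the k-means clusters. The grouping vector is taken from `p$metadata`, which assumes biplot() plots the samples in the same order as the metadata rows (the same assumption the encircle solution later in this thread relies on):

```r
library(PCAtools)
library(ggplot2)

set.seed(1)
# Hypothetical data: 300 genes x 18 samples with three shifted sample groups
mat <- matrix(rnorm(300 * 18), nrow = 300,
              dimnames = list(paste0("gene", 1:300), paste0("S", 1:18)))
mat[1:50, 1:6]    <- mat[1:50, 1:6] + 5
mat[51:100, 7:12] <- mat[51:100, 7:12] + 5

# Hypothetical metadata: genotype for colour, k-means cluster for ellipses
metadata <- data.frame(genotype = rep(c("WT", "HET", "KO"), times = 6),
                       row.names = colnames(mat))
km <- kmeans(t(mat), centers = 3, nstart = 10)
metadata$cluster <- factor(km$cluster[rownames(metadata)])

p <- pca(mat, metadata = metadata)

# Colour by genotype (biplot's mapping), ellipses by cluster (extra layer).
# A fixed colour keeps the ellipses out of the genotype colour legend.
bp <- biplot(p, colby = "genotype", legendPosition = "right") +
  stat_ellipse(aes(group = p$metadata$cluster),
               type = "t", level = 0.95,
               colour = "grey30", linetype = "dashed",
               show.legend = FALSE)
print(bp)
```

Mapping the ellipse colour to `cluster` instead would merge both variables into a single colour scale, which is why a fixed colour and linetype are used for the add-on layer.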
SebastianHesse commented 1 year ago

This worked perfectly, thanks so much!

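I simply added the following to my biplot call:

biplot(...) +
  ggalt::geom_encircle(
    aes(group = pca_all$metadata$k4_vsn),  # metadata variable I want to encircle
    colour = "black",  # any colour; encircleLineCol is a biplot() argument name, not a variable
    fill = NA,
    #alpha = encircleAlpha,
    #size = encircleLineSize,
    show.legend = FALSE,
    na.rm = TRUE)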

It works well. I will play around with the graphical details, but now I can encircle a metadata variable other than the one specified by colby!

Thanks so much!
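The same add-on can be wrapped in a small helper that takes separate colour and ellipse variables, roughly the col_by / ellipse_by interface proposed earlier in the thread. This is a sketch assuming PCAtools and ggplot2 are installed: `biplot_ellipse_by()` is a hypothetical helper, not part of PCAtools, and it looks the ellipse variable up by name in `pcaobj$metadata` (again assuming biplot() keeps the metadata row order):

```r
library(PCAtools)
library(ggplot2)

# Hypothetical helper (not a PCAtools function): colour points by one metadata
# column via biplot(), and draw stat_ellipse() outlines grouped by another.
biplot_ellipse_by <- function(pcaobj, colby, ellipseby, level = 0.95, ...) {
  if (!ellipseby %in% colnames(pcaobj$metadata)) {
    stop("ellipseby must be a column of pcaobj$metadata")
  }
  # Assumes biplot() plots samples in metadata row order, so this lines up
  ellipse_groups <- factor(pcaobj$metadata[[ellipseby]])
  biplot(pcaobj, colby = colby, ...) +
    stat_ellipse(aes(group = ellipse_groups), level = level,
                 colour = "black", linetype = "dashed", show.legend = FALSE)
}

# Usage with hypothetical simulated data
set.seed(7)
mat <- matrix(rnorm(200 * 12), nrow = 200,
              dimnames = list(paste0("gene", 1:200), paste0("S", 1:12)))
mat[1:50, 1:6] <- mat[1:50, 1:6] + 5
metadata <- data.frame(genotype = rep(c("WT", "KO"), times = 6),
                       row.names = colnames(mat))
km <- kmeans(t(mat), centers = 2, nstart = 10)
metadata$cluster <- factor(km$cluster[rownames(metadata)])
p <- pca(mat, metadata = metadata)

bp <- biplot_ellipse_by(p, colby = "genotype", ellipseby = "cluster",
                        legendPosition = "right")
print(bp)
```

Extra arguments in `...` are passed straight to biplot(), so its usual options (labels, legend position, axis choice) still apply; swapping stat_ellipse() for ggalt::geom_encircle() inside the helper would give the encircle variant instead.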

kevinblighe commented 1 year ago

Wow! That's fantastic. Yes, I designed this package so that advanced users can 'add on' functionality. Thanks for using this package.