sinhrks / ggfortify

Define fortify and autoplot functions to allow ggplot2 to handle some popular R packages.

Loadings cutoff option (PCA) #212

Open cdiazmun opened 3 years ago

cdiazmun commented 3 years ago

Hello!

First, thank you for developing the package; it has been very useful.

I am opening this issue to request, if possible, a new feature for plotting the factor loadings in a PCA. There are already nice aesthetic options for the loadings. However, I would be interested in a loadings.cutoff option to select only the desired ones. When working with PCAs based on many variables (50 in my case), the plot can become very messy, even after adjusting sizes and other aesthetics. Furthermore, there are some factors that I may not be interested in because they don't explain any variance in the samples, so it would also be a nice way to filter out those factors.

Thank you in advance.

Regards, Cristian

terrytangyuan commented 3 years ago

Could you give an example use of the loadings.cutoff option that you are proposing? Would you like to submit a pull request? The related code is currently in this file: https://github.com/sinhrks/ggfortify/blob/master/R/fortify_stats.R

cdiazmun commented 3 years ago

Following the example you use to illustrate your package:

autoplot(prcomp(df), data = iris, colour = "Species",
         loadings = TRUE, loadings.colour = 'blue',
         loadings.label = TRUE, loadings.label.size = 3)

If you run print(prcomp(df)), you get the standard deviations and the loadings (the rotation matrix) of the PCA:

Standard deviations (1, .., p=4):
[1] 2.0562689 0.4926162 0.2796596 0.1543862

Rotation (n x k) = (4 x 4):
                     PC1         PC2         PC3        PC4
Sepal.Length  0.36138659 -0.65658877  0.58202985  0.3154872
Sepal.Width  -0.08452251 -0.73016143 -0.59791083 -0.3197231
Petal.Length  0.85667061  0.17337266 -0.07623608 -0.4798390
Petal.Width   0.35828920  0.07548102 -0.54583143  0.7536574

With a cutoff option, you could then keep only the variables whose loading on PC1 or PC2 (the components you want to plot) exceeds a threshold in absolute value, for instance 0.7 (that is, loadings above 0.7 or below -0.7):

autoplot(prcomp(df), data = iris, colour = "Species",
         loadings = TRUE, loadings.colour = 'blue',
         loadings.label = TRUE, loadings.label.size = 3,
         loadings.cutoff = 0.7)

Then in the final plot you would only see Sepal.Width and Petal.Length.
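The selection rule described above can be sketched in base R, independently of ggfortify. This is only an illustration of the proposed behaviour, not an existing feature: the helper name filter_loadings and its pcs argument are hypothetical, and df is assumed to be the four numeric iris columns (iris[1:4]), which matches the printed output above.

```r
# Hypothetical helper (not part of ggfortify): keep only the variables whose
# absolute loading on at least one of the selected components reaches the cutoff.
filter_loadings <- function(pca, cutoff, pcs = 1:2) {
  # Loadings of the components that would be plotted (PC1 and PC2 by default)
  rot <- pca$rotation[, pcs, drop = FALSE]
  # TRUE for each variable with |loading| >= cutoff on any selected component
  keep <- apply(abs(rot) >= cutoff, 1, any)
  rot[keep, , drop = FALSE]
}

# Assumption: df holds the four numeric iris columns, as in the output above
df <- iris[1:4]
pca <- prcomp(df)

kept <- filter_loadings(pca, cutoff = 0.7)
print(rownames(kept))
```

With the loadings listed above, this should keep Sepal.Width (PC2 = -0.73) and Petal.Length (PC1 = 0.86). Comparing absolute values keeps the result independent of the arbitrary sign of each component.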

terrytangyuan commented 3 years ago

Thank you! This looks very useful indeed. Would you like to submit changes to support this feature?

cdiazmun commented 3 years ago

I'm very sorry, but I'm not very familiar with GitHub, so I actually don't know how to do that. I don't know how to submit a pull request either, although I have the feeling it's the same thing, haha. I will read the guide and try it soon.

terrytangyuan commented 3 years ago

Okay great. I won't have time to get to this soon so feel free to give it a try!

BioinfGuru commented 1 month ago

Hi @terrytangyuan, has anyone made progress on implementing this yet (or on a workaround)? I'd certainly be interested in taking it on as a side project.

terrytangyuan commented 1 month ago

Nope. Go ahead