juliasilge / widyr

Widen, process, and re-tidy a dataset
http://juliasilge.github.io/widyr/

How to know the variability explained by individual PCA (widely_svd) #36

Open aalsharef opened 3 years ago

aalsharef commented 3 years ago

Hello,

Thanks for the great package! It is not clear to me how to select the number of principal components inside the function "widely_svd". Is there a way to get the variability explained by each individual component, so that I can choose an optimal nv? This would justify the number of components selected. For now, I'm setting it to 100 (nv = 100), following the suggestion in "Supervised Machine Learning for Text Analysis in R" by Emil Hvitfeldt and Julia Silge.

Thank you very much!

juliasilge commented 1 year ago

Thanks for your patience with this issue! 🙌

The current implementation of widely_svd only returns the u matrix from the SVD:

https://github.com/juliasilge/widyr/blob/a6696d64ec7a21b23196a6024e06e3a937ae2a93/R/widely_svd.R#L77-L89

Let's think through how we might return other, more complete info from the SVD in the tidy format we use in this package. In the meantime, I would recommend that you use a lower-level interface to SVD like irlba so you can get out all the information you want.
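As a rough sketch of that lower-level route (not something widyr provides), the singular values from irlba can be turned into a per-dimension share of the matrix's total sum of squares. The matrix `m` below is a made-up stand-in for the wide item-by-feature matrix, and `nv = 5` is an arbitrary choice. The share equals "variance explained" in the PCA sense only if the columns have been centered first, which is an assumption to check against your own data.

```r
# Hypothetical toy matrix standing in for the wide item-by-feature matrix.
set.seed(123)
m <- matrix(rnorm(200 * 50), nrow = 200, ncol = 50)

nv <- 5  # number of dimensions to keep (arbitrary for this sketch)

# Use irlba when it is installed; otherwise fall back to base svd().
if (requireNamespace("irlba", quietly = TRUE)) {
  s <- irlba::irlba(m, nv = nv)
} else {
  s <- svd(m, nu = nv, nv = nv)
}
d <- s$d[seq_len(nv)]  # leading singular values

# The squared singular values of a matrix sum to its squared Frobenius norm,
# so sum(m^2) gives the denominator without computing the full SVD.
total_ss <- sum(m^2)
prop <- d^2 / total_ss     # share of total sum of squares per dimension
cum_prop <- cumsum(prop)   # cumulative share across dimensions

data.frame(dimension = seq_len(nv), prop = prop, cum_prop = cum_prop)
```

Plotting `cum_prop` against `dimension` for a generous `nv` is one way to look for the point where adding dimensions stops helping. For a large sparse matrix, centering the columns would make it dense, so the uncentered share may be the practical quantity to report.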