flying-sheep opened 5 years ago
Looks great so far! This is an old project and sadly lacks unit tests, but I will try it out over the weekend.
It's been a long time since I worked with sparse data, but I guess those who do already have good tools for it, so I wonder whether making `prep` sparse-friendly really adds value for anyone(?) I like your current solution.
These days there's a lot of sparse single-cell transcriptomics data: current methods produce huge amounts of data (e.g. 20k genes × 100k cells) but also suffer from a lot of dropout (zeros instead of small values).
Using PCA as a preprocessing step speeds things up and saves memory, provided the PCA method can handle sparse data.
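The catch with sparse input is that column-centering replaces every zero with minus the column mean, so centering the data before the PCA call densifies a 20k × 100k matrix. A minimal sketch of the usual workaround, assuming Python with NumPy/SciPy rather than the package's R code: wrap X in a `LinearOperator` that applies the center and scale vectors implicitly inside each matrix-vector product, then pass it to a truncated SVD (`scipy.sparse.linalg.svds` here, standing in for irlba). The names `implicit_standardized_operator` and `sparse_pca` are hypothetical, and using the population standard deviation (ddof = 0) for the scale vector is an assumption.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, svds


def implicit_standardized_operator(X, center, scale):
    """Represent (X - 1 center^T) diag(1/scale) as an operator.

    Only sparse products with X and length-n / length-p vectors are formed;
    the dense centered matrix never exists.
    """
    n_rows, n_cols = X.shape

    def matvec(v):
        w = np.ravel(v) / scale
        # (X - 1 c^T) w = X w - (c . w) * 1
        return X @ w - center @ w

    def rmatvec(u):
        u = np.ravel(u)
        # diag(1/s) (X - 1 c^T)^T u = (X^T u - c * sum(u)) / s
        return (X.T @ u - center * u.sum()) / scale

    return LinearOperator((n_rows, n_cols), matvec=matvec,
                          rmatvec=rmatvec, dtype=np.float64)


def sparse_pca(X, k, scale=False):
    """Hypothetical helper: top-k PCA of sparse X with implicit centering/scaling."""
    X = sp.csr_matrix(X, dtype=np.float64)
    center = np.asarray(X.mean(axis=0)).ravel()
    if scale:
        # Column variance via E[x^2] - mean^2, computed on the sparse matrix.
        sq_mean = np.asarray(X.multiply(X).mean(axis=0)).ravel()
        sd = np.sqrt(np.maximum(sq_mean - center ** 2, 0.0))
        sd[sd == 0] = 1.0  # assumed convention: leave constant columns unscaled
    else:
        sd = np.ones(X.shape[1])
    op = implicit_standardized_operator(X, center, sd)
    U, s, Vt = svds(op, k=k)
    order = np.argsort(s)[::-1]  # sort components by decreasing singular value
    return {
        "center": center,          # must be kept for fitted/predict
        "scale": sd,               # must be kept for fitted/predict
        "singular_values": s[order],
        "loadings": Vt[order].T,   # p x k
        "scores": U[:, order] * s[order],  # n x k
    }
```

Keeping the center and scale vectors in the result is deliberate: any later projection or reconstruction has to reapply exactly the same transformation.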
After looking at this more carefully, I noticed it is more complicated than it might first seem. Calling `prep` as you suggested isn't good, since the center and scale vectors are used later but are then not returned by `prcomp_irlba`. I fixed that (not entirely sure it's sparse-aware, but it's done the same way `irlba` does it) in my `irlba` branch https://github.com/hredestig/pcaMethods/tree/irlba. But then I realized that `fitted` and `predict` must also be made sparse-aware :/
Wanna have a look at that?
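Why `fitted` and `predict` are affected: both have to reapply the stored center and scale vectors. A minimal sketch, again in Python/NumPy/SciPy with hypothetical names (`predict_scores`, `fitted_values`), of projecting new sparse data without densifying it. Since `((Z - 1 cᵀ) / s) V = Z (V / s) - (c / s)ᵀ V`, the center only enters as one offset row. The reconstruction in `fitted_values` is a dense rank-k matrix by nature, so how a sparse-aware `fitted` should behave (e.g. only for requested rows) is an open design question, not something settled here.

```python
import numpy as np
import scipy.sparse as sp


def predict_scores(Z, center, scale, loadings):
    """Project new rows Z (sparse or dense) onto stored PCA loadings.

    Computes ((Z - 1 center^T) / scale) @ loadings as
    Z @ (loadings / scale) - (center / scale) @ loadings, so Z stays sparse.
    """
    W = loadings / scale[:, None]  # fold the column scaling into the loadings
    offset = center @ W            # a single length-k row, broadcast over rows
    return np.asarray(Z @ W) - offset


def fitted_values(scores, center, scale, loadings):
    """Map scores back to data space; the result is dense by construction."""
    return (scores @ loadings.T) * scale + center
```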
Needs docs and a decision on whether this is the way to proceed or whether we need to make `prep` sparse-friendly.

Fixes #7