willtownes / glmpca-py

generalized principal component analysis (GLM-PCA) implemented in python
GNU Lesser General Public License v3.0

Feature request: Sparsity support #12

Open jlause opened 4 years ago

jlause commented 4 years ago

Hey Will,

In the R package, sparse matrices are supported under some circumstances. As far as I can see, the Python version does not make use of sparsity yet. Do you plan to add it in a later version? I'd be very interested in this feature for larger datasets. What are the main challenges here?

Looking forward & thanks!

Jan

willtownes commented 4 years ago

Hi Jan, thanks for your interest. Yes, that is on the to-do list. I want to port basically all the changes in v0.2 of the R package over to Python, but it will take some time. The tricky part about sparse matrices is that I can't compute the vanilla full-data gradient without instantiating dense matrices of the same size as the data. Instead, it's necessary to use either memoization or stochastic gradient methods with minibatches to conserve memory.
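
For illustration, here is a minimal sketch of the memory issue described above. It is not the glmpca-py implementation: it assumes a plain Poisson likelihood with log link and a factorization `U @ V.T` with observations in rows, and the function name `poisson_grad_V_chunked` and the `batch_size` parameter are hypothetical. The data matrix stays sparse, and the dense mean matrix is only ever built for one block of rows at a time.

```python
import numpy as np
from scipy import sparse


def poisson_grad_V_chunked(Y, U, V, batch_size=256):
    """Gradient of the Poisson log-likelihood w.r.t. V, built up in row chunks.

    Assumed model (illustrative only): Y[i, j] ~ Poisson(exp(U[i] @ V[j])).
    Y is a scipy CSR matrix of shape (n, p); U is (n, k); V is (p, k).
    """
    n, _ = Y.shape
    grad = np.zeros_like(V, dtype=float)
    for start in range(0, n, batch_size):
        stop = min(start + batch_size, n)
        Ub = U[start:stop]        # dense factors for this chunk, (b, k)
        Yb = Y[start:stop]        # still sparse; row slicing is cheap for CSR
        Mb = np.exp(Ub @ V.T)     # dense means, but only (b, p) at a time
        # (Y - M)^T U, with the sparse and dense parts handled separately
        grad += Yb.T @ Ub - Mb.T @ Ub
    return grad
```

Summing over every chunk reproduces the full-data gradient without ever holding a dense matrix of the same size as the data. Evaluating a single randomly chosen chunk per step (and rescaling) would give the stochastic minibatch variant instead.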

willtownes commented 4 years ago

Note to self: this will probably depend on first addressing #1, since the CSR (row-oriented sparsity) format is more standard in the Python world, whereas CSC (column-oriented) is more standard in the R world (e.g. the Matrix package).
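
As a small illustration of the format difference (using SciPy only, with a random matrix whose size and density are arbitrary assumptions), CSR is efficient for row slicing and CSC for column slicing, and transposing one yields the other:

```python
import numpy as np
from scipy import sparse

# Random sparse matrix stored row-wise (CSR), the usual default in Python tools
Y_csr = sparse.random(1000, 200, density=0.05, format="csr", random_state=0)

# Same values stored column-wise (CSC), the usual layout in R's Matrix package
Y_csc = Y_csr.tocsc()

first_rows = Y_csr[:10]       # row slicing is the cheap direction for CSR
first_cols = Y_csc[:, :10]    # column slicing is the cheap direction for CSC

# Transposing a CSR matrix gives a CSC matrix of the transposed shape
Y_t = Y_csr.T
```

So the orientation of the data matrix (which axis holds the observations) determines which of the two formats allows cheap minibatch slicing.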