zdebruine / singlet

Single-cell analysis with non-negative matrix factorization

Fixed k for RunNMF #13

Closed ludvigla closed 2 years ago

ludvigla commented 2 years ago

If I understand it correctly, RunNMF runs automatic rank determination for length(k) == 1 and cross_validation for length(k) > 1. However, in cases where RunNMF is meant to be used as a substitute for PCA (e.g. for UMAP embedding or clustering), wouldn't it make sense to have an option to skip cross-validation/rank determination entirely and run the NMF with a fixed k? Regardless, I think it would be nice to have this option.

Cheers, Ludvig

zdebruine commented 2 years ago

Sure, we can add support for fixed-rank factorization.

Your question makes me think it would be useful for us to make clearer that the optimal rank for NMF does not necessarily coincide with the rank you would choose for PCA. With PCA, the rank is usually chosen by eyeballing the inflection point in a scree plot, which is highly subjective. The cross-validation plot from NMF, by contrast, is based on statistical optimization relevant to the objective (mean squared error of reconstruction of masked test values).

zdebruine commented 2 years ago

Support added in commit 61734a41cdb2a2c1dd6cf89330f3bccca83ba6e8.

Docs for RunNMF.Seurat now describe k as "either NULL for automatic rank determination, a single integer giving the desired rank, or a vector of ranks to use for cross-validation."
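
The three cases in that description can be sketched as a small dispatch on `k`. This is a hypothetical Python illustration of the documented behavior, not the package's R code. The helper name `resolve_rank_mode` and the returned labels are invented for this example. Treating a length-1 sequence as a fixed rank is also an assumption, made because a single integer in R is itself a length-1 vector.

```python
from typing import Optional, Sequence, Union


def resolve_rank_mode(k: Union[None, int, Sequence[int]]) -> str:
    """Classify k following the quoted docs (hypothetical helper, not singlet's API)."""

    def is_valid_rank(value) -> bool:
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool) and value >= 1

    if k is None:
        return "automatic rank determination"
    if isinstance(k, (int, bool)):
        if not is_valid_rank(k):
            raise ValueError("k must be a positive integer")
        return "fixed rank"
    ranks = list(k)
    if not ranks or not all(is_valid_rank(r) for r in ranks):
        raise ValueError("k must contain one or more positive integers")
    if len(ranks) == 1:
        # Assumption: mirrors R, where a single integer is a length-1 vector.
        return "fixed rank"
    return "cross-validation"


print(resolve_rank_mode(None))         # automatic rank determination
print(resolve_rank_mode(10))           # fixed rank
print(resolve_rank_mode([5, 10, 15]))  # cross-validation
```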