zdebruine / RcppML

Rcpp Machine Learning: Fast robust NMF, divisive clustering, and more
GNU General Public License v2.0

crossValidate aborting R #26

Closed yannk-lm closed 2 years ago

yannk-lm commented 2 years ago

Hello,

Thank you for developing this incredibly fast NMF package. I am using RcppML version 0.5.2.

First, when I run the crossValidate function with method = "impute" on a 36591x1098 dgCMatrix of scRNA-seq data, my R session aborts almost instantly. I don't have any issue with the other two methods.

Second, since I am missing the plot generated by the imputation method for my data, and even though I read your blog post "Cross-validation for NMF rank determination", I am still unsure how to choose k when bi-cross-validation (method = "predict") and robustness (method = "robust") return results similar to those found for the hawaiibirds dataset.

Regards, Yannick

zdebruine commented 2 years ago

@yannk-lm Thanks, and thank you for raising this issue. It's a feature that I am working on actively, and I'll bet there's something to fix here.

  1. I would love to reproduce your issue, especially because I had similar issues early in development of this method (which was just 1-2 weeks ago). But I would like to ask a few things:
    • if you install the (very latest) development version of RcppML right now (v0.5.3), can you reproduce your issue? Sorry to bump the version this morning, but I made a related fix yesterday without updating the version number.
    • does your data contain any NA values?
    • any zero-sum rows or columns?

If none of the above apply, is there any way you can find a reproducible example or share the data with me so I can take a look? Also, what is your value of n and are there any parameters in the ... argument?
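
The checks asked about above (NA values, zero-sum rows or columns) can be run directly on a sparse input. The following is a minimal sketch using only the Matrix package; `check_input` is a hypothetical helper written for illustration and is not part of RcppML, and the small matrix `A` is a toy stand-in for the real data.

```r
library(Matrix)

# Hypothetical helper (not part of RcppML): report the conditions asked about above.
check_input <- function(A) {
  list(
    has_na    = anyNA(A@x),                      # any NA among the stored values
    zero_rows = which(Matrix::rowSums(A) == 0),  # indices of zero-sum rows
    zero_cols = which(Matrix::colSums(A) == 0)   # indices of zero-sum columns
  )
}

# Toy 4 x 3 dgCMatrix standing in for the real data: rows 2 and 4 and column 3 are empty.
A <- sparseMatrix(i = c(1, 3, 3), j = c(1, 1, 2), x = c(2, 1, 4), dims = c(4, 3))
res <- check_input(A)
str(res)
```

For non-negative count data such as scRNA-seq, a zero-sum row is the same thing as an all-zero row, so the row and column sums are enough here.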

  2. Bi-cross-validation and cost of bipartite matching are very experimental methods. They are both documented in the academic literature, but I'm having limited success with them. Missing value imputation, on the other hand, works well across a variety of datasets. This week and next I will be improving crossValidate, systematically assessing these methods, completing that blog post of which you speak, and writing a vignette on cross-validation for a pkgdown site in the works. You'll then have more to work from!

yannk-lm commented 2 years ago

@zdebruine Thank you for answering on such short notice.

  1. I did not have more success with v0.5.3. I sent you an email with the data causing this issue. I'm hoping this will help.

  2. Thank you for your dedication, I am eager to see the result of your work.

Regards, Yannick

zdebruine commented 2 years ago

Yannick,

This issue is fixed in v0.5.3.1. Your data contained zero-valued rows, and the iterator scheme in masked NMF was not set up to handle that corner case. It is now 👍
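
For anyone who cannot upgrade, one possible workaround is to drop all-zero rows and columns before cross-validating. This is only a sketch: that filtering avoids the crash on older versions is an assumption rather than something confirmed in this thread, `drop_empty` is a hypothetical helper and not part of RcppML, and the toy matrix stands in for real data.

```r
library(Matrix)

# Hypothetical helper (not part of RcppML): keep only rows and columns
# that contain at least one nonzero entry.
drop_empty <- function(A) {
  keep_rows <- Matrix::rowSums(A != 0) > 0
  keep_cols <- Matrix::colSums(A != 0) > 0
  A[keep_rows, keep_cols, drop = FALSE]
}

# Toy 4 x 3 sparse matrix: rows 2 and 4 and column 3 hold no nonzero values.
A <- sparseMatrix(i = c(1, 3, 3), j = c(1, 1, 2), x = c(2, 1, 4), dims = c(4, 3))
A_filtered <- drop_empty(A)
dim(A_filtered)
```

Dropped features get no loadings in the factorization, so keep a record of the retained row indices if the results need to be mapped back to the original matrix.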

Thanks again for raising this, it was helpful!

Zach