zdebruine / RcppML

Rcpp Machine Learning: Fast robust NMF, divisive clustering, and more
GNU General Public License v2.0

Factor loadings dominated by mitochondrial and ribosomal genes #35

Closed by mdmanurung 1 year ago

mdmanurung commented 1 year ago

Thank you for writing the blazing-fast package!

I have a more practical question about interpreting the NMF results. My factor loadings are mostly dominated by mitochondrial and ribosomal genes. I have made sure that the cells are of good quality, so I am not sure how to interpret this. It seems like the factors are picking up the most highly expressed genes. Would it make sense to remove those genes prior to NMF?

Apologies in advance if I am asking in the wrong venue.

Regards, Mikhael

zdebruine commented 1 year ago

Yes, you want to ensure that the interesting signal accounts for the majority of the NNLS objective. You can do this in two ways:

  1. Perform some sort of row-wise normalization (just as you perform column-wise normalization). There are some good ideas about how to do this in the Bioconductor DESeq2 package.
  2. Remove features that you just aren't interested in (i.e. mtRNA and rRNA), and any features that have overwhelmingly high counts.

Note that it is NOT a good idea to select only variable features -- you will lose a lot of interesting information and statistical power.
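
The two options above can be sketched roughly as follows. This is a minimal illustration in plain Python, not RcppML or DESeq2 code. The toy matrix, the helper names `drop_unwanted` and `scale_rows`, and the gene-symbol prefixes (`MT-`, `RPS`, `RPL`, which follow human naming conventions) are all assumptions made for the example. Dividing each row by its mean is a simple stand-in for row-wise normalization, not the method DESeq2 uses.

```python
# Hypothetical toy data: rows are genes, columns are cells
# (assumed to be column-normalized already).
genes = ["MT-CO1", "RPS27", "RPL13", "CD3E", "MS4A1"]
counts = [
    [900.0, 850.0, 950.0],
    [400.0, 420.0, 380.0],
    [300.0, 310.0, 290.0],
    [0.0, 4.0, 8.0],
    [6.0, 0.0, 3.0],
]

# Assumed prefixes for human gene symbols. Prefix matching is crude and
# may catch unrelated genes, so review the dropped list on real data.
UNWANTED_PREFIXES = ("MT-", "RPS", "RPL")


def drop_unwanted(genes, counts, prefixes=UNWANTED_PREFIXES):
    """Option 2: remove rows whose gene symbol starts with an unwanted prefix."""
    keep = [i for i, g in enumerate(genes) if not g.upper().startswith(prefixes)]
    return [genes[i] for i in keep], [counts[i] for i in keep]


def scale_rows(counts):
    """Option 1: divide each row by its mean so that no single gene
    dominates the squared-error loss just because it is highly expressed."""
    scaled = []
    for row in counts:
        mean = sum(row) / len(row)
        # Leave all-zero rows untouched to avoid dividing by zero.
        scaled.append([x / mean for x in row] if mean > 0 else list(row))
    return scaled


kept_genes, kept_counts = drop_unwanted(genes, counts)
scaled_counts = scale_rows(kept_counts)
print(kept_genes)
print(scaled_counts)
```

With this toy input only `CD3E` and `MS4A1` remain, and each remaining row has a mean of 1. The filtered and scaled matrix would then be passed to the NMF routine in place of the raw one.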