Open ginnyintifa opened 1 year ago
`impute_mle2()` function (see #100). I'll update the documentation, as I now see that it isn't explicitly mentioned in the MLE imputation paragraph.
`MARGIN == 2` imputes along the columns. If you want to impute along the features, you need to set it to 1. If you see different behaviour, it's a bug, so please do let me know. The discussion about the margins is actually more involved, I think, and will also depend on downstream applications.
By the way, if you are processing quantitative proteomics data, I highly recommend giving the QFeatures package a go.
@lgatto Has there been any recent change to MLE? We are using imputation from MSnbase in a class, and we noticed that something seems to have changed between versions: the data now takes forever to be imputed using MLE.
@hsiaoyi0504 - there have been changes in the past, such as adding support for the norm2 package (about 2 years ago), and then dropping it again last year because it was removed from CRAN. About 2 years ago, we also added a `MARGIN` argument that defines whether row-wise or column-wise imputation should be done.
Dear Team,
MLE is one of the imputation options, which calls the `em.norm` and `imp.norm` functions from the `norm` package, and it is implemented with `MARGIN == 2`. I think `MARGIN == 2` is a reasonable setting, since the original p × n data matrix (features in rows and samples in columns) would be transposed before being sent to the EM algorithm. Therefore, when doing EM, each variable would be an actual gene/protein/peptide.
But the issue is that proteomics data is always p >> n. For example, we would have ~20000 proteins and a dozen samples in a TMT global proteome data set. With such a large number of features, the EM algorithm is very expensive.
I am trying a data set (10k × 24) with the `impute_mle` function and haven't got any results yet.
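To put rough numbers on the cost: EM for a multivariate normal model estimates a mean vector and a covariance matrix, so the number of free parameters grows quadratically with the number of variables. The sketch below is plain arithmetic for a 10k × 24 matrix and does not call the `norm` package. The helper names are hypothetical, and the memory figure assumes one dense covariance matrix of 8-byte doubles, which may not match how `norm` stores its parameters internally.

```python
def mvn_param_count(n_vars: int) -> int:
    """Free parameters of a multivariate normal: n_vars means
    plus n_vars * (n_vars + 1) / 2 unique covariance entries."""
    return n_vars + n_vars * (n_vars + 1) // 2


def dense_cov_gib(n_vars: int, bytes_per_value: int = 8) -> float:
    """Memory for one dense n_vars x n_vars matrix, in GiB.
    Assumes 8-byte doubles; the real storage layout may differ."""
    return n_vars * n_vars * bytes_per_value / 2**30


# Dimensions of the 10k x 24 data set mentioned above.
n_features, n_samples = 10_000, 24

for label, n_vars in (("features as variables", n_features),
                      ("samples as variables", n_samples)):
    print(f"{label}: {mvn_param_count(n_vars):,} parameters, "
          f"{dense_cov_gib(n_vars):.6f} GiB per dense covariance matrix")
```

With the 10,000 features as variables, the model has about 50 million parameters to estimate from 24 observations. With the 24 samples as variables, it has 324.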
Do you have any insights on this issue?
Thank you very much!