perishky / meffil

Efficient algorithms for analyzing DNA methylation data.
Artistic License 2.0

Imputation Issue #10

Closed jlevy44 closed 5 years ago

jlevy44 commented 5 years ago

I've noticed that the pipeline performs mean imputation during normalization. Is there any way to disable this feature or replace it with another function?

I'd like to use k-nn or mice for imputation.
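For context, k-nearest-neighbour imputation can be sketched in plain NumPy as below. This is only an illustration of the technique under stated assumptions, not part of meffil: the function name `knn_impute`, the default `k=2`, and the choice to compare rows using mean squared distance over jointly observed columns are all hypothetical. Library implementations such as scikit-learn's `KNNImputer` would normally be preferable on real data.

```python
import numpy as np

def knn_impute(x, k=2):
    """Fill NaNs in each row using the k nearest rows (hypothetical helper).

    Rows are the items being compared; transpose the matrix first if the
    neighbours should be taken along the other axis.
    """
    x = np.asarray(x, dtype=float)
    missing = np.isnan(x)
    filled = x.copy()
    for i in np.flatnonzero(missing.any(axis=1)):
        # Mean squared distance to every row over columns observed in both.
        shared = ~missing[i] & ~missing
        diff = np.where(shared, x - x[i], 0.0)
        n_shared = shared.sum(axis=1)
        dist = np.full(x.shape[0], np.inf)
        ok = n_shared > 0
        dist[ok] = (diff[ok] ** 2).sum(axis=1) / n_shared[ok]
        dist[i] = np.inf  # a row is never its own neighbour
        for j in np.flatnonzero(missing[i]):
            # Only comparable rows that observe column j can donate a value.
            donors = np.flatnonzero(~missing[:, j] & np.isfinite(dist))
            if donors.size == 0:
                continue  # no donor available: leave the value as NaN
            nearest = donors[np.argsort(dist[donors])[:k]]
            filled[i, j] = x[nearest, j].mean()
    return filled

# Toy matrix: rows are samples, columns are CpGs (made-up values).
x = np.array([
    [0.10, 0.80, 0.50],
    [0.12, 0.82, np.nan],
    [0.11, 0.79, 0.52],
    [0.90, 0.20, 0.10],
])
filled = knn_impute(x, k=2)
```

Here the missing value in the second row is filled from the first and third rows, which are its two closest neighbours on the observed columns. The pairwise distance step scales with the square of the number of rows, so running time on a full array would need to be measured.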

It would also be nice for meffil to output a "missingness report".
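As a rough idea of what such a report could contain, here is a minimal NumPy sketch that summarises missing values in a CpG-by-sample beta matrix. The function name `missingness_report` and the returned keys are hypothetical; nothing here is an existing meffil function, and the matrix values are made up.

```python
import numpy as np

def missingness_report(beta):
    """Summarise missing values in a CpG x sample beta matrix (hypothetical helper)."""
    beta = np.asarray(beta, dtype=float)
    missing = np.isnan(beta)
    return {
        "total_fraction": float(missing.mean()),
        "per_cpg": missing.mean(axis=1),     # fraction missing in each row (CpG)
        "per_sample": missing.mean(axis=0),  # fraction missing in each column (sample)
    }

# Toy matrix: 4 CpGs (rows) by 3 samples (columns).
beta = np.array([
    [0.10, np.nan, 0.30],
    [0.80, 0.85, np.nan],
    [0.50, 0.55, 0.60],
    [np.nan, np.nan, 0.20],
])
report = missingness_report(beta)
```

The per-CpG and per-sample fractions can then be used to decide which rows or columns to drop before imputing the rest.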

perishky commented 5 years ago

Mean imputation is used only when deriving principal components or surrogate variables from the methylation matrix. This is done because neither PCA nor SVA can handle missing values. I doubt that a more sophisticated imputation method would make much of a difference, because mean imputation is unlikely to induce variation that either method would pick up. Using k-nn or mice for imputation would also likely increase running time significantly. That said, this could be added quite easily as additional options to the impute.matrix() function. Do you have a code snippet showing how these methods could be applied to a methylation matrix, and any idea of the running time?
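The argument about mean imputation can be illustrated with a small NumPy sketch. It assumes missing values are replaced by the mean of the observed values for the same CpG (row); whether impute.matrix() works along rows in exactly this way is an assumption here, and `impute_row_means` is a hypothetical name.

```python
import numpy as np

def impute_row_means(beta):
    """Replace each NaN with the mean of the observed values in its row (CpG)."""
    beta = np.asarray(beta, dtype=float)
    row_means = np.nanmean(beta, axis=1, keepdims=True)
    return np.where(np.isnan(beta), row_means, beta)

# Toy matrix: 2 CpGs (rows) by 3 samples (columns), made-up values.
beta = np.array([
    [0.20, np.nan, 0.40],
    [0.90, 0.70, np.nan],
])
imputed = impute_row_means(beta)

# Centering each row, as PCA does, leaves the imputed entries at zero,
# so they contribute no deviation from the row mean.
centered = imputed - imputed.mean(axis=1, keepdims=True)
```

Because an imputed entry equals its row mean, it is zero after centering, which is why it is unlikely to create a signal that PCA or SVA would pick up.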

jlevy44 commented 5 years ago

I actually aim to run imputation using python. As I may have mentioned in previous posts, I convert the beta value matrix into a python pandas dataframe after running these commands:

    norm.objects <- meffil.normalize.quantiles(qc.objects, number.pcs=n.pcs, verbose=F)
    norm <- meffil.normalize.samples(norm.objects, just.beta=F, cpglist.remove=qc.summary$bad.cpgs$name)
    beta <- meffil.get.beta(norm$M, norm$U)
    pcs <- meffil.methylation.pcs(beta)
    norm.summary <- meffil.normalization.summary(norm.objects, pcs=pcs)
    meffil.normalization.report(norm.summary, output.file=norm.report.fname)
    return(beta)

Assuming they work, the problem I have is that the returned beta matrix contains no NA values. I read through the repo and saw that the PCs are calculated from the mean-imputed values, which I think is great. However, the final matrix that is returned has no missing values, so there is nothing for me to impute and no way to estimate missingness.

jlevy44 commented 5 years ago

In other words: I want to impute the matrix after running the functional normalization, but there is nothing to impute, because the missingness has disappeared for some reason.

jlevy44 commented 5 years ago

I'd expect any element that fails the detection p-value or bead number threshold to be set to NA in the output, provided its sample or probe has not already been removed by outlier detection.
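The behaviour described above could be sketched on the Python side roughly as follows, assuming the detection p-values and bead counts are available as matrices with the same shape as the beta matrix. The function name `mask_failed_measurements` and the default thresholds (`p_threshold=0.01`, `min_beads=3`) are illustrative assumptions, not values taken from meffil, and the data are made up.

```python
import numpy as np

def mask_failed_measurements(beta, detection_p, bead_count,
                             p_threshold=0.01, min_beads=3):
    """Set beta values to NaN where a measurement fails either check.

    Thresholds are assumed defaults for illustration only.
    """
    beta = np.asarray(beta, dtype=float)
    failed = (np.asarray(detection_p) > p_threshold) | (np.asarray(bead_count) < min_beads)
    return np.where(failed, np.nan, beta)

# Toy matrices: 2 CpGs (rows) by 2 samples (columns).
beta = np.array([[0.10, 0.20],
                 [0.80, 0.90]])
detection_p = np.array([[0.001, 0.200],
                        [0.002, 0.003]])
bead_count = np.array([[5, 6],
                       [2, 7]])
masked = mask_failed_measurements(beta, detection_p, bead_count)
```

The masked matrix keeps its shape, so it can go straight into a missingness summary or an imputation step.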

jlevy44 commented 5 years ago

I think I was able to get the NA values. I may PR my modifications to add missingness back for remaining CpGs and samples based on beadnum and detection values after outlier removal.

jlevy44 commented 5 years ago

Closing now.