jlevy44 closed this issue 5 years ago.
Mean imputation is used only when deriving principal components or surrogate variables from the methylation matrix, because neither PCA nor SVA can handle missing values. I doubt that a more sophisticated imputation method would make much difference. A mean-imputed value sits exactly at its CpG's mean, so after centring it contributes no variation that either method could pick up. Using k-NN or MICE would also likely increase running time substantially. That said, either could be added quite easily as additional options to the impute.matrix() function. Do you have a code snippet showing how these methods could be applied to a methylation matrix, and any idea of the running time?
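To make the argument concrete, here is a minimal NumPy sketch of mean imputation followed by PCA. It assumes NAs are filled with per-CpG (row) means and that sample PCs come from an SVD of the row-centred CpG × sample matrix. This is an illustration, not a description of meffil's impute.matrix() or meffil.methylation.pcs(), and both function names are hypothetical.

```python
import numpy as np


def mean_impute_rows(x):
    """Replace NaNs in each row (CpG) with that row's mean over observed samples.

    A row with no observed values stays NaN (np.nanmean warns in that case).
    """
    x = np.array(x, dtype=float)  # copy so the caller's matrix is untouched
    row_means = np.nanmean(x, axis=1)
    rows, cols = np.where(np.isnan(x))
    x[rows, cols] = row_means[rows]
    return x


def principal_components(x, n_pcs=2):
    """Sample PC scores of a CpG x sample matrix via SVD of the row-centred data."""
    centred = x - x.mean(axis=1, keepdims=True)  # imputed entries become 0 here
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    # Rows of vt are sample-space directions; scaling by s gives sample scores.
    return vt[:n_pcs].T * s[:n_pcs]
```

Because every imputed entry is zero after centring, it adds nothing to the variance the SVD decomposes, which is the intuition behind the claim that a fancier imputer would change little here.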
I actually plan to run the imputation in Python. As I may have mentioned in previous posts, I convert the beta-value matrix into a pandas DataFrame after running these commands:
```r
norm.objects <- meffil.normalize.quantiles(qc.objects, number.pcs=n.pcs, verbose=F)
norm <- meffil.normalize.samples(norm.objects, just.beta=F, cpglist.remove=qc.summary$bad.cpgs$name)
beta <- meffil.get.beta(norm$M, norm$U)
pcs <- meffil.methylation.pcs(beta)
norm.summary <- meffil.normalization.summary(norm.objects, pcs=pcs)
meffil.normalization.report(norm.summary, output.file=norm.report.fname)
return(beta)
```
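One way to bring the returned matrix into pandas, assuming it was exported with R's `write.csv(beta)` (an assumption; the thread does not say how the conversion is done), is a plain `read_csv`. Counting NaNs right after loading shows whether any missingness survived normalization. The helper name and the inline CSV values are hypothetical.

```python
import io

import pandas as pd


def load_beta_csv(source):
    """Read a CpG x sample beta matrix written by R's write.csv(beta).

    write.csv stores the row names (CpG IDs) in the first column and writes
    missing values as "NA", which read_csv parses as NaN by default.
    """
    return pd.read_csv(source, index_col=0)


# Tiny inline stand-in for the exported file (hypothetical values).
example_csv = io.StringIO('"","S1","S2"\n"cg0001",0.12,NA\n"cg0002",0.85,0.80\n')
example_beta = load_beta_csv(example_csv)
print(example_beta.isna().sum().sum())  # total number of missing beta values
```

On the matrix returned by the pipeline above, this count is what the next comment reports as zero.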
Assuming these commands work as intended, my problem is that the returned beta matrix contains no NA values. Reading through the repo, I saw that the PCs are calculated from mean-imputed values, which I think is great. However, the final matrix has no missing values left, so I can neither impute them nor estimate how much data is missing.
In other words, I want to impute the matrix after running functional normalization, but there is nothing to impute because the missingness has disappeared somewhere along the way.
I'd expect any value that fails the detection p-value or bead-number threshold to be set to NA in the output, provided its sample or probe was not removed during outlier detection.
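The expected behaviour can be sketched in pandas as below. The thresholds (detection p-value above 0.01, fewer than 3 beads) and the function name are hypothetical placeholders, not meffil's settings, and all three inputs are assumed to be CpG × sample DataFrames with identical labels.

```python
import pandas as pd

# Hypothetical thresholds; check them against your own QC settings.
DETECTION_P_MAX = 0.01
MIN_BEADS = 3


def mask_failed_measurements(beta, detp, beadnum, p_max=DETECTION_P_MAX, min_beads=MIN_BEADS):
    """Return a copy of beta with NaN wherever detection p or bead count fails.

    beta, detp and beadnum must share the same CpG index and sample columns.
    """
    failed = (detp > p_max) | (beadnum < min_beads)
    return beta.mask(failed)  # True cells in `failed` become NaN
```

`DataFrame.mask` returns a new frame, so the normalized matrix itself is left untouched.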
I think I was able to recover the NA values. I may open a PR with my modifications, which add missingness back for the remaining CpGs and samples based on bead numbers and detection p-values after outlier removal.
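Since outlier removal drops samples and CpGs, the detection p-value and bead-count matrices must be cut down to whatever survived in the beta matrix before any masking. A minimal pandas sketch, assuming both are labelled CpG × sample DataFrames; the helper name is hypothetical and this is not the reporter's actual patch.

```python
import pandas as pd


def align_to_beta(qc_matrix, beta):
    """Cut a CpG x sample QC matrix down to the CpGs and samples kept in beta.

    reindex returns rows and columns in beta's order. A NaN in the result means
    a beta label had no QC value (or the QC value itself was missing), so that
    case is rejected explicitly rather than silently passed on.
    """
    aligned = qc_matrix.reindex(index=beta.index, columns=beta.columns)
    n_missing = int(aligned.isna().to_numpy().sum())
    if n_missing:
        raise ValueError(f"{n_missing} beta entries have no matching QC value")
    return aligned
```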
Closing now.
I've noticed that the pipeline performs mean imputation during normalization. Is there any way to disable this feature, or replace it with another function?
I'd like to use k-NN or MICE for imputation.
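For reference, row-wise k-NN imputation on a CpG × sample matrix can be sketched in NumPy: each missing value becomes the average of that sample's values in the k CpGs whose observed profiles are closest (mean squared difference over shared samples). This is a hypothetical brute-force illustration, not an existing meffil option; its distance loop grows with the square of the number of CpGs, which bears directly on the running-time concern raised earlier in the thread.

```python
import numpy as np


def knn_impute_rows(x, k=5):
    """Fill NaNs in a CpG x sample matrix from the k most similar CpGs (rows).

    Distance between two CpGs is the mean squared difference over the samples
    observed in both. Brute force: every incomplete row is compared with every
    other row, so cost grows with the square of the number of CpGs.
    """
    x = np.asarray(x, dtype=float)
    observed = ~np.isnan(x)
    filled = x.copy()  # donors are always read from the original x
    for i in np.where(~observed.all(axis=1))[0]:
        shared = observed & observed[i]  # samples observed in both row i and each row
        n_shared = shared.sum(axis=1)
        sq = np.where(shared, (x - x[i]) ** 2, 0.0)
        with np.errstate(invalid="ignore", divide="ignore"):
            dist = sq.sum(axis=1) / n_shared
        dist[n_shared == 0] = np.inf  # no overlap: never a neighbour
        dist[i] = np.inf  # exclude the row itself
        for j in np.where(~observed[i])[0]:
            donors = np.where(observed[:, j])[0]  # CpGs measured in sample j
            donors = donors[np.isfinite(dist[donors])]
            if donors.size == 0:
                filled[i, j] = np.nanmean(x[i])  # fallback: this CpG's own mean
                continue
            nearest = donors[np.argsort(dist[donors])[:k]]
            filled[i, j] = x[nearest, j].mean()
    return filled
```

For MICE-style imputation in Python, scikit-learn's `IterativeImputer` is an option (it must be enabled with `from sklearn.experimental import enable_iterative_imputer`). Its `KNNImputer` treats rows as samples, so a CpG × sample matrix would need transposing first, which makes it a sample-wise rather than CpG-wise neighbour search.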
It would also be nice for meffil to output a "missingness report".
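A missingness report could start from per-CpG and per-sample NA fractions. A hypothetical pandas sketch, assuming a CpG × sample beta matrix in which failed measurements have already been set to NaN; the function name and the 10% default threshold are arbitrary placeholders.

```python
import pandas as pd


def missingness_report(beta, max_fraction=0.1):
    """Summarise NaNs in a CpG x sample beta matrix."""
    missing = beta.isna()
    per_cpg = missing.mean(axis=1)  # fraction of samples missing for each CpG
    per_sample = missing.mean(axis=0)  # fraction of CpGs missing for each sample
    return {
        "overall_fraction": float(missing.to_numpy().mean()),
        "per_cpg": per_cpg,
        "per_sample": per_sample,
        "cpgs_over_threshold": per_cpg[per_cpg > max_fraction].index.tolist(),
        "samples_over_threshold": per_sample[per_sample > max_fraction].index.tolist(),
    }
```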