Closed by Katterinne 4 years ago
Dear Zhang,
Thank you for the tool (it works and it's user-friendly, unlike DNAmAge).
This is to report what I think may be a bug in the transformation of missing data to the mean value when feeding the tool with an input with NAs in it.
Here is what happened to me: I have methylation data including several CpGs with NAs and when I run DNAm-based-age-predictor I get back the EN prediction, but the "blupred" column comes with NAs only. Of course, maybe I'm making a mistake here, I'm not sure. I hope you can help me. Please, let me walk you through it and provide my input data for you...
First, I created the required input for DNAm-based-age-predictor (RDS file) from my table containing the DNA methylation values at all CpG sites in all my samples.
In R:
# read DNA methylation data
data <- as.matrix(t(read.table(file = "Nexs_methyl.tsv", header = TRUE, sep = "\t", row.names = 1, as.is = TRUE)))
# export R object
saveRDS(data, file = "Nexs.rds")
Once the RDS file was ready, I ran the tool:
In the terminal:
$ Rscript pred.R -i Nexs.rds -o Nexs_age.pred -a Nexs.age
[1] "1. Data loading and QC"
[1] "1.1 Reading the data"
[1] "1.2 Replacing missing values with mean value"
[1] "1.3 Standardizing"
[1] "2. Loading predictors"
[1] "3. Checking misssing probes"
[1] "0 probe(s) in Elastic Net predictor is(are) not in the data"
[1] "0 probe(s) in BLUP predictor is(are) not in the data"
[1] "BLUP can perform better if the number of missing probes is too large!"
[1] "4. Predicting"
[1] "Completed!!!"
$ cat Nexs_age.pred
ID     age  enpred            blupred
Nex10  0    23.5517451927356  NA
Nex12  0    27.16853446086    NA
Nex18  0    26.9359837106917  NA
Nex6   0    36.3717094129017  NA
Nex8   0    22.8164441953937  NA
Using this Dropbox link you can download a zip file containing all the files mentioned above: Nexs_methyl.tsv, Nexs.rds, Nexs.age, and Nexs_age.pred.
Thank you in advance!
Regards, Katterinne
Hi Katterinne,
As you may have noticed, I do not check GitHub regularly. Sorry for my late reply!
As for your problem: since there are only 5 samples in your data file, some of the probes have NA values across all samples. In that case the "replace NA" step does not work, because it was designed to replace each NA with the probe's average DNA methylation value across samples, and that average is undefined when every sample is NA. The NA values left in the data then cause the matrix multiplication to return NA.
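The failure described above can be reproduced on a toy matrix. This is only a minimal sketch, not the code in pred.R: the probe names, values, weights, and the `impute_mean` helper are all made up for illustration.

```r
# Toy methylation matrix: samples in rows, probes in columns.
# Probe names and values are hypothetical.
meth <- matrix(c(0.2, 0.4, NA,    # cgA: one missing value
                 NA,  NA,  NA,    # cgB: missing in every sample
                 0.7, 0.9, 0.8),  # cgC: complete
               nrow = 3,
               dimnames = list(c("s1", "s2", "s3"), c("cgA", "cgB", "cgC")))

# Replace each probe's NAs with that probe's mean across samples.
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}
imputed <- apply(meth, 2, impute_mean)

# Hypothetical predictor weights, one per probe.
weights <- c(cgA = 0.5, cgB = -0.3, cgC = 1.2)
pred <- imputed %*% weights

print(imputed)  # cgB stays missing: the mean of zero observed values is NaN
print(pred)     # every prediction is missing as well
```

A predictor that happens not to use any all-NA probe would be unaffected, which is possibly why the enpred column was still filled in, although the thread does not confirm this.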
I have now updated the script to detect such probes and remove them. Since such probes are few (especially when the sample size is large), I think removing them will not affect the chronological age prediction much.
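One way to detect and drop such probes before imputation might look like the following. It is a sketch under assumptions, not the actual change made to pred.R; the matrix, probe names, and weights are hypothetical.

```r
# Toy methylation matrix: samples in rows, probes in columns (hypothetical data).
meth <- matrix(c(0.2, 0.4, NA,    # cgA: one missing value
                 NA,  NA,  NA,    # cgB: missing in every sample
                 0.7, 0.9, 0.8),  # cgC: complete
               nrow = 3,
               dimnames = list(c("s1", "s2", "s3"), c("cgA", "cgB", "cgC")))

# Flag probes with no observed value in any sample, then drop them.
all_na <- colSums(!is.na(meth)) == 0
kept <- meth[, !all_na, drop = FALSE]

# Mean-impute the remaining probes; every column now has a defined mean.
for (j in seq_len(ncol(kept))) {
  miss <- is.na(kept[, j])
  kept[miss, j] <- mean(kept[, j], na.rm = TRUE)
}

# Hypothetical predictor weights; use only the probes that were kept.
weights <- c(cgA = 0.5, cgB = -0.3, cgC = 1.2)
pred <- kept %*% weights[colnames(kept)]
print(pred)
```

Dropping a probe amounts to ignoring its weight, so the prediction changes slightly; with few such probes the effect should be small, as noted above.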
Cheers, Qian
Hi Qian,
No problem at all. On the contrary, thank you very much for answering! And I'm sorry for my late reply; I was away for a couple of weeks.
Well, that makes a lot of sense. Shame on me for not noticing it myself! Thanks a lot for the script update, it works without problems now :D
Cheers, Katterinne