jazoller96 / mammalian-methyl-clocks

Coefficients and age transformations for many mammalian and other intrinsic epigenetic clocks, including directions for applying and citing them

How to handle missing clock data? #2

Open JoWhi opened 1 year ago

JoWhi commented 1 year ago

Is there a validated method for handling missing data points with the universal clocks? My beta matrix has NAs for some or all individuals at many of the sites the clocks need. Should these be replaced with a tiny beta value, with 0.5, or with something else? I can impute betas at sites where some individuals have values, but not at sites with no data at all. Thanks for any suggestions!
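The question separates two cases: sites where only some individuals are missing (imputable from the other samples) and sites with no data at all. A minimal sketch of how the sites could be sorted into those groups before choosing a strategy is shown here. It assumes the beta matrix has been exported as a mapping from CpG ID to per-sample values with `None` for NA; the probe IDs and the `classify_sites` helper are hypothetical and not part of this repository.

```python
from typing import Dict, List, Optional

# Hypothetical layout: CpG ID -> beta values per sample, None marks an NA.
Betas = Dict[str, List[Optional[float]]]


def classify_sites(betas: Betas) -> Dict[str, List[str]]:
    """Group CpG sites by how much of their data is missing."""
    groups: Dict[str, List[str]] = {"complete": [], "partial": [], "all_missing": []}
    for cpg, values in betas.items():
        n_missing = sum(v is None for v in values)
        if n_missing == 0:
            groups["complete"].append(cpg)      # usable as-is
        elif n_missing < len(values):
            groups["partial"].append(cpg)       # could be imputed from other samples
        else:
            groups["all_missing"].append(cpg)   # needs an external filler value
    return groups


# Made-up probe IDs and values, for illustration only.
example = {
    "cg_a": [0.81, 0.79, 0.83],
    "cg_b": [0.10, None, 0.12],
    "cg_c": [None, None, None],
}
groups = classify_sites(example)
```

Only the `all_missing` group needs a filler value that comes from outside the dataset.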

jazoller96 commented 1 year ago

Imputing missing Horvath 40k methylation data has only been done for mouse, where the lab has a "gold standard" matrix of filler values for each CpG site, based on mean methylation across all of our mouse data. However, this situation only arises when the user's data was generated on a different array, such as the standard 320k array or the EPIC array. As in my previous comment, may I ask whether you used the Horvath 40k array to generate your data?
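The "gold standard" idea described above, filling each missing site with a per-site mean from a reference dataset, could be sketched roughly as follows. This is not the lab's actual code or matrix: `fill_sample`, the probe IDs, and the reference means are all hypothetical, and the sketch assumes one sample is represented as a mapping from CpG ID to beta value.

```python
from typing import Dict, Optional


def fill_sample(
    sample: Dict[str, Optional[float]],
    reference_means: Dict[str, float],
) -> Dict[str, float]:
    """Return one value per reference CpG, using the reference mean
    wherever the sample has no measurement (absent key or None)."""
    filled: Dict[str, float] = {}
    for cpg, ref_mean in reference_means.items():
        value = sample.get(cpg)
        filled[cpg] = ref_mean if value is None else value
    return filled


# Made-up reference means and sample values, for illustration only.
reference_means = {"cg_a": 0.80, "cg_b": 0.11, "cg_c": 0.45}
sample = {"cg_a": 0.77, "cg_b": None}  # cg_c is not on the sample's array
filled = fill_sample(sample, reference_means)
```

Measured values are kept untouched, so only sites that are absent or NA take the reference mean.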

JoWhi commented 1 year ago

My data is from the mammalian array; many sites have NAs in the raw data, and many more sites are dropped after cleaning & normalizing. Is it not possible to get estimates for clocks with missing data with the predictAge function, other than for mouse?

jazoller96 commented 1 year ago

As of now, no: the clocks were built with complete data. You could try replacing NAs with 0.5 after normalization, because I believe that is what Horvath has done in some cases.
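A minimal sketch of the 0.5 fallback suggested above, assuming a clock is a linear combination of betas (intercept plus coefficient times beta) that is later passed through an age transformation. The helper names, coefficients, and probe IDs are hypothetical; this is not the `predictAge` implementation, and the age transformation is deliberately left out.

```python
from typing import Dict, Iterable, Optional, Tuple


def fill_constant(
    sample: Dict[str, Optional[float]],
    clock_cpgs: Iterable[str],
    fill_value: float = 0.5,
) -> Tuple[Dict[str, float], int]:
    """Replace missing betas at clock sites with a constant and
    report how many sites were filled."""
    filled: Dict[str, float] = {}
    n_filled = 0
    for cpg in clock_cpgs:
        value = sample.get(cpg)
        if value is None:
            filled[cpg] = fill_value
            n_filled += 1
        else:
            filled[cpg] = value
    return filled, n_filled


def linear_score(intercept: float, coefs: Dict[str, float], betas: Dict[str, float]) -> float:
    """Untransformed clock score: intercept + sum(coef * beta)."""
    return intercept + sum(coef * betas[cpg] for cpg, coef in coefs.items())


# Made-up coefficients and sample values, for illustration only.
coefs = {"cg_a": 2.0, "cg_b": -1.0}
sample = {"cg_a": 0.25}  # cg_b is missing after normalization
filled, n_filled = fill_constant(sample, coefs.keys())
score = linear_score(1.0, coefs, filled)
```

Tracking `n_filled` per sample makes it easy to flag estimates that rest mostly on filler values, since each filled site contributes a fixed `coef * 0.5` regardless of the animal's true methylation.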