rsinghlab / AltumAge

BSD 3-Clause "New" or "Revised" License

KeyError from missing CpGs in example code #2

Closed: joepalermo closed this 3 years ago

joepalermo commented 3 years ago

Hi!

I fixed 2 small bugs in example.ipynb:

1. Path error to .h5 file
2. Some Horvath CpGs were dropped from the dataframe before running inference for AltumAge

Regarding 2), I got a KeyError on the line that calls Horvath's model. It seems the following CpGs were not in the dataframe: ['cg02654291', 'cg02972551', 'cg09785172', 'cg09869858', 'cg13682722', 'cg16494477', 'cg17408647', 'cg19273182', 'cg19945840', 'cg27319898', 'cg04431054', 'cg05590257', 'cg06117855', 'cg19046959', 'cg19569684', 'cg24471894', 'cg27016307'].
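For anyone hitting the same error, a check like the one below can list the absent CpGs before the model is called, rather than failing inside the indexing step. This is only a minimal sketch: `find_missing_cpgs` is a hypothetical helper (not part of the AltumAge repo), and `cg_present_example` is a made-up placeholder ID; the other two IDs are taken from the list above.

```python
def find_missing_cpgs(required_cpgs, available_columns):
    """Return the required CpG IDs that are absent from the available columns.

    The order of `required_cpgs` is preserved in the result.
    """
    available = set(available_columns)
    return [cpg for cpg in required_cpgs if cpg not in available]


# Hypothetical inputs: `required` stands in for the CpGs a model expects,
# `columns` stands in for the column names of the methylation dataframe.
required = ["cg_present_example", "cg02654291", "cg02972551"]
columns = ["cg_present_example", "cg_other_example"]

missing = find_missing_cpgs(required, columns)
if missing:
    print(f"{len(missing)} required CpGs are missing: {missing}")
```

With a pandas dataframe, the second argument would typically be its `columns` attribute, assuming CpG IDs are stored as column names.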


I've also added a .gitignore file, which is standard practice to avoid committing undesirable files (e.g. notebook checkpoints and virtual env contents).

lcamillo commented 3 years ago

Hi!

Thank you so much for spotting these bugs!

With regard to the missing CpG sites, Horvath's model uses 353 CpG sites common to both the Illumina 27k and 450k platforms, whereas AltumAge uses 20318 sites at the intersection of Illumina 27k, 450k, and EPIC. The missing sites are therefore the Horvath CpGs that are not present on the EPIC array, which were removed when the dataframe was filtered down to AltumAge's sites. You are correct that we should not have filtered the CpG sites before running the Horvath model example.
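The set relationship described here can be illustrated with a toy example. Everything below is invented for illustration: the probe IDs are placeholders and the three "arrays" hold only a handful of entries, so this is a sketch of the logic, not of the real platform contents.

```python
# Toy probe sets standing in for the three Illumina platforms (made-up IDs).
array_27k = {"cg_a", "cg_b", "cg_c", "cg_d"}
array_450k = {"cg_a", "cg_b", "cg_c", "cg_e"}
array_epic = {"cg_a", "cg_b", "cg_e", "cg_f"}

# Horvath's sites are drawn from probes shared by 27k and 450k.
horvath_pool = array_27k & array_450k

# AltumAge's sites are drawn from probes shared by all three platforms.
altumage_features = array_27k & array_450k & array_epic

# A hypothetical Horvath-style site list: one site is on EPIC, one is not.
horvath_cpgs = {"cg_a", "cg_c"}

# Filtering the data down to AltumAge's features first drops every
# Horvath site that is absent from EPIC.
lost_after_filtering = horvath_cpgs - altumage_features
print(sorted(lost_after_filtering))
```

In this toy setup the lost site is exactly the one missing from the EPIC set, which mirrors why the Horvath step needs to run on the unfiltered dataframe.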

We really appreciate it!