I am with Chris Jarvis and our new data manager Luke in Geneva. We have noticed that the age variable in the cleaned Master Linelist is missing for all patients < 1 year old. Our investigation of the cleaning scripts showed that this stems from scientific notation in the raw Excel import.
For children under one, ages are entered as formulas such as =36/365, which then comes through as 9.8E-2. When it’s read into R in aaa_clean_linelist.rmd, the clean_data function on line 110 replaces 9.8E-2 with 9_8E_2. On line 366, where age is converted to numeric, the two underscores are replaced with a . using gsub, and the value then becomes missing. This means we lose the age of every child under one year.
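The failure can be reproduced outside the script with a minimal sketch like the one below. It assumes the cleaned string looks exactly like the 9_8E_2 described above; the variable names and the repair pattern are hypothetical and are not taken from aaa_clean_linelist.rmd.

```r
# Illustrative value only: the age string as it looks after clean_data,
# per the description above.
cleaned <- "9_8E_2"

# Replacing every underscore with "." yields a string that R cannot
# parse as a number, so the conversion produces NA.
dotted  <- gsub("_", ".", cleaned, fixed = TRUE)
age_bad <- suppressWarnings(as.numeric(dotted))

# Hypothetical repair (an assumption, not the script's current logic):
# rebuild the scientific notation before converting. Only strings of the
# form digits_digits[eE]_digits are touched; anything else is left as-is.
# This assumes the exponent was negative, as it is for ages under one.
repaired <- sub("^([0-9]+)_([0-9]+)[eE]_([0-9]+)$", "\\1.\\2e-\\3", cleaned)
age_ok   <- as.numeric(repaired)
```

Here age_bad is NA, while age_ok is numeric. Whether the fix is better applied before clean_data runs or at the conversion step on line 366 depends on how the rest of the script uses the age column.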
From Neale Batra: