Open jeinson opened 6 years ago
Annotations from ENCODE in particular, such as chromatin states and TFBS, have many NAs. In those cases, we imputed the minimum value (0), which corresponds to background. This is also the approach CADD used when imputing its variant features.
When generating a matrix of features for RIVER, how do the developers handle situations where no variant near a particular gene has a CADD annotation for features like TFBS or EncOCCombPVal? glmnet cannot handle NAs, but in my dataset 95% of genes have at least one missing feature annotation, so removing such cases would discard most of the data.
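
For anyone who lands here with the same problem, here is a minimal sketch of the zero-fill approach discussed in this thread, written in plain Python for illustration. It is not the RIVER developers' code. The helper names (`is_missing`, `impute_background`), the `BACKGROUND` table, and the example gene rows are all hypothetical, and using 0 as the background for every feature is an assumption taken from the reply; it may be worth checking whether 0 is a sensible background for each feature in your own matrix.

```python
import math

# Hypothetical per-feature background values. 0 follows the reply in this
# thread; other features may need a different background value.
BACKGROUND = {"TFBS": 0.0, "EncOCCombPVal": 0.0}


def is_missing(value):
    """Treat None and float NaN as missing annotations."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def impute_background(rows, background, default=0.0):
    """Return copies of the rows with missing values replaced by background."""
    imputed = []
    for row in rows:
        filled = {}
        for name, value in row.items():
            if is_missing(value):
                # Fall back to `default` for features not listed in `background`.
                filled[name] = background.get(name, default)
            else:
                filled[name] = value
        imputed.append(filled)
    return imputed


# Made-up example rows: one gene per row, one column per feature.
rows = [
    {"gene": "geneA", "TFBS": 3.0, "EncOCCombPVal": float("nan")},
    {"gene": "geneB", "TFBS": None, "EncOCCombPVal": 1.5},
]

features = impute_background(rows, BACKGROUND)
```

After this step every gene is kept and no feature is missing, so the matrix can be handed to a model that rejects NAs. In R, the same idea on a numeric matrix `x` is `x[is.na(x)] <- 0` before calling glmnet.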