legaultmarc / cohort-manager

Utility to manage and explore collections of phenotype data.

Hierarchical Exclusions #1

Closed by legaultmarc 8 years ago

legaultmarc commented 8 years ago

For now, hierarchical exclusions only work if both the child and the parent are discrete variables. I am not sure how we should handle the case where the child is another variable type (e.g. continuous). It cannot be set to "unaffected" as we do for discrete variables, but it is not a proper missing value either (i.e. it should not be counted when computing the % missing).

legaultmarc commented 8 years ago

For statistical testing, coding unaffected individuals as NA for factors and continuous variables is not problematic, since they would normally be excluded from the analysis anyway. The problem lies in reporting summary statistics: the number of missing values is inflated unless unaffected individuals are identified separately. To address this, two functions, "get_number_unaffected" and "get_number_missing", are provided in the public API. These should be used when generating summary statistics.
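
To illustrate the distinction, here is a minimal standalone sketch, not the library's actual implementation. The signatures of the real functions are not shown in this thread, so `count_unaffected` and `count_missing` are hypothetical stand-ins. The sketch also assumes the discrete parent is coded 1 (affected), 0 (unaffected) or None (missing), and that NA values in the continuous child are represented as None.

```python
from typing import Optional, Sequence


def count_unaffected(parent: Sequence[Optional[int]],
                     child: Sequence[Optional[float]]) -> int:
    """Count individuals whose child value is NA because the parent is 0.

    Hypothetical helper: these NAs come from the hierarchical exclusion,
    so they should not be reported as missing data.
    """
    return sum(1 for p, c in zip(parent, child) if c is None and p == 0)


def count_missing(parent: Sequence[Optional[int]],
                  child: Sequence[Optional[float]]) -> int:
    """Count child NAs that are not explained by an unaffected parent."""
    return sum(1 for p, c in zip(parent, child) if c is None and p != 0)


# Assumed toy data: a discrete parent and a continuous child variable.
parent = [1, 1, 0, 0, None, 1]
child = [2.5, None, None, None, None, 3.1]

n_unaffected = count_unaffected(parent, child)
n_missing = count_missing(parent, child)
n_na_naive = sum(c is None for c in child)  # counts every NA as missing

print(f"unaffected: {n_unaffected}, missing: {n_missing}, "
      f"naive NA count: {n_na_naive}")
```

On this toy data the naive NA count is twice the number of truly missing values, which is the inflation described above.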

legaultmarc commented 8 years ago

Fixed in 2dfb47f2ec8ad6b47d68d7763c09a0988f4bfbd5