Closed by jaybee84 4 years ago
Techniques to manage disparities in data generation are required to power robust analyses in rare diseases: The rarity of patients leads to heterogeneity in sample collection, causing disparities in the data. We will discuss how rigorous normalization and methods that capture sample-wise, gene-set-level information can help integrate disparate data points appropriately to power machine learning approaches (refs. 11–13).
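To make the "sample-wise gene-set level information" idea concrete, here is a minimal sketch, assuming a simple rank-based score (mean within-sample rank of the set's genes). It is not the specific method from the cited references; the function names, gene names, and toy values are all hypothetical and only for illustration.

```python
import numpy as np

def rank_normalize(expr):
    """Convert each sample (column) to within-sample ranks scaled to (0, 1].

    expr: genes x samples array of expression values.
    Ties are broken by order of appearance (a simplification).
    """
    expr = np.asarray(expr, dtype=float)
    n_genes = expr.shape[0]
    # argsort of argsort yields 0-based ranks within each column
    ranks = expr.argsort(axis=0, kind="stable").argsort(axis=0) + 1
    return ranks / n_genes

def gene_set_scores(expr, gene_names, gene_set):
    """Mean within-sample rank of the genes in gene_set, one score per sample."""
    ranked = rank_normalize(expr)
    idx = [i for i, g in enumerate(gene_names) if g in gene_set]
    if not idx:
        raise ValueError("no genes from gene_set found in gene_names")
    return ranked[idx].mean(axis=0)

# Toy data (hypothetical): two samples measured on very different scales,
# but with the same gene ordering within each sample.
genes = ["A", "B", "C", "D"]
expr = np.array([
    [1.0, 10.0],
    [2.0, 200.0],
    [3.0, 3000.0],
    [4.0, 40000.0],
])
scores = gene_set_scores(expr, genes, {"C", "D"})
```

Because the score only uses the ordering of genes within each sample, the two toy samples get the same score despite their scale difference. That is the property that makes this kind of summary attractive for combining data from disparate sources, though real data would need more care (ties, missing genes, set size).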
There's a lot to possibly talk about here, so let's break this down by data type:
Adding this paper for consideration in the high-impact mutation prediction point (along with SIFT and PolyPhen) ... seems like a good resource for diseases with multigenic possibilities!
(conflict of interest disclaimer: I may have a soft spot for ensemble RFs :P )
An important thing to acknowledge: batches, processing cores, and institutions are often confounded with biological variables such as tumor type or disease state, so correcting for the technical variable can also remove real biological signal.
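One quick way to see whether this is a problem in a given dataset is to cross-tabulate batch against the biological variable before attempting any correction. A minimal sketch, assuming categorical labels per sample and using Cramér's V as the association measure (my choice for illustration, not something proposed in this thread); the site and tumor names are hypothetical.

```python
from collections import Counter

def crosstab(batches, labels):
    """Count samples for each (batch, biological label) pair."""
    counts = Counter(zip(batches, labels))
    rows = sorted(set(batches))
    cols = sorted(set(labels))
    return rows, cols, [[counts[(r, c)] for c in cols] for r in rows]

def cramers_v(table):
    """Cramér's V for a contingency table.

    0 means batch and label look independent; 1 means they are
    perfectly associated (batch cannot be separated from biology).
    """
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n  # expected count under independence
            chi2 += (obs - exp) ** 2 / exp
    k = min(len(row_tot), len(col_tot)) - 1
    if k == 0:
        raise ValueError("need at least two batches and two labels")
    return (chi2 / (n * k)) ** 0.5

# Hypothetical design 1: each site contributed only one tumor type.
batches = ["site1"] * 4 + ["site2"] * 4
confounded_labels = ["tumorA"] * 4 + ["tumorB"] * 4
_, _, confounded_table = crosstab(batches, confounded_labels)
v_confounded = cramers_v(confounded_table)

# Hypothetical design 2: both tumor types are split evenly across sites.
balanced_labels = ["tumorA", "tumorA", "tumorB", "tumorB"] * 2
_, _, balanced_table = crosstab(batches, balanced_labels)
v_balanced = cramers_v(balanced_table)
```

In the first design the value is at its maximum, which is the situation where removing the "batch" effect would also remove the tumor-type difference. In the balanced design it is zero, and batch correction is much safer. With the small cohorts typical of rare diseases, the raw table itself is probably more informative than any single summary number.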