mccoymd closed this issue 1 year ago.
When it comes to labeling the lower-dimensional projections of variant proteins, the evidence codes can be used to establish the decision boundaries of classification models.
These classifications can be used for training and validating predictive models of variant impact on protein function.
Matthew provided a summary of this discussion.
Resolving in preparation for the 2023 hackathon/jamboree.
Given the scale of potential variation, to what extent can protein-specific models of functional impact be used as evidence?
I've been developing a method to predict functional changes of missense mutations using simulations of protein structure.
https://www.cell.com/biophysj/fulltext/S0006-3495(20)33203-3
In short, molecular dynamics (MD) simulations of an all-atom representation of a variant's protein structure are used to define its characteristic "wobble". By projecting these variant-specific profiles into a lower-dimensional space with principal component analysis (PCA), we find that variants cluster by their contribution to divergent disease mechanisms, which allows us to define a decision boundary for a classification model. In other words, we can use these projections of variant dynamics to train an AI classifier that predicts the impact of novel variants from their MD results.
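A minimal sketch of the project-then-classify idea, assuming each variant's MD trajectory has already been reduced to a fixed-length per-residue fluctuation profile (for example RMSF). The profiles are synthetic, `fit_pca`, `project`, `fit_nearest_centroid`, and the two "mechanism" labels are hypothetical names, and a simple nearest-centroid rule stands in for whatever AI classifier is actually used. It is not the method from the paper, just an illustration of how a decision boundary falls out of the PCA projection.

```python
import numpy as np

def fit_pca(profiles, n_components=2):
    """Fit PCA by SVD of the mean-centred profile matrix (rows = variants)."""
    mean = profiles.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(profiles - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(profiles, mean, components):
    """Map one or more profiles onto the fitted principal axes."""
    return (np.atleast_2d(profiles) - mean) @ components.T

def fit_nearest_centroid(points, labels):
    """One centroid per class; the decision boundary is where distances tie."""
    classes = np.unique(labels)
    centroids = np.array([points[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(points, classes, centroids):
    # Distances from every point to every centroid, shape (n_points, n_classes).
    d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=-1)
    return classes[d.argmin(axis=1)]

# Synthetic stand-in for per-residue fluctuation profiles (hypothetical):
# "mechanism_B" variants are more flexible over residues 10-19.
rng = np.random.default_rng(0)
n_res, n_per_class = 50, 6
base = np.ones(n_res)
shift = np.zeros(n_res)
shift[10:20] = 1.0
profiles_a = base + rng.normal(0, 0.05, (n_per_class, n_res))
profiles_b = base + shift + rng.normal(0, 0.05, (n_per_class, n_res))
profiles = np.vstack([profiles_a, profiles_b])
labels = np.array(["mechanism_A"] * n_per_class + ["mechanism_B"] * n_per_class)

mean, components = fit_pca(profiles)
points = project(profiles, mean, components)
classes, centroids = fit_nearest_centroid(points, labels)

# Classify a new, unlabeled variant by projecting it into the same space.
novel = base + 0.9 * shift
print(predict(project(novel, mean, components), classes, centroids))
```

The basis is fitted once on labeled variants and reused for new ones, so a novel variant is judged in the same coordinates the boundary was drawn in.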
An interesting result came when we applied this method to predict venetoclax resistance in BCL2. Shown below is a projection into PCA space of resistant variants (red), sensitive variants (blue), and a re-sensitized double variant (rescue: first G101V, second E152A). The "rescue" variant's dynamics become more wildtype-like, which is also reflected in its predicted Ki.
https://www.sciencedirect.com/science/article/pii/S0010482521008544?via%3Dihub
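To illustrate what "becomes more wildtype" means geometrically, here is a sketch that fits PCA on wildtype plus single resistant and sensitive variants, then projects a held-out double variant into that same space and compares distances to wildtype. Everything here is an assumption for illustration: the profiles are synthetic (not BCL2 data), the second-site mutation is modelled as partially reverting the resistance perturbation, and `distance_to_wildtype` is a hypothetical helper.

```python
import numpy as np

def fit_pca(profiles, n_components=2):
    """Fit PCA by SVD of the mean-centred profile matrix (rows = variants)."""
    mean = profiles.mean(axis=0)
    _, _, vt = np.linalg.svd(profiles - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(profiles, mean, components):
    """Map one or more profiles onto the fitted principal axes."""
    return (np.atleast_2d(profiles) - mean) @ components.T

def distance_to_wildtype(profile, wt_profile, mean, components):
    """Euclidean distance between a variant and wildtype in PCA space."""
    p = project(profile, mean, components)[0]
    w = project(wt_profile, mean, components)[0]
    return float(np.linalg.norm(p - w))

# Synthetic per-residue profiles (hypothetical, not BCL2 data).
rng = np.random.default_rng(1)
n_res = 40
wt = np.ones(n_res)
shift = np.zeros(n_res)
shift[5:15] = 1.0  # perturbation attributed to the resistance mutation
resistant = np.array([wt + s * shift + rng.normal(0, 0.02, n_res) for s in (0.8, 1.0, 1.2)])
sensitive = np.array([wt + rng.normal(0, 0.02, n_res) for _ in range(3)])

# The basis is fitted without the double variant, which is projected afterwards.
mean, components = fit_pca(np.vstack([wt, resistant, sensitive]))

# Hypothetical second-site mutation that partially reverts the perturbation.
double = wt + 0.3 * shift
d_single = distance_to_wildtype(resistant[1], wt, mean, components)
d_double = distance_to_wildtype(double, wt, mean, components)
print(f"single: {d_single:.2f}, double: {d_double:.2f}")
```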
The Ki values were predicted by quantifying the distance of a given variant from the wildtype, and our preliminary data suggest that the degree of divergence from wildtype dynamics correlates with the degree of functional change in other proteins as well (see the Biophysical Journal article linked above).
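One simple way to turn distance from wildtype into a Ki estimate is a calibration step: fit log10(Ki) against PCA distance for variants with measured values, apply the fit to new variants, and check the monotonic trend with a Spearman rank correlation. The linear form, the log scale, the helper names, and all numbers below are illustrative assumptions, not the procedure or values from the paper.

```python
import numpy as np

def distances_to_wildtype(points, wt_point):
    """Euclidean distance of each variant from wildtype in PCA space."""
    return np.linalg.norm(np.asarray(points) - np.asarray(wt_point), axis=1)

def ranks(values):
    """Rank positions 0..n-1 (assumes no ties, enough for this sketch)."""
    return np.argsort(np.argsort(values))

def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return float(np.corrcoef(ranks(x), ranks(y))[0, 1])

# Hypothetical PCA coordinates of variants with measured affinities,
# plus illustrative log10(Ki) values; wildtype sits at the origin.
wt_point = [0.0, 0.0]
train_points = [[0.3, 0.4], [0.6, 0.8], [0.9, 1.2], [1.2, 1.6]]
train_log_ki = np.array([-0.4, 0.1, 0.4, 1.1])

train_dist = distances_to_wildtype(train_points, wt_point)
print("Spearman rho:", spearman(train_dist, train_log_ki))

# Assumed linear relation between divergence and log10(Ki).
slope, intercept = np.polyfit(train_dist, train_log_ki, 1)

# Predict for an uncharacterised variant from its distance alone.
new_dist = distances_to_wildtype([[0.75, 1.0]], wt_point)[0]
predicted = slope * new_dist + intercept
print("predicted log10(Ki):", predicted)
```

Fitting on log10(Ki) keeps the most resistant variants from dominating the fit; with more measured variants, a monotonic (e.g. isotonic) fit may be a safer assumption than a straight line.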
So... to what degree can these methods be used as evidence? Or on the other hand, to what degree can curated evidence be used to label/train these sorts of models?