jdwinkler / resistome_generator

Development repo for the Resistome database and its tooling.
Feature modules? #20

Open jdwinkler opened 3 years ago

jdwinkler commented 3 years ago

Some ways to push the Resistome closer to providing actual genotype <=> phenotype predictions:

  1. Genotype-to-property changes would be interesting. For example, many studies look at cell shape, DNA conformation changes, regulatory changes, etc. that are not explicitly represented in the Resistome. These data are valuable and could provide additional predictive power once genotype => phenotype models are created. Finding a way to store or model these data would be a big improvement.
  2. Tie residue changes to domains + structural impact rather than to the exact point mutation; this would allow better generalization, especially if we are interested in other organisms. DeMaSk and SNAP2 may be enough for the structural impact, and UniProt (possibly) for the domains.
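
A minimal sketch of what item 2 could look like, assuming mutations are written as `A119E`-style strings and that an impact score between 0 and 1 (higher meaning more disruptive) is supplied by some external predictor. The gene name, domain boundaries, score scale, and bin thresholds are all hypothetical placeholders, not values from the Resistome, UniProt, DeMaSk, or SNAP2.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class Domain(NamedTuple):
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


# Hypothetical hand-written annotations; real ones would come from a curated source.
DOMAINS = {
    "geneA": [Domain("N_terminal", 1, 100), Domain("catalytic", 101, 250)],
}

# Matches substitutions such as "A119E" (reference residue, position, new residue).
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z*])$")


def parse_mutation(mutation: str) -> Tuple[str, int, str]:
    """Split a substitution string into (reference, position, alternate)."""
    match = MUTATION_RE.match(mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation format: {mutation!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


def find_domain(gene: str, position: int) -> Optional[str]:
    """Return the name of the annotated domain containing the position, if any."""
    for domain in DOMAINS.get(gene, []):
        if domain.start <= position <= domain.end:
            return domain.name
    return None


def impact_bin(score: float) -> str:
    """Bin an assumed 0..1 impact score; the thresholds are arbitrary."""
    if score >= 0.66:
        return "high"
    if score >= 0.33:
        return "medium"
    return "low"


def generalize(gene: str, mutation: str, impact_score: float) -> Tuple[str, str, str]:
    """Replace an exact point mutation with a (gene, domain, impact bin) feature."""
    _, position, _ = parse_mutation(mutation)
    domain = find_domain(gene, position) or "unannotated"
    return gene, domain, impact_bin(impact_score)


features: List[Tuple[str, str, str]] = [
    generalize("geneA", "A119E", 0.8),
    generalize("geneA", "G12D", 0.1),
]
```

Two different substitutions in the same domain with similar impact collapse to the same feature, which is the generalization the item is after.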

It may be more effective to pose the problem as follows:

Given the raw genotype obtained from a selection study (ALE, library, random mutagenesis, targeted searches, etc.), can we assemble a feature vector Vf such that a model M can predict the phenotype vector Vp, where Vp[i] = P(i | M(Vf)), i.e. entry i is the predicted probability of phenotype i?
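
To make the formulation concrete, here is a minimal sketch under several assumptions: features are binary presence/absence flags, M is a plain logistic model with one output per phenotype, and the feature names, phenotype names, and weights are made-up placeholders (nothing here is trained or taken from the Resistome).

```python
import math
from typing import List, Sequence

# Hypothetical feature and phenotype vocabularies.
FEATURES = [
    "geneA:catalytic:high",
    "geneB:unannotated:medium",
    "intergenic:promoter_geneC",
]
PHENOTYPES = ["solvent_tolerance", "antibiotic_resistance"]


def build_feature_vector(observed: Sequence[str]) -> List[float]:
    """Vf: 1.0 where a known feature was observed in the genotype, else 0.0."""
    present = set(observed)
    return [1.0 if name in present else 0.0 for name in FEATURES]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class LogisticModel:
    """Stand-in for M: one independent logistic output per phenotype."""

    def __init__(self, weights: List[List[float]], biases: List[float]):
        self.weights = weights  # shape: len(PHENOTYPES) x len(FEATURES)
        self.biases = biases    # shape: len(PHENOTYPES)

    def predict(self, vf: List[float]) -> List[float]:
        """Return Vp, where Vp[i] approximates P(i | M(Vf))."""
        return [
            sigmoid(sum(w * x for w, x in zip(row, vf)) + b)
            for row, b in zip(self.weights, self.biases)
        ]


# Placeholder weights chosen only so the example runs.
model = LogisticModel(
    weights=[[2.0, 0.0, -1.0], [0.0, 1.0, 0.5]],
    biases=[0.0, 0.0],
)
vf = build_feature_vector(["geneA:catalytic:high"])
vp = model.predict(vf)
for name, prob in zip(PHENOTYPES, vp):
    print(f"{name}: {prob:.2f}")
```

Each entry of Vp is an independent probability here, so one genotype can score highly for several phenotypes at once; a softmax output would instead force the phenotypes to compete.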

I was originally trying to use the raw genotype directly, but I now think that was wrong. It would probably be better to transform the genotypes into more generalized features and include higher-level info (especially non-gene hits, the properties above, metabolic predictions, thermodynamic info, etc.), followed by NN development. Ideally we can create a multi-class predictor. It would also be good to generalize the phenotypes to avoid overly specific predictions.
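
One possible way to generalize the phenotypes, sketched under the assumption that a hand-curated lookup from specific labels to coarser classes is acceptable. The labels and class names below are hypothetical examples, not the Resistome's actual vocabulary.

```python
from typing import Dict, List

# Hypothetical mapping from specific phenotype labels to coarse classes.
PHENOTYPE_CLASSES: Dict[str, str] = {
    "n-butanol": "solvents_biofuels",
    "isobutanol": "solvents_biofuels",
    "ampicillin": "antibiotics",
    "kanamycin": "antibiotics",
}


def generalize_phenotypes(labels: List[str], fallback: str = "other") -> List[str]:
    """Collapse specific labels into coarse classes, keeping first-seen order."""
    seen: List[str] = []
    for label in labels:
        coarse = PHENOTYPE_CLASSES.get(label.strip().lower(), fallback)
        if coarse not in seen:
            seen.append(coarse)
    return seen


coarse_labels = generalize_phenotypes(["n-Butanol", "isobutanol", "Kanamycin"])
```

Unmapped labels fall into a catch-all class rather than raising, so new phenotypes do not break the pipeline but remain easy to spot and curate later.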