outbreak-info / outbreak.info

During outbreaks of emerging diseases such as COVID-19, efficiently collecting, sharing, and integrating data is critical to scientific research. outbreak.info is a resource to aggregate all this information into a single location.
https://outbreak.info/
GNU General Public License v3.0

Switch internal representation of substitutions/deletions to nucleotides #537

Open flaneuse opened 2 years ago

flaneuse commented 2 years ago

Right now, frame-shifting deletions are ambiguous because deletions are looked up by amino acid coordinates. If a deletion of whole amino acids happens to span the same amino acid coordinates, the handlers combine the frame-shifting deletion and the whole-amino-acid deletion into the same set of sequences.
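To make the collision concrete, here is a minimal sketch (not the actual handler code). The helper `codon_span` and the two example deletions are hypothetical, and `S_START` assumes the S gene begins at nucleotide 21563 of the reference genome; verify that against the reference actually in use.

```python
# Hypothetical illustration: two different nucleotide deletions that
# collapse to the same amino acid coordinates.

S_START = 21563  # assumed 1-based start of the S gene in the reference


def codon_span(gene_start, nt_start, nt_end):
    """Return the (first, last) 1-based codons touched by a
    1-based, inclusive nucleotide range."""
    first = (nt_start - gene_start) // 3 + 1
    last = (nt_end - gene_start) // 3 + 1
    return first, last


def nt_length(nt_start, nt_end):
    """Length in nucleotides of an inclusive range."""
    return nt_end - nt_start + 1


whole_codon_del = (21767, 21772)  # 6 nt, in-frame
frameshift_del = (21767, 21770)   # 4 nt, frame-shifting

key_whole = codon_span(S_START, *whole_codon_del)
key_shift = codon_span(S_START, *frameshift_del)

# Both deletions map to the same amino acid key, so a lookup keyed on
# amino acid coordinates cannot tell them apart.
same_key = key_whole == key_shift
```

Only the nucleotide range (or its length modulo 3) distinguishes the two cases, which is why the lookup key needs to carry nucleotide coordinates.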

To fix, will need to:

  1. Add an option on the front-end to specify deletions based on nucleotide number
  2. Pass these options back to the API
  3. Adjust handlers to translate amino acid-based coordinates to nucleotides, and filter sequences based on nucleotide coords
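Step 3 could look roughly like the sketch below. Everything here is an assumption for illustration: the function names, the shape of the `sequences` mapping, and the S gene start of 21563 are not taken from the existing handlers.

```python
# Hypothetical sketch of step 3: translate amino acid coordinates to
# nucleotide coordinates, then filter on the nucleotide range.

def aa_to_nt_range(gene_start, aa_start, aa_end):
    """Convert a 1-based, inclusive amino acid range within a gene to a
    1-based, inclusive nucleotide range on the genome."""
    nt_start = gene_start + (aa_start - 1) * 3
    nt_end = gene_start + aa_end * 3 - 1
    return nt_start, nt_end


def filter_by_deletion(sequences, nt_start, nt_end):
    """Return IDs of sequences that carry exactly this nucleotide deletion.

    `sequences` maps a sequence ID to a set of (nt_start, nt_end) deletions.
    """
    return sorted(
        seq_id
        for seq_id, deletions in sequences.items()
        if (nt_start, nt_end) in deletions
    )


S_START = 21563  # assumed start of the S gene in the reference

# Made-up records: one in-frame deletion, one frame-shifting deletion.
sequences = {
    "seqA": {(21767, 21772)},
    "seqB": {(21767, 21770)},
}

nt_range = aa_to_nt_range(S_START, 69, 70)
hits = filter_by_deletion(sequences, *nt_range)
```

Exact-range matching is a simplification: an aligner may report a deletion at a nucleotide range that is not codon-aligned, so a real handler would likely need a normalization step before comparing coordinates.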

Consider fixing this together with the other changes that refactor the handlers:

In theory, we would want to combine a set of lineages with a given set of mutations (substitutions/deletions/insertions), specified in amino acid or nucleotide coordinates.
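One possible shape for such a query, as a rough sketch only: the `Mutation` class, `GENE_STARTS` table, `matches` function, and the example record are all hypothetical and do not reflect the current API. Mutations given in amino acid coordinates are normalized to nucleotide coordinates before matching.

```python
from dataclasses import dataclass

# Assumed gene start positions (1-based); only S is listed for illustration.
GENE_STARTS = {"S": 21563}


@dataclass(frozen=True)
class Mutation:
    kind: str           # "substitution", "deletion", or "insertion"
    start: int          # 1-based, inclusive
    end: int            # 1-based, inclusive
    coords: str = "nt"  # "nt" or "aa"
    gene: str = ""      # required when coords == "aa"

    def to_nt(self):
        """Return an equivalent mutation in nucleotide coordinates."""
        if self.coords == "nt":
            return Mutation(self.kind, self.start, self.end)
        gene_start = GENE_STARTS[self.gene]
        return Mutation(
            self.kind,
            gene_start + (self.start - 1) * 3,
            gene_start + self.end * 3 - 1,
        )


def matches(record, lineages, mutations):
    """True if the record belongs to one of the lineages and carries
    every requested mutation (compared in nucleotide coordinates)."""
    wanted = {m.to_nt() for m in mutations}
    return record["lineage"] in lineages and wanted <= record["mutations"]


# Made-up record for illustration.
record = {
    "lineage": "B.1.1.7",
    "mutations": {Mutation("deletion", 21767, 21772)},
}

query = [Mutation("deletion", 69, 70, coords="aa", gene="S")]
is_match = matches(record, {"B.1.1.7"}, query)
```

Storing everything in nucleotide coordinates internally means amino acid and nucleotide queries go through the same comparison, and frame-shifting deletions stay distinct from whole-codon ones.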