During outbreaks of emerging diseases such as COVID-19, efficiently collecting, sharing, and integrating data is critical to scientific research. outbreak.info is a resource to aggregate all this information into a single location.
Right now, frame-shifting deletions are ambiguous since the lookup for deletions is based on amino acid coordinates. As a result, if there happens to be a deletion of an entire amino acid with the same coordinates, the handlers will combine the frame-shifting deletion and the whole amino acid deletion into the same set of sequences.
To fix this, we will need to:
Add an option on the front-end to specify deletions by nucleotide position
Pass these options back to the API
Adjust handlers to translate amino acid-based coordinates to nucleotide coordinates, and filter sequences based on nucleotide coordinates
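The handler step above could be sketched roughly as follows. This is a minimal illustration, not the actual outbreak.info handler code: `GENE_START`, `NucDeletion`, `codon_span`, `overlaps_span`, `filter_by_nt_deletion`, and the shape of the sequence records are all hypothetical names chosen for the example, and the S gene start (21563, 1-based) is assumed from the NC_045512.2 reference annotation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumed 1-based start of each gene in the reference genome (illustrative only).
GENE_START: Dict[str, int] = {"S": 21563}


@dataclass(frozen=True)
class NucDeletion:
    """A deletion in nucleotide coordinates (hypothetical record type)."""
    start: int   # 1-based position of the first deleted nucleotide
    length: int  # number of deleted nucleotides

    @property
    def end(self) -> int:
        return self.start + self.length - 1

    @property
    def is_frameshift(self) -> bool:
        # A deletion shifts the reading frame unless it removes whole codons.
        return self.length % 3 != 0


def codon_span(gene: str, aa_start: int, aa_end: int) -> Tuple[int, int]:
    """Nucleotide range (inclusive) covered by codons aa_start..aa_end."""
    gene_start = GENE_START[gene]
    nt_start = gene_start + 3 * (aa_start - 1)
    nt_end = gene_start + 3 * aa_end - 1
    return nt_start, nt_end


def overlaps_span(deletion: NucDeletion, span: Tuple[int, int]) -> bool:
    """True if the deletion touches any nucleotide in the span."""
    return deletion.start <= span[1] and deletion.end >= span[0]


def filter_by_nt_deletion(sequences: List[dict], query: NucDeletion) -> List[dict]:
    """Keep only sequences carrying exactly the queried nucleotide deletion."""
    return [s for s in sequences if query in s["deletions"]]


sequences = [
    {"id": "seq-a", "deletions": [NucDeletion(21767, 6)]},  # removes whole codons
    {"id": "seq-b", "deletions": [NucDeletion(21767, 5)]},  # frame-shifting
]

span = codon_span("S", 69, 70)  # (21767, 21772)

# An amino acid-based lookup sees both sequences as hits for codons 69-70 ...
aa_hits = [s["id"] for s in sequences
           if any(overlaps_span(d, span) for d in s["deletions"])]

# ... while a nucleotide-based lookup can tell them apart.
nt_hits = [s["id"] for s in filter_by_nt_deletion(sequences, NucDeletion(21767, 6))]
```

Here `aa_hits` contains both sequences while `nt_hits` contains only `seq-a`, which is the ambiguity described above. Note that `codon_span` returns the span of the codons, not necessarily the coordinates an aligner would report for a given deletion.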
Consider fixing this together with the other changes that refactor the handlers:
Addition of insertions.
AND/OR/NOT/ANY query processing.
In theory, we would want to combine a set of lineages with a given set of mutations (substitutions/deletions/insertions), specified in amino acid or nucleotide coordinates.
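One way such combined queries might be evaluated is sketched below, purely as an illustration. The nested-tuple query format, the `matches` function, and the label strings (`lineage:...`, `S:...`, `nt:...`) are assumptions made for this example and are not the existing API. ANY is left out because its intended semantics are not spelled out here.

```python
from typing import FrozenSet, Tuple, Union

# A query is either a leaf label or a tuple ("AND" | "OR" | "NOT", *subqueries).
# This representation is hypothetical.
Query = Union[str, Tuple]


def matches(query: Query, features: FrozenSet[str]) -> bool:
    """Evaluate a boolean query against the features of one sequence."""
    if isinstance(query, str):
        return query in features
    op, *args = query
    if op == "AND":
        return all(matches(a, features) for a in args)
    if op == "OR":
        return any(matches(a, features) for a in args)
    if op == "NOT":
        if len(args) != 1:
            raise ValueError("NOT takes exactly one subquery")
        return not matches(args[0], features)
    raise ValueError(f"unknown operator: {op}")


# Features of a single sequence: its lineage plus its mutations, with
# deletions labeled in either amino acid or nucleotide coordinates.
features = frozenset({"lineage:B.1.1.7", "S:N501Y", "S:DEL69/70", "nt:del21767:6"})

# (B.1.1.7 OR B.1.351) AND S:N501Y AND NOT the frame-shifting 5-nt deletion
query = (
    "AND",
    ("OR", "lineage:B.1.1.7", "lineage:B.1.351"),
    "S:N501Y",
    ("NOT", "nt:del21767:5"),
)

result = matches(query, features)
```

Treating lineages and mutations as labels in one feature set keeps the evaluator independent of the coordinate system. A real handler would more likely translate the same query tree into a database query than evaluate it per sequence.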