nextstrain / auspice

Web app for visualizing pathogen evolution
https://docs.nextstrain.org/projects/auspice/
GNU Affero General Public License v3.0

ORF1ab is listed in JSON genome_annotations, but not in dropdown for Color by: Genotype #1699

Open AngieHinrichs opened 1 year ago

AngieHinrichs commented 1 year ago

Hi! This might be an Auspice thing, but since I'm using the nextstrain.org/fetch/ function I'll file it here.

Current Behavior

When I view https://nextstrain.org/fetch/hgwdev.gi.ucsc.edu/~angie/whereIsOrf1AB.json and select Color by: Genotype, the gene menu does not include ORF1ab even though it is the first item in the JSON's genome_annotations list:

  ... "genome_annotations": {
    "ORF1ab": {"start": 266, "end": 21555, "strand": "+", "type": "CDS"},
    "S": {"start": 21563, "end": 25384, "strand": "+", "type": "CDS"},
    "ORF3a": { ...

The Color by: Genotype gene menu lists nucleotide, S, ORF3a, ...:

[Screenshot: Color by: Genotype gene menu]

Expected behavior

I would expect the Color by: Genotype gene menu to list nucleotide, ORF1ab, S, ORF3a, ....

How to reproduce

  1. Go to https://nextstrain.org/fetch/hgwdev.gi.ucsc.edu/~angie/whereIsOrf1AB.json
  2. Select Color by: Genotype
  3. Try to choose ORF1ab from the gene menu... it's missing.

Possible solution

Is 'nucleotide' perhaps replacing the first element instead of being prepended to the list?

Your environment: if browsing Nextstrain online

Additional context

HT @FedeGueli

joverlee521 commented 1 year ago

Hi @AngieHinrichs,

When I open the console on the page, I see the following error coming from Auspice:

[Genome annotation] ORF1ab has length 21290 which is not a multiple of 3
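The warning comes from a length check on each CDS. Below is a minimal Python sketch of that kind of check, not Auspice's actual code. It assumes 1-based, inclusive `start`/`end` coordinates (which is what gives ORF1ab a length of 21290) and runs on the first two entries of the JSON above. `cds_length` and `invalid_cdss` are hypothetical helper names.

```python
# Hypothetical helpers illustrating the "not a multiple of 3" check.
# This is a sketch, not Auspice's implementation.


def cds_length(annotation: dict) -> int:
    """Length of a CDS, assuming 1-based inclusive start/end coordinates.

    A segmented annotation (one with a "segments" list) is summed segment by segment.
    """
    segments = annotation.get("segments", [annotation])
    return sum(seg["end"] - seg["start"] + 1 for seg in segments)


def invalid_cdss(genome_annotations: dict) -> dict:
    """Return {name: length} for every CDS whose length is not a multiple of 3."""
    return {
        name: cds_length(ann)
        for name, ann in genome_annotations.items()
        if ann.get("type", "CDS") == "CDS" and cds_length(ann) % 3 != 0
    }


annotations = {
    "ORF1ab": {"start": 266, "end": 21555, "strand": "+", "type": "CDS"},
    "S": {"start": 21563, "end": 25384, "strand": "+", "type": "CDS"},
}
print(invalid_cdss(annotations))  # {'ORF1ab': 21290}
```

S is 3822 nt (1274 codons) and passes. The single ORF1ab interval is 21290 nt, which leaves a remainder of 2 when divided by 3.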

With the latest Auspice updates made by @jameshadfield, you should be able to define the two segments separately in the genome annotations:

  "ORF1ab": {
    "strand": "+",
    "segments":[
      {"start": 266, "end": 13468},
      {"start": 13468, "end": 21555}
    ]
  },
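To see why the two segments fix the warning, here is a quick arithmetic check in Python. It again assumes 1-based, inclusive coordinates. Both segments include position 13468, so that nucleotide is read twice. This corresponds to the -1 ribosomal frameshift and supplies the one nucleotide that the single 266..21555 interval lacks.

```python
# Segments copied from the suggested ORF1ab annotation above.
# Assumption: coordinates are 1-based and inclusive, so each segment
# contributes end - start + 1 nucleotides.
segments = [
    {"start": 266, "end": 13468},
    {"start": 13468, "end": 21555},  # starts on 13468 again: the -1 frameshift
]

lengths = [seg["end"] - seg["start"] + 1 for seg in segments]
total = sum(lengths)

print(lengths)           # [13203, 8088]
print(total, total % 3)  # 21291 0
```

Each segment is a whole number of codons (4401 and 2696), so the joined CDS has 7097 codons.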

joverlee521 commented 1 year ago

It would be helpful to make this error more obvious to users by dispatching a warning or error notification.

jameshadfield commented 1 year ago

Hey @AngieHinrichs - @joverlee521's summarised things perfectly, but note that segmented annotations can't yet be produced by the augur tools, so you'll have to add them via a short Python script. Here's an example of a Python script I used in testing to manipulate the ncov JSONs to produce segmented annotations for the two CDSs that cover the slip site (RdRp and ORF1ab). Internally we debated changing all our ncov datasets from separate ORF1a + ORF1b CDSs to the more correct ORF1ab CDS, but I don't think we will do this, as so many people (and Pango designations) use ORF1b numbering. We will probably add the 16 proteins cleaved from the polyproteins, though.
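Below is a rough sketch of that kind of post-processing step in Python. It is not the script referred to above. It assumes an Auspice v2 dataset JSON in which `genome_annotations` sits under `meta`, and it hard-codes the SARS-CoV-2 ORF1ab segments from this thread. The output shape mirrors the `"strand"` + `"segments"` example suggested earlier. `segment_orf1ab` and `rewrite_file` are hypothetical names.

```python
import json

# Assumed SARS-CoV-2 ORF1ab segments (1-based, inclusive), as given in this thread.
# Position 13468 appears in both segments to model the -1 ribosomal frameshift.
ORF1AB_SEGMENTS = [
    {"start": 266, "end": 13468},
    {"start": 13468, "end": 21555},
]


def segment_orf1ab(dataset: dict) -> dict:
    """Replace a single-interval ORF1ab with a two-segment annotation, in place.

    Assumes the Auspice v2 layout dataset["meta"]["genome_annotations"].
    Other genes are left untouched. Returns the (mutated) dataset for convenience.
    """
    annotations = dataset["meta"]["genome_annotations"]
    old = annotations.get("ORF1ab")
    if old is None:
        return dataset  # no ORF1ab entry: nothing to rewrite
    annotations["ORF1ab"] = {
        "strand": old.get("strand", "+"),
        "segments": [dict(seg) for seg in ORF1AB_SEGMENTS],
    }
    return dataset


def rewrite_file(in_path: str, out_path: str) -> None:
    """Load an Auspice JSON from in_path, segment ORF1ab, write it to out_path."""
    with open(in_path) as fh:
        dataset = json.load(fh)
    segment_orf1ab(dataset)
    with open(out_path, "w") as fh:
        json.dump(dataset, fh)
```

For example, `rewrite_file("whereIsOrf1AB.json", "whereIsOrf1AB.segmented.json")` would write a copy with only the ORF1ab entry changed. Whether other keys such as `"type"` should be carried over to the segmented entry is an assumption worth checking against the current Auspice JSON schema.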

AngieHinrichs commented 1 year ago

Ah, thanks @joverlee521 and @jameshadfield! I wish I'd thought to check the console. OK, I will update the ORF1ab coords in the JSON to list the segments. My code has been adding ORF1ab mutation annotations to the nodes, so it didn't occur to me that the ORF1ab coords would really matter for anything besides drawing the genes down below. 🙂