nextstrain / dengue

Nextstrain build for dengue virus
https://nextstrain.org/dengue

Word of caution for genome MT597439 #54

Closed Rohit-Satyam closed 2 months ago

Rohit-Satyam commented 2 months ago

Although the header of genome MT597439 reads "Dengue virus type 2 isolate 43257 polyprotein (POLY) gene, partial cds; and sfRNA2 lncRNA gene, partial sequence", the serotype qualifier in the record's source feature tags it as serotype 4:

FEATURES             Location/Qualifiers
     source          1..10252
                     /organism="dengue virus type 2"
                     /mol_type="genomic RNA"
                     /serotype="4"
                     /isolate="43257"
                     /isolation_source="serum"
                     /host="Homo sapiens"
                     /db_xref="taxon:11060"
                     /country="South Korea"
                     /collection_date="2010"
                     /note="genotype: 2"

The submitters appear to have made an error when depositing this genome. In their article, they correctly assign it as DENV4/II (see Fig. 1b, sample 43257 highlighted in yellow). I will ask NCBI to correct this entry, but wanted to flag it here for the record.
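
A check like the one described above can be scripted. The following is only a minimal sketch, assuming the GenBank record has been saved locally as a flat file; `check_serotype` is a hypothetical helper and not part of the Nextstrain dengue workflow. It compares the type number in the `/organism` qualifier with the `/serotype` qualifier.

```shell
#!/usr/bin/env bash
# Sketch: flag a GenBank flat file whose /organism type and /serotype disagree.
# check_serotype is a hypothetical helper, not part of any Nextstrain tooling.
check_serotype() {
  local gb_file="$1"
  local organism_type serotype
  # Extract N from /organism="dengue virus type N" (case-insensitive match).
  organism_type=$(grep -i -m1 -oE '/organism="dengue virus type [1-4]"' "$gb_file" | grep -oE '[1-4]')
  # Extract N from /serotype="N".
  serotype=$(grep -m1 -oE '/serotype="[1-4]"' "$gb_file" | grep -oE '[1-4]')
  if [ -z "$organism_type" ] || [ -z "$serotype" ]; then
    # One of the two qualifiers is missing or not in the expected form.
    echo "unknown"
  elif [ "$organism_type" = "$serotype" ]; then
    echo "consistent"
  else
    echo "mismatch: organism=type ${organism_type}, serotype=${serotype}"
  fi
}
```

Running `check_serotype MT597439.gb` (the filename is an assumption) on a record with the qualifiers quoted above would report a mismatch between type 2 and serotype 4. It cannot tell which of the two fields is the wrong one; that still needs the publication.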

Rohit-Satyam commented 2 months ago

Please correct the ncbi_serotype value for this genome in https://data.nextstrain.org/files/workflows/dengue/metadata_all.tsv.zst

j23414 commented 2 months ago

Thanks @Rohit-Satyam! I've PR'd a fix (https://github.com/nextstrain/dengue/pull/55) and the metadata should also reflect the fix now:

wget https://data.nextstrain.org/files/workflows/dengue/metadata_all.tsv.zst
zstd -d metadata_all.tsv.zst
grep "MT597439" metadata_all.tsv

j23414 commented 2 months ago

Just a heads-up that some of the metadata columns just changed; see https://github.com/nextstrain/dengue/issues/41#issuecomment-2113248913 and https://github.com/nextstrain/dengue/pull/51

Rohit-Satyam commented 2 months ago

Just a heads-up that some of the metadata columns just changed #41 (comment) and #51

  • ncbi_serotype -> serotype_ncbi
  • nextclade_subtype -> genotype_nextclade

But this has yet to be updated in the metadata, right? I still see ncbi_serotype and nextclade_subtype.
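
While the rename is rolling out, a lookup that works with either column name can be handy. The following is a rough sketch, assuming the decompressed `metadata_all.tsv` is tab-separated with a header row; `lookup_serotype` is a hypothetical helper. Like the `grep` earlier in the thread, it matches the accession anywhere in the row, then prints whichever serotype column name the file has, together with its value.

```shell
#!/usr/bin/env bash
# Sketch: print the serotype column name present in the file and its value
# for rows containing the given accession.
# lookup_serotype is a hypothetical helper, not part of the Nextstrain workflow.
lookup_serotype() {
  local tsv="$1" acc="$2"
  awk -F'\t' -v acc="$acc" '
    NR == 1 {
      # Find the serotype column under its new name or its old one.
      for (i = 1; i <= NF; i++) {
        if ($i == "serotype_ncbi" || $i == "ncbi_serotype") { sero_col = i; sero_name = $i }
      }
      if (!sero_col) { print "serotype column not found" > "/dev/stderr"; exit 1 }
      next
    }
    # Substring match on the whole row, mirroring the plain grep.
    index($0, acc) { print sero_name "\t" $sero_col }
  ' "$tsv"
}
```

For example, `lookup_serotype metadata_all.tsv MT597439` shows at a glance whether the downloaded file still uses the old column name.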

j23414 commented 2 months ago

Correct, it'll be updated within the next 24 hours, when the next ingest-to-phylogenetic GitHub Action runs.

It's set to run once a day (~10am Pacific Time), so it should start running in ~40 mins.