nextstrain / fauna

RethinkDB database to support real-time virus analysis
GNU Affero General Public License v3.0

feat: BV-BRC support #128

Closed by j23414 2 months ago

j23414 commented 1 year ago

Context

ViPR is being replaced by BV-BRC, with the final transfer possibly occurring by the end of the year (2022-12-31). This might disrupt pathogen builds.

BV-BRC_screenshot

Description

For any pathogen build that depends on ViPR (such as Zika), we should make sure we can do one (or a combination) of the following:

Possible solution

So far, the BV-BRC user interface seems roughly equivalent to ViPR's:

BV-BRC_zika BV-BRC_zika_download

However, we may need to modify the FASTA headers.

Notice that the BV-BRC header format differs from ViPR's and, unlike in ViPR, cannot be customized at download time (example from dengue):

head BVBRC_genome_sequence.fasta
>accn|KY829115   Dengue virus 1 isolate H.sapiens-wt/BLM/2016/MA-WGS16-006-SER, complete genome.   [Dengue virus 1 H.sapiens-wt/BLM/2016/MA-WGS16-006-SER strain Dengue virus 1/H.sapiens-wt/BLM/2016/MA-WGS16-006-SER | 11053.9479]
agttgttagtctacgtggaccgacaagaacagtttcgaatcggaagcttgcttaacgtag
ttctaacagttttttattagagagcagatctctgatgaacaaccaacggaaaaagacggg
tcgaccgtctttcaatatgctgaaacgcgcgagaaaccgcgtgtcaactggttcacagtt
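To make the header layout concrete, here is a minimal Python sketch that splits a BV-BRC header into its parts. The pattern is inferred only from the single dengue header above, so it is an assumption that other BV-BRC downloads follow the same shape. The function name `parse_bvbrc_header` and the group names (in particular `genome_id` for the trailing `11053.9479`) are illustrative labels, not BV-BRC terminology confirmed in this thread.

```python
import re

# Pattern inferred from the one dengue header shown above (an assumption);
# other BV-BRC exports may be laid out differently.
BVBRC_HEADER = re.compile(
    r"^>accn\|(?P<accession>\S+)\s+"      # e.g. KY829115
    r"(?P<description>.*?)\s+"            # free-text record description
    r"\[(?P<genome_name>.*?)\s*\|\s*"     # bracketed name, up to the pipe
    r"(?P<genome_id>[\d.]+)\]\s*$"        # trailing numeric ID (label is a guess)
)


def parse_bvbrc_header(line):
    """Return the header fields as a dict, or None if the line does not match."""
    match = BVBRC_HEADER.match(line.strip())
    return match.groupdict() if match else None
```

Only the accession is unambiguous here. The date, host, and country fields that the ViPR-style headers carry are not present in this header at all, which is why the tabular metadata would be needed.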

Instead, we may need to download and process the tabular data to match our current GenomicFastaResults.fasta headers (example from Zika):

less GenomicFastaResults.fasta
>KY241742|ZIKV_SG_072|NA|2016_08_28|Human|Singapore|Asian|Zika_virus
GAATCAGACTGCGACAGTTCGAGTTTGAAGCGAAAGCTAGCAACAGTATCAACAGGTTTTATTTTGGATT
TGGAAACGAGAGTTTCTGGTCATGAAAAACCCAAAAAAGAAATCCGGAGGATTCCGGATTGTCAATATGC
TAAAACGCGGAGTAGCCCGTGTGAGCCCCTTTGGGGGCTTGAAGAGGCTGCCAGCCGGACTTCTGCTGGG
TCATGGGCCCATCAGGATGGTCTTGGCGATTCTAGCCTTTTTGAGGTTCACGGCAATCAAGCCATCACTG
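One way the tabular route might look, as a rough Python sketch: build a pipe-delimited header from one metadata row. Everything about the input is an assumption. The column names in `FIELD_ORDER` are hypothetical and would need to be mapped to whatever the BV-BRC table export actually provides. The meaning of each position in the target header is inferred from the Zika example above, and the third field (shown as `NA`) is not explained in this thread. Dates are assumed to arrive as `YYYY-MM-DD`.

```python
# Hypothetical column names, in the order the fields appear in the
# ViPR-style header above. "segment" is a guess for the third field.
FIELD_ORDER = [
    "accession", "strain", "segment", "collection_date",
    "host", "country", "subtype", "species",
]


def vipr_style_header(row):
    """Build a '>a|b|c|...' header from a dict of metadata values."""
    parts = []
    for field in FIELD_ORDER:
        # Missing or empty values become "NA", as in the example header.
        value = (row.get(field) or "").strip() or "NA"
        if field == "collection_date":
            # Assumes ISO dates; the example header uses underscores.
            value = value.replace("-", "_")
        # The example header has no spaces (e.g. Zika_virus).
        parts.append(value.replace(" ", "_"))
    return ">" + "|".join(parts)


row = {
    "accession": "KY241742", "strain": "ZIKV_SG_072", "segment": "",
    "collection_date": "2016-08-28", "host": "Human",
    "country": "Singapore", "subtype": "Asian", "species": "Zika virus",
}
print(vipr_style_header(row))
# >KY241742|ZIKV_SG_072|NA|2016_08_28|Human|Singapore|Asian|Zika_virus
```

Rows could come from `csv.DictReader` over the downloaded table, keyed by accession so that each sequence in the BV-BRC FASTA file can be matched to its metadata.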

Alternatively, someone could do a deep dive into the BV-BRC CLI.

j23414 commented 1 year ago

Just a note that the ViPR website now redirects to BV-BRC, so any ViPR instructions may be out of date.

j23414 commented 2 months ago

Closing this: instead of adding BV-BRC support, we've gone in the direction of using Entrez or NCBI datasets in the pathogen repo-guide.