nextstrain / dengue

Nextstrain build for dengue virus
https://nextstrain.org/dengue

Generate gene reference files #47

Closed j23414 closed 2 months ago

j23414 commented 2 months ago

Description of proposed changes

In order to support gene phylogenetic trees (e.g. E gene trees), add rules to automatically generate gene reference GenBank and FASTA files (e.g. reference_denv4_E.gb and reference_denv4_E.fasta) by following the rules used in RSV.

This is part of a larger, older issue on creating E gene builds, and it is being split into smaller PRs to keep QC and review scope manageable. This PR does not generate an E gene phylogenetic tree; subsequent PRs will build on it to generate the trees.
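As a rough illustration of the idea (not the rules added in this PR, which follow the RSV workflow), generating a gene reference amounts to slicing the gene out of the full genome reference by its annotated coordinates and writing the result out again. The sketch below uses only the Python standard library; `extract_gene`, `to_fasta`, and the toy genome are hypothetical names and data made up for the example.

```python
def extract_gene(genome: str, start: int, end: int) -> str:
    """Return the subsequence for 1-based, inclusive (GenBank-style) coordinates."""
    if not (1 <= start <= end <= len(genome)):
        raise ValueError(f"coordinates {start}..{end} fall outside the genome")
    # GenBank locations are 1-based and inclusive; Python slices are 0-based and half-open.
    return genome[start - 1:end]


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format a single FASTA record, wrapping the sequence at `width` characters."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return f">{name}\n" + "\n".join(lines) + "\n"


# Toy genome (assumption: invented data, not a dengue sequence).
# The "gene" occupies positions 5..19.
toy_genome = "ccccATGAAATTTGGGTAAcccc"
gene = extract_gene(toy_genome, 5, 19)
print(to_fasta("toy_E", gene))
```

After extraction the gene's own coordinates restart at 1, which is why the gene reference in the example later in this PR annotates its CDS as `1..1485`.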

Visual summary (view whole pipeline plan so far)

Related issue(s)

Checklist

nextstrain build phylogenetic results/config/reference_all_E.gb results/config/reference_all_E.fasta
nextstrain build phylogenetic results/config/reference_denv1_E.gb results/config/reference_denv1_E.fasta
nextstrain build phylogenetic results/config/reference_denv2_E.gb results/config/reference_denv2_E.fasta
nextstrain build phylogenetic results/config/reference_denv3_E.gb results/config/reference_denv3_E.fasta
nextstrain build phylogenetic results/config/reference_denv4_E.gb results/config/reference_denv4_E.fasta
Example shortened reference_denv2_E.gb

```
LOCUS       DENV2/THAILAND/REFERENCE/1964 1485 bp    DNA     UNK 01-JAN-1980
DEFINITION  Dengue virus 2, complete genome.
ACCESSION   NC_001474
VERSION     NC_001474.2
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
     CDS             1..1485
                     /gene="E"
                     /db_xref="VBRC:35921"
                     /product="envelope protein E"
                     /protein_id="NP_739583.2"
     source          1..1485
                     /collection_date="1964"
                     /country="Thailand"
                     /db_xref="taxon:11060"
                     /mol_type="genomic RNA"
                     /organism="Dengue virus 2"
                     /strain="16681"
ORIGIN
        1 atgcgttgca taggaatgtc aaatagagac tttgtggaag gggtttcagg aggaagctgg
       61 gttgacatag tcttagaaca tggaagctgt gtgacgacga tggcaaaaaa caaaccaaca
      121 ttggattttg aactgataaa aacagaagcc aaacagcctg ccaccctaag gaagtactgt
...
     1381 gtcattatca catggatagg aatgaattca cgcagcacct cactgtctgt gacactagta
     1441 ttggtgggaa ttgtgacact gtatttggga gtcatggtgc aggcc
//
```
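To show how the `.gb` and `.fasta` outputs relate, here is a minimal standard-library sketch that turns GenBank `ORIGIN` lines into a plain sequence and a FASTA record. It uses only the first two sequence lines of the shortened example above. The helper `origin_to_sequence` and the FASTA header are hypothetical: the header the workflow actually writes is not shown in this PR.

```python
def origin_to_sequence(origin_lines):
    """Join GenBank ORIGIN lines into one sequence, dropping position numbers and spaces."""
    parts = []
    for line in origin_lines:
        fields = line.split()
        # Sequence lines start with a numeric position; skip anything else (e.g. "ORIGIN", "//").
        if not fields or not fields[0].isdigit():
            continue
        parts.extend(fields[1:])
    return "".join(parts)


# First two sequence lines copied from the shortened reference_denv2_E.gb example.
origin_lines = [
    "ORIGIN",
    "        1 atgcgttgca taggaatgtc aaatagagac tttgtggaag gggtttcagg aggaagctgg",
    "       61 gttgacatag tcttagaaca tggaagctgt gtgacgacga tggcaaaaaa caaaccaaca",
    "//",
]

sequence = origin_to_sequence(origin_lines)

# Hypothetical header, chosen only for this illustration.
fasta_record = f">denv2_E_first_{len(sequence)}_bases\n{sequence}\n"
print(fasta_record)
```

A full reference would go through the same steps for all 1485 bases. Since that length is a multiple of three, checking `len(sequence) % 3 == 0` on the complete record is a cheap sanity test for a coding-sequence reference.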
j23414 commented 2 months ago

I was wondering why the CI was taking so long, then remembered that the example files get connected to "phylogenetic/data".

https://github.com/nextstrain/.github/blob/4f41fa6db826dff3f1eb09f8d2e0a1512c9e358d/.github/workflows/pathogen-repo-ci.yaml#L236-L237

Fixed with: https://github.com/nextstrain/dengue/pull/47/commits/30b1d5a4860c1822b16f8d55b3e9e577455138e5

CI seems much faster now.