nextstrain / rsv

Workflow for RSV analyses on Nextstrain.org
https://nextstrain.org/rsv

Add "--start" and "--end" arguments to newreference.py to allow for creating subgenic trees #58

Open j23414 opened 6 months ago


Description of proposed changes

Adds optional "--start" and "--end" arguments that give 0-based start and end positions relative to the "--gene" of interest.
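As a rough illustration of how two optional integer arguments like these could be declared, here is a minimal argparse sketch. The parser below is hypothetical: only the option names come from this PR, while the help strings, defaults, and `build_parser` helper are assumptions and may not match what newreference.py actually does.

```python
import argparse


def build_parser():
    # Hypothetical parser: option names are taken from the PR description,
    # everything else (help text, defaults) is assumed for illustration.
    parser = argparse.ArgumentParser(description="Create a gene-level reference")
    parser.add_argument("--reference", required=True, help="GenBank reference file")
    parser.add_argument("--output-fasta", required=True, help="output FASTA path")
    parser.add_argument("--output-genbank", required=True, help="output GenBank path")
    parser.add_argument("--gene", required=True, help="gene of interest")
    # Optional 0-based positions, interpreted relative to the start of --gene.
    # None means "not given", so the whole gene would be used.
    parser.add_argument("--start", type=int, default=None, help="0-based start within the gene")
    parser.add_argument("--end", type=int, default=None, help="0-based end within the gene")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["--reference", "dengue_reference.gb",
     "--output-fasta", "E.fasta",
     "--output-genbank", "E.gb",
     "--gene", "E",
     "--start", "0",
     "--end", "9"]
)
print(args.gene, args.start, args.end)
```

Leaving both defaults as `None` keeps the original whole-gene behavior available when neither argument is passed.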

GenBank sequences can contain extra sequence beyond the end of the polyprotein, so the start and end positions are defined relative to the gene of interest rather than to the full record, which was deemed the more stable behavior.
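To make the gene-relative coordinates concrete, here is a small sketch in plain Python. The `subgenic_slice` function, the toy genome, and the gene coordinates are all made up for illustration and are not taken from newreference.py. It also assumes the end position is exclusive (as in Python slicing), which is consistent with "--start 0 --end 9" producing 9 bases but is not stated explicitly in this PR.

```python
def subgenic_slice(genome, gene_start, gene_end, start=None, end=None):
    """Return the part of a gene selected by gene-relative, 0-based positions.

    gene_start/gene_end locate the gene within the genome (0-based, end exclusive).
    start/end are relative to the gene, not to the genome (assumed end exclusive).
    """
    gene_seq = genome[gene_start:gene_end]
    # Fall back to the whole gene when a bound is not given.
    start = 0 if start is None else start
    end = len(gene_seq) if end is None else end
    if not 0 <= start < end <= len(gene_seq):
        raise ValueError(f"start/end {start}..{end} fall outside a gene of length {len(gene_seq)}")
    return gene_seq[start:end]


# Toy genome: 5 bases upstream, a 12-base "gene", then trailing sequence.
genome = "ggggg" + "atgcgatgcaaa" + "tttttttt"

# The same gene-relative request works no matter how much sequence surrounds the gene.
region = subgenic_slice(genome, gene_start=5, gene_end=17, start=0, end=9)
print(region)
```

Because the slice is taken from the gene rather than from the full record, adding or trimming sequence past the end of the record does not shift the selected region.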

Example of pulling out only the E gene in Dengue (original behavior)

python scripts/newreference.py \
  --reference dengue_reference.gb \
  --output-fasta E.fasta \
  --output-genbank E.gb \
  --gene E

This generates a reference GenBank file with the following features:

FEATURES             Location/Qualifiers
     CDS             1..1485
                     /gene="E"
                     /product="envelope protein E"
                     /protein_id="NP_740317.1"
     source          1..1485
                     /clone="rDEN4"
                     /db_xref="taxon:11070"
                     /mol_type="genomic RNA"
                     /organism="Dengue virus 4"

Example of pulling out a subgenic region of E (new feature)

Run with the new "--start" and "--end" arguments:

python scripts/newreference.py \
  --reference dengue_reference.gb \
  --output-fasta E.fasta \
  --output-genbank E.gb \
  --gene E \
  --start 0 \
  --end 9

This results in the following GenBank record:

LOCUS       DENV4/NA/REFERENCE/2003    9 bp    DNA              UNK 01-JAN-1980
DEFINITION  Dengue virus 4, complete genome.
ACCESSION   NC_002640
VERSION     NC_002640.1
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
     source          1..9
                     /clone="rDEN4"
                     /db_xref="taxon:11070"
                     /mol_type="genomic RNA"
                     /organism="Dengue virus 4"
     CDS             1..9
                     /gene="E_0_9"
ORIGIN
        1 atgcgatgc
//
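The output above suggests that the subgenic feature is labeled as gene, start, and end joined by underscores, and that the new record is renumbered from 1. The sketch below reproduces that pattern; the `subgenic_feature` helper is hypothetical, and the naming rule is inferred from this single example rather than from the script itself.

```python
def subgenic_feature(gene, start, end):
    """Build a feature label and a 1-based GenBank-style location string.

    start/end are 0-based and gene-relative, with end assumed exclusive.
    The "<gene>_<start>_<end>" label pattern is inferred from the example output.
    """
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    label = f"{gene}_{start}_{end}"
    # The extracted region becomes its own record, so it spans 1..length.
    length = end - start
    location = f"1..{length}"
    return label, location


label, location = subgenic_feature("E", 0, 9)
print(label, location)
```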

Related issue(s)

Checklist