rki-mf1 / covsonar

A database-driven system for handling genomic sequences of SARS-CoV-2 and screening genomic profiles.
GNU General Public License v3.0

Switch to nextstrain SARS-CoV-2 gene annotation as default #120

Open matthuska opened 1 year ago

matthuska commented 1 year ago

Nextstrain (and maybe others?) uses a SARS-CoV-2 (SC2) annotation in which ORF1a and ORF1b are separate CDSs that share only the single position 13468, rather than overlapping along the whole length of ORF1a (see https://github.com/nf-core/viralrecon/issues/263). In detail:

##gff-version 3
##sequence-region MN908947 1 29903
# Gene map (genome annotation) of SARS-CoV-2 in GFF format.
# For gene map purposes we only need some of the columns. We substitute unused values with "." as per GFF spec.
# See GFF format reference at https://www.ensembl.org/info/website/upload/gff.html
# seqname   source  feature start   end score   strand  frame   attribute
MN908947    GenBank gene    266 13468   .   +   .   gene_name=ORF1a
MN908947    GenBank gene    13468   21555   .   +   .   gene_name=ORF1b
MN908947    GenBank gene    25393   26220   .   +   .   gene_name=ORF3a
MN908947    GenBank gene    21563   25384   .   +   .   gene_name=S
MN908947    GenBank gene    26245   26472   .   +   .   gene_name=E
MN908947    GenBank gene    26523   27191   .   +   .   gene_name=M
MN908947    GenBank gene    27202   27387   .   +   .   gene_name=ORF6
MN908947    GenBank gene    27394   27759   .   +   .   gene_name=ORF7a
MN908947    GenBank gene    27756   27887   .   +   .   gene_name=ORF7b
MN908947    GenBank gene    27894   28259   .   +   .   gene_name=ORF8
MN908947    GenBank gene    28274   29533   .   +   .   gene_name=N
MN908947    GenBank gene    28284   28577   .   +   .   gene_name=ORF9b

We might want to switch to the same annotation so that our AA mutation profiles line up with theirs, which would make communication within the community easier.
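To illustrate why the choice of annotation matters for AA profiles, here is a minimal Python sketch (not covsonar code) that parses the gene map quoted above and converts a genome position into a codon number per gene. The names `parse_gene_map` and `aa_positions` are hypothetical, and the sketch assumes plus-strand genes with no frameshift handling and whitespace-separated columns as they appear in this issue.

```python
# Gene map copied from the listing above (comment lines omitted).
GENE_MAP_GFF = """\
MN908947 GenBank gene 266 13468 . + . gene_name=ORF1a
MN908947 GenBank gene 13468 21555 . + . gene_name=ORF1b
MN908947 GenBank gene 25393 26220 . + . gene_name=ORF3a
MN908947 GenBank gene 21563 25384 . + . gene_name=S
MN908947 GenBank gene 26245 26472 . + . gene_name=E
MN908947 GenBank gene 26523 27191 . + . gene_name=M
MN908947 GenBank gene 27202 27387 . + . gene_name=ORF6
MN908947 GenBank gene 27394 27759 . + . gene_name=ORF7a
MN908947 GenBank gene 27756 27887 . + . gene_name=ORF7b
MN908947 GenBank gene 27894 28259 . + . gene_name=ORF8
MN908947 GenBank gene 28274 29533 . + . gene_name=N
MN908947 GenBank gene 28284 28577 . + . gene_name=ORF9b
"""


def parse_gene_map(text):
    """Return {gene_name: (start, end)} with 1-based inclusive coordinates."""
    genes = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines, GFF directives and comments
        fields = line.split()
        if len(fields) < 9:
            continue  # not a complete feature line
        start, end = int(fields[3]), int(fields[4])
        # Attribute column looks like "gene_name=ORF1a"
        attrs = dict(
            item.split("=", 1) for item in fields[8].split(";") if "=" in item
        )
        genes[attrs["gene_name"]] = (start, end)
    return genes


def aa_positions(nt_pos, genes):
    """List (gene, codon_number) for every gene covering a genome position.

    Assumes plus-strand genes read in frame from their start coordinate;
    the codon number is 1-based and counts the stop codon as well.
    """
    hits = []
    for name, (start, end) in genes.items():
        if start <= nt_pos <= end:
            hits.append((name, (nt_pos - start) // 3 + 1))
    return hits


GENES = parse_gene_map(GENE_MAP_GFF)

if __name__ == "__main__":
    print(aa_positions(14408, GENES))  # [('ORF1b', 314)]
    print(aa_positions(28300, GENES))  # [('N', 9), ('ORF9b', 6)]
```

With this gene map, genome position 14408 falls on codon 314 of ORF1b. An annotation that numbers the same position along a joined ORF1ab CDS would report a different codon number for the identical nucleotide change, which is the mismatch this issue is about.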