NmnBttr opened this issue 1 year ago.
Hey, thanks for any support! Just adding to what @NmnBttr already described: this is a virus genome that encodes a single polyprotein. The polyprotein is then split post-translationally into mature proteins, but the mature proteins are not labeled as "CDS" in the annotation files.
I suspect this is a general problem when working with polyproteins.
One of our ideas was to modify the annotation so that it fits SnpEff's scheme, but we have not been successful so far.
Thanks!
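One way the annotation could be modified to fit SnpEff is to promote each mature peptide to its own gene with a single exon and CDS, written as GTF 2.2, so that SnpEff sees one transcript per mature protein rather than one polyprotein gene. This is a minimal sketch, assuming the mature-peptide coordinates have already been pulled out of the annotation (for example from the `mat_peptide` features of the GenBank record). The names `pep1`/`pep2`, their coordinates, and the `source` label are hypothetical placeholders, not the real AY184219.1 annotation, and only plus-strand features are handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MaturePeptide:
    name: str   # hypothetical identifier, e.g. from a mat_peptide /product qualifier
    start: int  # 1-based, inclusive genome coordinate
    end: int    # 1-based, inclusive genome coordinate


def peptides_to_gtf(seqname: str, peptides: list[MaturePeptide],
                    source: str = "polyprotein_split") -> list[str]:
    """Emit one gene/transcript (an exon line plus a CDS line) per mature peptide, plus strand only."""
    lines = []
    for pep in peptides:
        # Cleavage happens between codons, so each peptide should span whole codons.
        if (pep.end - pep.start + 1) % 3 != 0:
            raise ValueError(f"{pep.name}: length is not a multiple of 3")
        attrs = f'gene_id "{pep.name}"; transcript_id "{pep.name}.t1";'
        for feature in ("exon", "CDS"):
            frame = "0" if feature == "CDS" else "."  # CDS begins on a codon boundary
            lines.append("\t".join([seqname, source, feature, str(pep.start),
                                    str(pep.end), ".", "+", frame, attrs]))
    return lines


if __name__ == "__main__":
    # Hypothetical coordinates for illustration only.
    peptides = [MaturePeptide("pep1", 101, 160), MaturePeptide("pep2", 161, 250)]
    print("\n".join(peptides_to_gtf("AY184219.1", peptides)))
```

The output could then be saved as `genes.gtf` in the genome's data directory and built with `snpEff build -gtf22 -v <genome>`; this has not been tested against this virus, and any CDS or protein FASTA supplied for the correctness checks would presumably need IDs matching the `transcript_id` values.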
Database requests
Note: Genome FASTA file might not be needed in some cases (e.g. GenBank files usually have genome sequence information)
Note: Either CDS or Protein FASTA files should be used to ensure correctness (sometimes these sequences are provided in the GenBank files).
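For a polyprotein, a protein FASTA with one record per mature peptide could be generated by translating each peptide's slice of the genome with the standard genetic code (NCBI translation table 1). This is a sketch under stated assumptions: the genome string and peptide coordinates are toy values, the record names would presumably have to match the transcript IDs in the annotation, and stripping a trailing stop (`*`) is a choice made here rather than a documented SnpEff requirement.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(nt: str) -> str:
    """Translate complete codons; unknown codons (e.g. containing N) become X."""
    nt = nt.upper().replace("U", "T")
    usable = len(nt) - len(nt) % 3
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, usable, 3))


def peptide_fasta(genome: str, peptides: list[tuple[str, int, int]]) -> str:
    """One FASTA record per (name, start, end) peptide, 1-based inclusive coordinates."""
    records = []
    for name, start, end in peptides:
        protein = translate(genome[start - 1:end]).rstrip("*")  # drop a terminal stop, if any
        records.append(f">{name}\n{protein}")
    return "\n".join(records) + "\n"


if __name__ == "__main__":
    toy_genome = "ATGGCCAAATTTGGGTAA"  # hypothetical two-peptide polyprotein
    print(peptide_fasta(toy_genome, [("pep1", 1, 9), ("pep2", 10, 18)]), end="")
```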
Feature requests
Is your feature request related to a problem? Please describe.
I tried to build the database on my own with the following calls:
In both cases, SnpEff runs without any errors. However, the resulting VCF file looks like this:
AY184219.1 191 . T C 18.151 PASS primary_call=T;primary_prob=0.556;ref_prob=0.556;secondary_call=C;secondary_prob=0.429;ANN=C|upstream_gene_variant|MODIFIER|Gene_742_7371|Gene_742_7371|transcript|AAN85442.1|protein_coding||c.-552T>C|||||552|,C|upstream_gene_variant|MODIFIER|Gene_742_7371|Gene_742_7371|transcript|protein_VP4|protein_coding||c.-552T>C|||||552|WARNING_TRANSCRIPT_NO_STOP_CODON,C|upstream_gene_variant|MODIFIER|Gene_742_7371|Gene_742_7371|transcript|protein_VP2|protein_coding||c.-759T>C|||||759|WARNING_TRANSCRIPT_NO_START_CODON,C|upstream_gene_variant|MODIFIER|Gene_742_7371|Gene_742_7371|transcript|protein_VP3|protein_coding||c.-1575T>C|||||1575|WARNING_TRANSCRIPT_NO_START_CODON,C|upstream_gene_variant|MODIFIER|Gene_742_7371|Gene_742_7371|transcript|protein_VP1|protein_coding||c.-2289T>C|||||2289|WARNING_TRANSCRIPT_NO_START_CODON,C|upstream_gene_variant|MODIFIER|Gene_742_7371|Gene_742_7371|transcript|protein_2A|protein_coding||c.-3195T>C|||||3195|WARNING_TRANSCRIPT_NO_START_CODON,C|upstream_gene_variant|MODIFIER|Gene_742_7371|Gene_742_7371|transcript|protein_2B|protein_coding||c.-3642T>C|||||3642|WARNING_TRANSCRIPT_NO_START_CODON,C|upstream_gene_variant|MODIFIER|Gene_742_7371|Gene_742_7371|transcript|protein_2C|protein_coding||c.-3933T>C|||||3933|WARNING_TRANSCRIPT_NO_START_CODON,C|upstream_gene_variant|MODIFIER|Gene_742_7371|Gene_742_7371|transcript|protein_3A|protein_coding||c.-4920T>C|||||4920|WARNING_TRANSCRIPT_NO_START_CODON,C|intergenic_region|MODIFIER|CHR_START-Gene_742_7371|CHR_START-Gene_742_7371|intergenic_region|CHR_START-Gene_742_7371|||n.191T>C|||||| GT:GQ 0/1:18 ...
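To see at a glance what SnpEff reports for a record like the one above, the `ANN` entry of the INFO column can be split into its sub-fields. The sketch assumes the 16-field layout of the ANN annotation format used by SnpEff (allele, effect, impact, gene name, gene ID, feature type, feature ID, biotype, rank, HGVS.c, HGVS.p, cDNA, CDS and AA positions, distance, messages); the short dictionary keys are names chosen here for readability.

```python
# Sub-field order of one ANN entry, following the ANN annotation format used by SnpEff.
ANN_FIELDS = [
    "allele", "effect", "impact", "gene_name", "gene_id", "feature_type",
    "feature_id", "biotype", "rank", "hgvs_c", "hgvs_p", "cdna_pos",
    "cds_pos", "aa_pos", "distance", "messages",
]


def parse_ann(info: str) -> list[dict[str, str]]:
    """Split the ANN=... item of a VCF INFO column into one dict per annotation."""
    for item in info.split(";"):
        if item.startswith("ANN="):
            return [dict(zip(ANN_FIELDS, ann.split("|"))) for ann in item[len("ANN="):].split(",")]
    return []


if __name__ == "__main__":
    # Two entries copied from the record above (INFO column shortened).
    info = (
        "primary_call=T;ANN="
        "C|upstream_gene_variant|MODIFIER|Gene_742_7371|Gene_742_7371|transcript|protein_VP4|"
        "protein_coding||c.-552T>C|||||552|WARNING_TRANSCRIPT_NO_STOP_CODON,"
        "C|upstream_gene_variant|MODIFIER|Gene_742_7371|Gene_742_7371|transcript|protein_VP2|"
        "protein_coding||c.-759T>C|||||759|WARNING_TRANSCRIPT_NO_START_CODON"
    )
    for ann in parse_ann(info):
        print(ann["gene_id"], ann["feature_id"], ann["effect"], ann["distance"], ann["messages"])
```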
Judging by the warnings (WARNING_TRANSCRIPT_NO_START_CODON and WARNING_TRANSCRIPT_NO_STOP_CODON), it seems to be a problem with start and stop codons, even though I used the standard codon table. Additionally, every variant is annotated with the same gene (Gene_742_7371).
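The warnings may not indicate a wrong codon table at all. In a polyprotein, only the first mature peptide begins with the initiator ATG and only the last one ends with the stop codon; internal peptides are cut out at cleavage sites between ordinary codons. If each mature peptide is registered as its own transcript, missing start codons would be expected on all but the first and missing stop codons on all but the last, which broadly fits the log above (protein_VP4 with WARNING_TRANSCRIPT_NO_STOP_CODON, the downstream peptides with WARNING_TRANSCRIPT_NO_START_CODON). The simplified check below illustrates that reasoning on a toy sequence; it only treats ATG as a start codon and is not SnpEff's own logic.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def codon_warnings(genome: str, peptides: list[tuple[str, int, int]]) -> dict[str, list[str]]:
    """Flag features (1-based inclusive coordinates) lacking an ATG start or a terminal stop."""
    report = {}
    for name, start, end in peptides:
        nt = genome[start - 1:end].upper()
        warnings = []
        if nt[:3] != "ATG":  # simplification: ATG is the only start codon considered
            warnings.append("NO_START_CODON")
        if nt[-3:] not in STOP_CODONS:
            warnings.append("NO_STOP_CODON")
        report[name] = warnings
    return report


if __name__ == "__main__":
    # Toy polyprotein ATG...TAA, cleaved into three hypothetical mature peptides.
    genome = "ATGGCCAAA" "TTTGGGCCC" "AAACCCTAA"
    features = [("polyprotein", 1, 27), ("pep1", 1, 9), ("pep2", 10, 18), ("pep3", 19, 27)]
    for name, warnings in codon_warnings(genome, features).items():
        print(name, warnings or "ok")
```

The whole polyprotein passes both checks, while every cleaved peptide fails at least one, which is why annotating mature peptides as ordinary transcripts tends to produce these warnings.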
Describe the solution you'd like
It would be nice to get the exact gene (mature protein) for each variant, without the warnings.
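The requested output amounts to reporting, for each variant, the mature peptide it falls in and the nucleotide and amino-acid position inside that peptide. A minimal sketch of that coordinate mapping, assuming plus-strand peptides that are sorted by start, non-overlapping, and given in 1-based inclusive coordinates; the function name `locate` and the peptide names and coordinates are hypothetical.

```python
from bisect import bisect_right
from typing import Optional


def locate(pos: int, peptides: list[tuple[str, int, int]]) -> Optional[tuple[str, int, int]]:
    """Map a 1-based genome position to (peptide, nt position in peptide, amino-acid number).

    Assumes plus-strand peptides sorted by start, non-overlapping, 1-based inclusive.
    Returns None when the position falls outside every peptide (e.g. in the 5' UTR).
    """
    starts = [start for _, start, _ in peptides]
    i = bisect_right(starts, pos) - 1  # last peptide starting at or before pos
    if i < 0:
        return None
    name, start, end = peptides[i]
    if pos > end:
        return None
    nt_pos = pos - start + 1
    return name, nt_pos, (nt_pos - 1) // 3 + 1


if __name__ == "__main__":
    peptides = [("pep1", 101, 160), ("pep2", 161, 250)]  # hypothetical coordinates
    for pos in (50, 101, 200, 300):
        print(pos, locate(pos, peptides))
```

Binary search keeps the lookup cheap when many variants are mapped against the same peptide list.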