pcingola / SnpEff

Database request #490

Open NmnBttr opened 1 year ago

NmnBttr commented 1 year ago

Database requests

  1. Organism name: Human poliovirus 1 strain Sabin 1
  2. Link gene definition file (e.g. GTF / GFF / GenBank): https://www.ncbi.nlm.nih.gov/sviewer/viewer.cgi?db=nuccore&report=gff3&id=AY184219
  3. Link to Genome FASTA file/s: https://www.ncbi.nlm.nih.gov/nuccore/AY184219.1?report=fasta
  4. Link to CDS FASTA file:
  5. Link to Protein FASTA file: https://www.ncbi.nlm.nih.gov/protein/AAN85442.1?report=fasta

Note: Genome FASTA file might not be needed in some cases (e.g. GenBank files usually have genome sequence information)

Note: Either CDS or Protein FASTA files should be used to ensure correctness (sometimes these sequences are provided in the GenBank files).

Feature requests

Is your feature request related to a problem? Please describe.

I tried to build the database on my own with the following calls:

In both cases, SnpEff runs without reporting any errors. However, the annotated VCF file contains records like this:

AY184219.1 191 . T C 18.151 PASS primary_call=T;primary_prob=0.556;ref_prob=0.556;secondary_call=C;secondary_prob=0.429;ANN=C|upstream_gene_variant|MODIFIER|Gene_742_7371|Gene_742_7371|transcript|AAN85442.1|protein_coding||c.-552T>C|||||552|,C|upstream_gene_variant|MODIFIER|Gene_742_7371|Gene_742_7371|transcript|protein_VP4|protein_coding||c.-552T>C|||||552|WARNING_TRANSCRIPT_NO_STOP_CODON,C|upstream_gene_variant|MODIFIER|Gene_742_7371|Gene_742_7371|transcript|protein_VP2|protein_coding||c.-759T>C|||||759|WARNING_TRANSCRIPT_NO_START_CODON,C|upstream_gene_variant|MODIFIER|Gene_742_7371|Gene_742_7371|transcript|protein_VP3|protein_coding||c.-1575T>C|||||1575|WARNING_TRANSCRIPT_NO_START_CODON,C|upstream_gene_variant|MODIFIER|Gene_742_7371|Gene_742_7371|transcript|protein_VP1|protein_coding||c.-2289T>C|||||2289|WARNING_TRANSCRIPT_NO_START_CODON,C|upstream_gene_variant|MODIFIER|Gene_742_7371|Gene_742_7371|transcript|protein_2A|protein_coding||c.-3195T>C|||||3195|WARNING_TRANSCRIPT_NO_START_CODON,C|upstream_gene_variant|MODIFIER|Gene_742_7371|Gene_742_7371|transcript|protein_2B|protein_coding||c.-3642T>C|||||3642|WARNING_TRANSCRIPT_NO_START_CODON,C|upstream_gene_variant|MODIFIER|Gene_742_7371|Gene_742_7371|transcript|protein_2C|protein_coding||c.-3933T>C|||||3933|WARNING_TRANSCRIPT_NO_START_CODON,C|upstream_gene_variant|MODIFIER|Gene_742_7371|Gene_742_7371|transcript|protein_3A|protein_coding||c.-4920T>C|||||4920|WARNING_TRANSCRIPT_NO_START_CODON,C|intergenic_region|MODIFIER|CHR_START-Gene_742_7371|CHR_START-Gene_742_7371|intergenic_region|CHR_START-Gene_742_7371|||n.191T>C|||||| GT:GQ 0/1:18 ...

The warnings point to a problem with the start and stop codons, even though I used the standard codon table. Additionally, every called variant is annotated with the same gene (Gene_742_7371).
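To make the record above easier to read, the ANN field can be split into one row per transcript. The following is a minimal sketch, assuming the standard 16-field, pipe-separated ANN layout; the function name `parse_ann` is hypothetical, and the sample string is a shortened copy of the record above.

```python
def parse_ann(info):
    """Return (feature_id, effect, warnings) for each ANN entry in a VCF INFO string."""
    # Pick the ANN=... key out of the semicolon-separated INFO column.
    ann = next((kv[4:] for kv in info.split(";") if kv.startswith("ANN=")), "")
    rows = []
    for entry in filter(None, ann.split(",")):
        fields = entry.split("|")
        fields += [""] * (16 - len(fields))  # pad in case trailing fields are missing
        # Index 1 = effect, 6 = feature (transcript) ID, 15 = errors/warnings/info.
        rows.append((fields[6], fields[1], fields[15]))
    return rows


# Shortened copy of the record shown above (two of the ANN entries).
info = (
    "primary_call=T;ANN="
    "C|upstream_gene_variant|MODIFIER|Gene_742_7371|Gene_742_7371|transcript|"
    "AAN85442.1|protein_coding||c.-552T>C|||||552|,"
    "C|upstream_gene_variant|MODIFIER|Gene_742_7371|Gene_742_7371|transcript|"
    "protein_VP2|protein_coding||c.-759T>C|||||759|WARNING_TRANSCRIPT_NO_START_CODON"
)

for feature_id, effect, warnings in parse_ann(info):
    print(feature_id, effect, warnings or "-")
```

Listed this way, the polyprotein (AAN85442.1) and each mature protein appear as separate transcripts of the single gene Gene_742_7371, and the warnings are attached to the mature-protein transcripts.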

Describe the solution you'd like

It would be nice to have each variant annotated with the exact gene it affects, and without the warnings.

hoelzer commented 1 year ago

Hey, thanks for any support! Just adding to what @NmnBttr already described: this is a virus genome that encodes a single polyprotein. The polyprotein is then post-translationally cleaved into mature proteins. But the mature proteins are not labeled as "CDS" in the annotation files.

I think this is a general problem when working with polyproteins?

One of our ideas was to modify the annotation so that it fits the scheme SnpEff expects, but we have not been successful so far.
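For illustration, the kind of annotation rewrite we have in mind could look like the sketch below. It turns every mature-peptide feature into its own gene/mRNA/exon/CDS hierarchy. Several things here are assumptions rather than facts about the actual file: the feature type name (`mature_protein_region_of_CDS`), the use of the `product` attribute for naming, and the function name `split_polyprotein`, which is hypothetical. We have also not verified that SnpEff accepts the result. Since cleaved peptides have no start or stop codon of their own, the codon warnings would probably remain even if the gene names become specific.

```python
def split_polyprotein(gff_lines, peptide_type="mature_protein_region_of_CDS"):
    """Emit gene/mRNA/exon/CDS GFF3 records for each mature-peptide feature.

    peptide_type is an assumption; check which type the input file really uses.
    """
    out = []
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != peptide_type:
            continue  # keep only mature-peptide features
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        name = attrs.get("product", attrs.get("ID", "peptide")).replace(" ", "_")
        seqid, source, _, start, end, _, strand, _, _ = cols

        def rec(ftype, phase, attr):
            return "\t".join([seqid, source, ftype, start, end, ".", strand, phase, attr])

        out.append(rec("gene", ".", f"ID=gene_{name};Name={name}"))
        out.append(rec("mRNA", ".", f"ID=mrna_{name};Parent=gene_{name}"))
        out.append(rec("exon", ".", f"ID=exon_{name};Parent=mrna_{name}"))
        # Each peptide is cut out of the polyprotein in frame, so phase is 0.
        out.append(rec("CDS", "0", f"ID=cds_{name};Parent=mrna_{name}"))
    return out


# Illustrative input record, not copied from the actual GFF3 file.
example = [
    "##gff-version 3",
    "AY184219.1\tGenbank\tmature_protein_region_of_CDS\t743\t949\t.\t+\t.\t"
    "ID=id-1;product=protein VP4",
]
records = split_polyprotein(example)
print("\n".join(records))
```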

Thanks!