pcingola / SnpEff

stop_gained not reported as LOF #315

Closed julibeg closed 8 months ago

julibeg commented 3 years ago

The documentation says: "We adopted a definition for LoF variants expected to correlate with complete loss of function of the affected transcripts: stop codon-introducing (nonsense)...". However, when I run SnpEff on my VCF, it does not flag a single one of the ~16k variants with |stop_gained| in the ANN field as LOF. Is this expected?

I have extracted one of the variants and included it in an example.zip alongside the output of SnpEff.

The VCF is from Mtb (Mycobacterium tuberculosis), and I ran SnpEff with:

java -Xmx8g -jar ~/snpEff/snpEff.jar Mycobacterium_tuberculosis_h37rv example.vcf > example_snpEff.vcf
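
For anyone who wants to reproduce the count described above, here is a minimal Python sketch that tallies how many records carry a stop_gained annotation and how many of those also have a LOF entry in INFO. It assumes a tab-separated, uncompressed VCF with annotations in the ANN INFO field and the flag stored under the INFO key LOF; the function name count_stop_gained_and_lof is made up for illustration and is not part of SnpEff.

```python
def count_stop_gained_and_lof(vcf_lines):
    """Return (records with a stop_gained annotation, those that also carry LOF)."""
    n_stop_gained = 0
    n_with_lof = 0
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 8:
            continue  # not a complete VCF record
        # Parse INFO (column 8) into a key -> value mapping.
        info = {}
        for entry in columns[7].split(";"):
            key, _, value = entry.partition("=")
            info[key] = value
        # Collect every effect term from every ANN entry; the second
        # pipe-separated subfield may combine several terms with '&'.
        effects = set()
        for ann in info.get("ANN", "").split(","):
            fields = ann.split("|")
            if len(fields) > 1:
                effects.update(fields[1].split("&"))
        if "stop_gained" in effects:
            n_stop_gained += 1
            if "LOF" in info:
                n_with_lof += 1
    return n_stop_gained, n_with_lof


# Usage (path taken from the command above):
# with open("example_snpEff.vcf") as handle:
#     print(count_stop_gained_and_lof(handle))
```

If the second number comes back as 0 while the first is large, that matches the behaviour reported here.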

ramiroricardo commented 2 years ago

Dear all,

I have a similar question. I have been using SnpEff v5 and I am not sure why the variants below are not annotated as loss of function. Both lie outside the 5% limit at the edges of the coding sequence, yet neither is annotated as LOF. If someone can explain why, that would be great.

To run SnpEff I used:

snpEff -c snpEff.config $REF -v -lof ../path/$VCF > $SAMP.anno.vcf

Example of problematic variant calls:

chromosome  3690539 .   T   G   169.4   PASS    AF=0.3068;AD=54;DP=176;ANN=G|stop_gained|HIGH|TDA1000_02826|GENE_TDA1000_02826|transcript|TRANSCRIPT_TDA1000_02826|protein_coding|1/1|c.444T>G|p.Tyr148*|444/1542|444/1542|148/513||,G|upstream_gene_variant|MODIFIER|TDA1000_02825|GENE_TDA1000_02825|transcript|TRANSCRIPT_TDA1000_02825|protein_coding||c.-599A>C|||||599|,G|upstream_gene_variant|MODIFIER|TDA1000_02827|GENE_TDA1000_02827|transcript|TRANSCRIPT_TDA1000_02827|protein_coding||c.-1124T>G|||||1124|,G|upstream_gene_variant|MODIFIER|TDA1000_02828|GENE_TDA1000_02828|transcript|TRANSCRIPT_TDA1000_02828|protein_coding||c.-3129T>G|||||3129|,G|downstream_gene_variant|MODIFIER|TDA1000_02829|GENE_TDA1000_02829|transcript|TRANSCRIPT_TDA1000_02829|protein_coding||c.*4193A>C|||||4193|,G|intragenic_variant|MODIFIER|TDA1000_00149|null|gene_variant|null|||n.3690539T>G||||||

chromosome  3333474 .   C   A   5.6 PASS    AF=0.0209;AD=6;DP=285;ANN=A|stop_gained|HIGH|TDA1000_02562|GENE_TDA1000_02562|transcript|TRANSCRIPT_TDA1000_02562|protein_coding|1/1|c.1196C>A|p.Ser399*|1196/2010|1196/2010|399/669||,A|upstream_gene_variant|MODIFIER|TDA1000_02555|GENE_TDA1000_02555|transcript|TRANSCRIPT_TDA1000_02555|protein_coding||c.-4766G>T|||||4766|,A|upstream_gene_variant|MODIFIER|TDA1000_02556|GENE_TDA1000_02556|transcript|TRANSCRIPT_TDA1000_02556|protein_coding||c.-4327G>T|||||4327|,A|upstream_gene_variant|MODIFIER|TDA1000_02557|GENE_TDA1000_02557|transcript|TRANSCRIPT_TDA1000_02557|protein_coding||c.-3561G>T|||||3561|,A|upstream_gene_variant|MODIFIER|TDA1000_02558|GENE_TDA1000_02558|transcript|TRANSCRIPT_TDA1000_02558|protein_coding||c.-3235G>T|||||3235|,A|upstream_gene_variant|MODIFIER|TDA1000_02563|GENE_TDA1000_02563|transcript|TRANSCRIPT_TDA1000_02563|protein_coding||c.-988C>A|||||988|,A|upstream_gene_variant|MODIFIER|TDA1000_02564|GENE_TDA1000_02564|transcript|TRANSCRIPT_TDA1000_02564|protein_coding||c.-2520C>A|||||2520|,A|upstream_gene_variant|MODIFIER|TDA1000_02565|GENE_TDA1000_02565|transcript|TRANSCRIPT_TDA1000_02565|protein_coding||c.-2705C>A|||||2705|,A|downstream_gene_variant|MODIFIER|TDA1000_02559|GENE_TDA1000_02559|transcript|TRANSCRIPT_TDA1000_02559|protein_coding||c.*2896C>A|||||2896|,A|downstream_gene_variant|MODIFIER|TDA1000_02560|GENE_TDA1000_02560|transcript|TRANSCRIPT_TDA1000_02560|protein_coding||c.*2439C>A|||||2439|,A|downstream_gene_variant|MODIFIER|TDA1000_02561|GENE_TDA1000_02561|transcript|TRANSCRIPT_TDA1000_02561|protein_coding||c.*1216C>A|||||1216|,A|downstream_gene_variant|MODIFIER|TDA1000_02566|GENE_TDA1000_02566|transcript|TRANSCRIPT_TDA1000_02566|protein_coding||c.*4892G>T|||||4892|,A|intragenic_variant|MODIFIER|TDA1000_00149|null|gene_variant|null|||n.3333474C>A||||||
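
The statement that both variants sit outside the 5% limit at the edges can be checked directly from the ANN field. The sketch below is an illustration only: it assumes the standard ANN layout in which the 13th pipe-separated subfield is "CDS position / CDS length" and the 7th is the feature ID, and it takes the 5% margin from the comment above rather than from SnpEff's source. The function name and the EDGE_MARGIN constant are hypothetical, and the ANN strings are trimmed to the stop_gained entries of the two records.

```python
EDGE_MARGIN = 0.05  # assumption: the "5% limit at the edges" mentioned above


def stop_gained_cds_fractions(ann_value):
    """Return (feature_id, relative CDS position) for each stop_gained entry."""
    results = []
    for ann in ann_value.split(","):
        fields = ann.split("|")
        if len(fields) < 13 or "stop_gained" not in fields[1].split("&"):
            continue
        # Subfield 13 (index 12) looks like "444/1542".
        pos, _, length = fields[12].partition("/")
        if not (pos.isdigit() and length.isdigit()) or int(length) == 0:
            continue
        results.append((fields[6], int(pos) / int(length)))
    return results


# stop_gained entries copied from the two records above
ann_examples = [
    "G|stop_gained|HIGH|TDA1000_02826|GENE_TDA1000_02826|transcript|"
    "TRANSCRIPT_TDA1000_02826|protein_coding|1/1|c.444T>G|p.Tyr148*|"
    "444/1542|444/1542|148/513||",
    "A|stop_gained|HIGH|TDA1000_02562|GENE_TDA1000_02562|transcript|"
    "TRANSCRIPT_TDA1000_02562|protein_coding|1/1|c.1196C>A|p.Ser399*|"
    "1196/2010|1196/2010|399/669||",
]

for ann in ann_examples:
    for feature_id, fraction in stop_gained_cds_fractions(ann):
        inside = EDGE_MARGIN <= fraction <= 1 - EDGE_MARGIN
        print(f"{feature_id}: {fraction:.3f} of CDS, away from edges: {inside}")
```

The two stop codons fall at 444/1542 (about 0.288) and 1196/2010 (about 0.595) of the coding sequence, so under this assumed margin neither is near an edge. This only confirms the positional argument; it does not explain why the LOF tag is missing.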