zhanxw / rvtests

Rare variant test software for next generation sequencing data

SnpEff annotated VCF #49

Open Thatguy027 opened 6 years ago

Thatguy027 commented 6 years ago

Hello,

I have been playing around with the rvtests docker container for a couple of days and things have been running smoothly, thanks for the nice utility!

Yesterday I tried to use a VCF file annotated with SnpEff for burden testing, but I could not get the --annoType flag to filter variants. I passed a specific annotation type (--annoType missense_variant; I also tried putting missense in quotes, etc.), but no variants were identified in the tested gene even though I know there are some. I noticed that you suggest using the anno package you developed for VCF annotation, which writes ANNO and ANNOFULL fields rather than the ANN field that SnpEff writes.

SnpEff: ANN=C|frameshift_variant|HIGH|WBGene00022365|WBGene00022365|transcript|Y92H12BL.4|protein_coding|3/6|c.349delA|p.Arg117fs|349/1194|349/1194|117/397||;LOF=(WBGene00022365|WBGene00022365|1|1.00)

anno (from your site): ANNO=Nonsynonymous:GENE1|GENE3;ANNOFULL=GENE1/CODING_GENE:+:Nonsynonymous(CCT/Pro/P->CAT/His/H:Base3/30:Codon1/10:Exon1/5):Normal_Splice_Site:Exon|GENE3/CODING_GENE:-:Nonsynonymous(AGG/Arg/R->ATG/Met/M:Base30/30:Codon10/10:Exon5/5):Normal_Splice_Site:Exon|GENE2/NON_CODING_GENE:+:Upstream
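To make the difference between the two layouts concrete, here is a minimal sketch that splits the SnpEff ANN value quoted above into named subfields. The subfield order follows the 16-column layout of the SnpEff ANN format as I understand it. The `ANN_FIELDS` labels and the `parse_ann` helper are names made up for this illustration and are not part of rvtests or SnpEff.

```python
# Labels for the 16 pipe-separated ANN subfields (names chosen for this sketch).
ANN_FIELDS = [
    "allele", "annotation", "impact", "gene_name", "gene_id",
    "feature_type", "feature_id", "transcript_biotype", "rank",
    "hgvs_c", "hgvs_p", "cdna_pos", "cds_pos", "aa_pos",
    "distance", "errors",
]


def parse_ann(info):
    """Return one dict per annotation in the ANN= entry of a VCF INFO string."""
    for entry in info.split(";"):
        if entry.startswith("ANN="):
            # Multiple annotations for one variant are comma-separated.
            records = entry[len("ANN="):].split(",")
            return [dict(zip(ANN_FIELDS, r.split("|"))) for r in records]
    return []


# INFO string copied from the SnpEff example in this issue.
info = (
    "ANN=C|frameshift_variant|HIGH|WBGene00022365|WBGene00022365|transcript|"
    "Y92H12BL.4|protein_coding|3/6|c.349delA|p.Arg117fs|349/1194|349/1194|"
    "117/397||;LOF=(WBGene00022365|WBGene00022365|1|1.00)"
)

first = parse_ann(info)[0]
print(first["annotation"], first["impact"], first["gene_name"])
```

In the SnpEff layout the effect type and the gene are separate subfields of a single ANN entry, whereas the anno example puts the effect type and gene list in ANNO and the per-transcript detail in ANNOFULL.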

The paper associated with the rvtests package says it can operate on annotations from SnpEff, so I am a bit confused. Maybe the issue is simply that the field is not being found because the two tools name it differently?

I am using the latest version of SnpEff and the version of rvtests in the docker container.

Relatedly, I am wondering whether, given the refFlat file, rvtests can predict higher-impact variants on the fly from the transcription start and stop sites and the exon starts and stops.

I think it would be nice to select on high or moderate predicted effects (as predicted by SnpEff) for a given gene. This operation can be easily done using SnpSift, but I am wondering if I am missing some built-in functionality of rvtests.

Thanks for your time

zhanxw commented 5 years ago

Sorry for the confusion. Actually, you can process the ANN tag and group variants into a set file. The set file can then be passed to the --setFile option for association tests. See: http://zhanxw.github.io/rvtests/#specify-groups-eg-burden-unit
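One way to do this processing is sketched below, assuming the set file has two columns, a set name followed by comma-separated ranges such as `set1 1:100-200,1:250-300`. Please check the linked documentation for the exact format. The helper names (`build_sets`, `format_set_file`), the `GENE_A`/`GENE_B` gene names, and the demo VCF lines are all made up for illustration. The impact filter relies on the third ANN subfield holding SnpEff's HIGH/MODERATE/LOW/MODIFIER label.

```python
from collections import defaultdict


def build_sets(vcf_lines, impacts=("HIGH", "MODERATE")):
    """Group variant positions by SnpEff gene name, keeping selected impacts."""
    sets = defaultdict(list)
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip VCF header lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # not a complete VCF record
        chrom, pos, info = cols[0], cols[1], cols[7]
        ann = next((e[4:] for e in info.split(";") if e.startswith("ANN=")), "")
        genes = set()
        for record in ann.split(","):
            fields = record.split("|")
            # fields[2] is the impact label, fields[3] the gene name.
            if len(fields) > 3 and fields[2] in impacts:
                genes.add(fields[3])
        region = f"{chrom}:{pos}-{pos}"
        for gene in sorted(genes):
            if region not in sets[gene]:
                sets[gene].append(region)
    return sets


def format_set_file(sets):
    """Render one 'name ranges' line per gene (assumed set file layout)."""
    return "\n".join(
        f"{gene} {','.join(regions)}" for gene, regions in sorted(sets.items())
    )


# Hypothetical VCF records; positions and gene names are invented.
demo = [
    "##fileformat=VCFv4.2",
    "I\t1000\t.\tCA\tC\t.\tPASS\tANN=C|frameshift_variant|HIGH|GENE_A|GENE_A|transcript|T1|protein_coding|3/6|c.349delA|p.Arg117fs|349/1194|349/1194|117/397||",
    "I\t1500\t.\tG\tA\t.\tPASS\tANN=A|missense_variant|MODERATE|GENE_A|GENE_A|transcript|T1|protein_coding|4/6|c.500G>A|p.Arg167His|500/1194|500/1194|167/397||",
    "I\t2000\t.\tG\tT\t.\tPASS\tANN=T|synonymous_variant|LOW|GENE_B|GENE_B|transcript|T2|protein_coding|1/3|c.30G>T|p.Leu10Leu|30/900|30/900|10/299||",
]

print(format_set_file(build_sets(demo)))
```

Writing the printed lines to a file gives something that could be passed to --setFile. Selecting on the impact label rather than on the annotation name sidesteps the question of how SnpEff effect names map onto --annoType.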