pcingola / SnpEff

how to extract splice_region_variant from coding regions? #290

Closed RichardCorbett closed 3 years ago

RichardCorbett commented 3 years ago

Hi folks,

I am using snpEff to annotate variants called from hg19-aligned short reads, and I rely heavily on the impact/effect information it reports. One recent task I've been tackling is how to access the "LOW" impact variants that overlap with the coding regions. I've gone through each of the items in the link below to see which variant types with "LOW" impact can intersect with coding regions. Almost every variant type either clearly overlaps coding regions or clearly does not, but "splice_region_variant" appears to be reported for a group of situations that includes variants in exons and/or introns.

https://pcingola.github.io/SnpEff/se_inputoutput/#effect-prediction-details

Is there a way to parse the snpEff output to get only the LOW impact variants that overlap with the coding space of the annotations?

pcingola commented 3 years ago

The simplest way that I can see is to extract those variants (e.g. using grep splice_region_variant ...) and then re-annotate using a combination of parameters:

    -spliceRegionExonSize <int>  : Set size for splice site region within exons. Default: 3 bases
    -spliceRegionIntronMin <int> : Set minimum number of bases for splice site region within intron. Default: 3 bases
    -spliceRegionIntronMax <int> : Set maximum number of bases for splice site region within intron. Default: 8 bases

So, you can re-annotate the extracted splice_region_variant variants using -spliceRegionIntronMin 0 and -spliceRegionIntronMax 0, and then filter for splice_region_variant again (i.e. use grep one more time). That gives you a list of the variants that do NOT overlap with the intron (i.e. the ones that overlap with the exon side).
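As a rough sketch of the "grep" step, the filter below keeps only the VCF records whose ANN field lists splice_region_variant as an effect. It is not part of snpEff. The function names has_effect and filter_vcf are made up for illustration, and the code assumes the standard ANN layout (entries separated by ",", subfields by "|", effect in the second subfield, multiple effects joined by "&"). The command line in the comment is likewise an assumption about a typical invocation; only the two -spliceRegionIntron* options come from this thread.

```python
# Hypothetical helper, not part of snpEff. Assumed workflow:
#   1. filter the annotated VCF with filter_vcf()      (first "grep")
#   2. re-annotate, e.g. (assumed invocation, adjust to your setup):
#        java -jar snpEff.jar -spliceRegionIntronMin 0 \
#             -spliceRegionIntronMax 0 hg19 extracted.vcf > reannotated.vcf
#   3. filter the re-annotated VCF with filter_vcf()   (second "grep")

def has_effect(vcf_line, effect="splice_region_variant"):
    """Return True if any ANN entry on this VCF record lists `effect`."""
    if vcf_line.startswith("#"):
        return False
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 8:
        return False  # no INFO column, nothing to inspect
    for item in fields[7].split(";"):
        if not item.startswith("ANN="):
            continue
        for ann in item[4:].split(","):
            parts = ann.split("|")
            # Second subfield holds the effect(s), joined by '&'.
            if len(parts) > 1 and effect in parts[1].split("&"):
                return True
    return False


def filter_vcf(lines, effect="splice_region_variant"):
    """Yield header lines unchanged, plus records that carry `effect`."""
    for line in lines:
        if line.startswith("#") or has_effect(line, effect):
            yield line
```

Compared with a plain grep, matching on the parsed effect subfield avoids hits on the same string appearing elsewhere in the record, and header lines are kept so the output remains a valid VCF that can be fed back into snpEff.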

I hope this helps.

pcingola commented 3 years ago

Closing, feel free to reopen.

RichardCorbett commented 3 years ago

Perfect. Thanks @pcingola!