hall-lab / svtyper

Bayesian genotyper for structural variants
MIT License

Genotyping Output Size Doesn't Match Input Size #18

Closed (djakubosky closed 8 years ago)

djakubosky commented 8 years ago

I have a quick question. I generated a lumpy VCF, using an exclude.bed to exclude areas of high read depth (as described in the lumpy git repository), and then removed calls in low-complexity regions, segmental duplications, centromeric/telomeric regions, and non-autosomal contigs before using svtyper to genotype. My input VCF has around 6K regions, and the output from svtyper ends up at around 4,800. Is this expected? What happens to the regions that do not appear in the output?

Wondering if I'm doing something wrong or if additional things might need to be filtered before using svtyper.

cc2qe commented 8 years ago

One way this can happen is if you are removing only one side of the multi-line variants. Variants with SVTYPE=BND are written on two lines, since they may be interchromosomal events. If a bedtools intersect removes one of the two lines, SVTyper will view the variant as incomplete and will not print out either of the lines.

My guess is that you've only lost BND variants.
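A quick way to check this guess is to look for BND records whose mate is no longer in the filtered VCF. The sketch below is only an illustration, not part of SVTyper: it assumes each BND record names its partner in a `MATEID` INFO tag (as the VCF breakend convention describes), and the function names and the example records are made up for this purpose.

```python
def parse_info(info_field):
    """Turn a VCF INFO string into a dict; flag entries map to True."""
    info = {}
    for entry in info_field.split(";"):
        if not entry or entry == ".":
            continue
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True
    return info


def find_orphaned_bnd(vcf_lines):
    """Return sorted IDs of BND records whose MATEID is absent from the input."""
    seen_ids = set()
    bnd_mates = {}  # record ID -> ID of its mate
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        record_id, info = fields[2], parse_info(fields[7])
        seen_ids.add(record_id)
        # Assumption: the mate is identified by the MATEID INFO tag.
        if info.get("SVTYPE") == "BND" and "MATEID" in info:
            bnd_mates[record_id] = info["MATEID"]
    return sorted(rid for rid, mate in bnd_mates.items() if mate not in seen_ids)


# Hypothetical filtered VCF: record 9_2 was removed by the exclusion step.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t1000\t7\tN\t<DEL>\t.\t.\tSVTYPE=DEL;END=2000\n",
    "1\t5000\t8_1\tN\tN[2:9000[\t.\t.\tSVTYPE=BND;MATEID=8_2\n",
    "2\t9000\t8_2\tN\t]1:5000]N\t.\t.\tSVTYPE=BND;MATEID=8_1\n",
    "3\t4000\t9_1\tN\tN[4:100[\t.\t.\tSVTYPE=BND;MATEID=9_2\n",
]
print(find_orphaned_bnd(example))
```

To run it on a real file, pass an open file handle, for example `find_orphaned_bnd(open("filtered.vcf"))`, where `filtered.vcf` is a placeholder name. If the number of orphaned IDs plus their removed mates accounts for the gap between input and output, the missing records are most likely all incomplete BND pairs.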

djakubosky commented 8 years ago

Yes, you're right. These are cases where the mate was filtered out because it overlapped my exclusion criteria. This is good to know, thanks!