kcleal / dysgu

Toolkit for calling structural variants using short or long reads
MIT License

using short paired end split reads? #91

Closed egnst closed 2 weeks ago

egnst commented 2 weeks ago

I'm studying a dataset where each sample has one known translocation, and I'm trying to detect if they have other SVs, as well. We performed paired-end short read targeted capture sequencing, so I have a high sequencing depth for several regions of interest.

I aligned my experiment using BWA MEM. When I look at reads that align to the site of the known translocation, I can see that I have a lot of split reads that align to ~50bp on either side of the expected breakpoint. When I run Dysgu, it doesn't call the known translocation. In addition, for every SV that is called, SR=0, so I guess all of these split reads are not being utilized. What is the correct way to include split reads in short read PE analyses?

I'm using dysgu v.1.6.3 and python 3.12.1. I've been troubleshooting the "dysgu run" parameters, most recently:

dysgu run -p4 -x --keep-small --min-support 3 --metrics --max-cov -1 hg38.fa temp_dir bamfile.bam > vcffile.vcf

kcleal commented 2 weeks ago

Hi @egnst, it's possible the supplementary reads are being ignored because they have low mapping quality. You can try setting the minimum mapping quality to zero with --mq 0. If you could share a screenshot of the region from IGV or GW, I might be able to offer some more advice.

kcleal commented 2 weeks ago

Also, it might be worth running samtools flagstat to check how many supplementary reads you have. Dysgu relies on the 2048 flag being set for split reads.
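
For readers following along, here is a minimal sketch of what the 2048 check amounts to. It assumes only the standard SAM FLAG bits (256 = secondary, 2048 = supplementary); the function names `flag_kind` and `count_kinds` are hypothetical helpers, not part of dysgu or samtools.

```shell
# Classify one SAM FLAG value by its secondary (256) and
# supplementary (2048) bits. Anything else is reported as primary.
flag_kind() {
    if [ $(( $1 & 2048 )) -ne 0 ]; then
        echo supplementary
    elif [ $(( $1 & 256 )) -ne 0 ]; then
        echo secondary
    else
        echo primary
    fi
}

# Tally alignment kinds from SAM text on stdin.
# Header lines (starting with '@') are skipped.
count_kinds() {
    n_primary=0
    n_secondary=0
    n_supplementary=0
    while read -r qname flag rest; do
        case "$qname" in
            @*) continue ;;
        esac
        case "$(flag_kind "$flag")" in
            supplementary) n_supplementary=$((n_supplementary + 1)) ;;
            secondary)     n_secondary=$((n_secondary + 1)) ;;
            *)             n_primary=$((n_primary + 1)) ;;
        esac
    done
    echo "primary=$n_primary secondary=$n_secondary supplementary=$n_supplementary"
}
```

With samtools installed, `samtools view bamfile.bam | count_kinds` would print the three tallies. On a large BAM, `samtools view -c -f 2048 bamfile.bam` (supplementary) and `samtools view -c -f 256 bamfile.bam` (secondary) give the same counts much faster. A supplementary count of zero alongside a large secondary count suggests the split reads were flagged in a way dysgu will not use.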

egnst commented 2 weeks ago

Checking flagstat solved my problem: it helped me identify a silly error in my alignment.

I've been using BWA-MEM and forgot that my typical alignment includes the -M option, which flags split reads as secondary rather than supplementary. Removing that option and redoing the alignment was an easy fix.
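
A rough sketch of the corrected alignment step, for anyone hitting the same problem. The function name `align_split_aware`, the thread count, and the file names in the usage line are assumptions for illustration; the original poster's exact command was not shared. The only point is that `bwa mem` is run without -M, so split hits keep the supplementary (2048) flag.

```shell
# Align paired-end reads with BWA-MEM *without* -M, then sort, index,
# and print flagstat so the supplementary count can be checked.
# Arguments: reference FASTA, read 1 FASTQ, read 2 FASTQ, output BAM.
align_split_aware() {
    ref=$1
    reads1=$2
    reads2=$3
    out_bam=$4

    # No -M here: with -M, shorter split hits are marked secondary (256).
    bwa mem -t 4 "$ref" "$reads1" "$reads2" | samtools sort -o "$out_bam" - &&
    samtools index "$out_bam" &&
    samtools flagstat "$out_bam"
}
```

Usage would look like `align_split_aware hg38.fa sample_R1.fastq.gz sample_R2.fastq.gz bamfile.bam`, followed by the same `dysgu run` command as before. The flagstat output should now report a non-zero supplementary count if split reads are present.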