pjedge / longshot

diploid SNV caller for error-prone reads
MIT License

Issue with strand bias? #64

Open Mailinnia opened 3 years ago

Mailinnia commented 3 years ago

Hi,

I am having an issue with longshot not calling SNPs that I know to be true. They seem to be filtered out based on strand bias (FILTER sb).

I'm using the following command:

longshot -r CYP2D6:2600-8700 --bam NA23348.sorted.bam --ref .CYP2D6.NG008376.4.fasta --out PCR_0.01.vcf --no_haps --strand_bias_pvalue_cutoff 0.01 --min_alt_frac 0.2 -d longshot0.01_debug

If I set --strand_bias_pvalue_cutoff to 0.0, it of course finds these SNPs, but it also reports SNPs that show clear strand bias.

This SNP should be called but isn't at a cutoff of 0.01: [screenshot]

I don't understand why it is flagged for strand bias when the alternate G is seen on 609 forward-strand (+) and 739 reverse-strand (-) reads. That doesn't look biased to me. I am losing several true SNPs to this kind of "strand bias".

I understand this one, where there is clear strand bias (the alternate C is on 3 forward-strand and 827 reverse-strand reads): [screenshot]
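To see why high coverage matters, here is a minimal sketch of a two-tailed Fisher's exact test on a 2×2 table of allele (ref, alt) by strand (forward, reverse) read counts. It assumes the strand-bias filter works roughly like this (comparing the strand split of alt reads against that of ref reads); it is not longshot's code. The thread only reports alt counts, so the reference counts used below (720 forward, 680 reverse) are hypothetical.

```python
import math


def log_table_prob(a: int, b: int, c: int, d: int) -> float:
    """Log-probability of the 2x2 table [[a, b], [c, d]] given its margins
    (hypergeometric distribution)."""
    n = a + b + c + d
    return (math.lgamma(a + b + 1) + math.lgamma(c + d + 1)
            + math.lgamma(a + c + 1) + math.lgamma(b + d + 1)
            - math.lgamma(n + 1) - math.lgamma(a + 1) - math.lgamma(b + 1)
            - math.lgamma(c + 1) - math.lgamma(d + 1))


def fisher_two_tailed(ref_fwd: int, ref_rev: int, alt_fwd: int, alt_rev: int) -> float:
    """Two-tailed p-value: total probability of all tables with the same
    margins that are no more likely than the observed table."""
    n = ref_fwd + ref_rev + alt_fwd + alt_rev
    ref_total = ref_fwd + ref_rev   # row margin: reads supporting ref
    fwd_total = ref_fwd + alt_fwd   # column margin: forward-strand reads
    observed = log_table_prob(ref_fwd, ref_rev, alt_fwd, alt_rev)
    p = 0.0
    # x is the ref-forward count of each candidate table with fixed margins.
    for x in range(max(0, ref_total + fwd_total - n), min(ref_total, fwd_total) + 1):
        lp = log_table_prob(x, ref_total - x, fwd_total - x,
                            n - ref_total - fwd_total + x)
        if lp <= observed + 1e-7:   # small tolerance for floating-point ties
            p += math.exp(lp)
    return min(1.0, p)


# Hypothetical ref counts; alt counts are the ones reported in this issue.
p_slight = fisher_two_tailed(720, 680, 609, 739)  # G allele: mild imbalance
p_strong = fisher_two_tailed(720, 680, 3, 827)    # C allele: extreme imbalance
print(p_slight, p_strong)
```

With these made-up ref counts, the forward-strand fractions differ by only about six percentage points (0.51 vs 0.45), yet with roughly 2,700 reads the p-value lands between 0.0001 and 0.01. The site is therefore filtered at a 0.01 cutoff but kept at 0.0001, while the C allele falls far below either threshold.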

What can I do to mitigate this issue?

vibansal commented 3 years ago

Since the coverage is very high, even a very slight strand bias produces a small p-value, so the strand-bias filter removes SNPs whose bias is minor. One solution is to use a lower p-value threshold (e.g. 0.0001), which should only remove SNPs with a strong strand bias. We could also address this in longshot by adding a further filtering criterion for strand bias that takes the magnitude of the bias into account.
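A criterion that also uses the magnitude of the bias could look something like the sketch below. It is hypothetical: longshot has no such option, and the function names, the `min_effect` parameter, its 0.25 default and the choice of effect size (the difference between the forward-strand fractions of ref and alt reads) are all illustrative assumptions. The p-value is passed in from whatever test the caller uses, and the per-strand read counts would have to come from the alignments.

```python
def forward_fraction(fwd: int, rev: int) -> float:
    """Fraction of an allele's reads on the forward strand (0.5 = balanced)."""
    total = fwd + rev
    if total == 0:
        raise ValueError("no reads support this allele")
    return fwd / total


def is_strand_biased(ref_fwd: int, ref_rev: int, alt_fwd: int, alt_rev: int,
                     pvalue: float, p_cutoff: float = 0.01,
                     min_effect: float = 0.25) -> bool:
    """Hypothetical filter: flag a site only if the bias is both statistically
    significant (pvalue < p_cutoff) and large (the ref and alt forward-strand
    fractions differ by more than min_effect)."""
    effect = abs(forward_fraction(alt_fwd, alt_rev)
                 - forward_fraction(ref_fwd, ref_rev))
    return pvalue < p_cutoff and effect > min_effect


# Hypothetical ref counts (720+, 680-) with the alt counts from this issue.
print(is_strand_biased(720, 680, 609, 739, pvalue=0.001))  # mild bias: kept
print(is_strand_biased(720, 680, 3, 827, pvalue=1e-100))   # strong bias: flagged
```

Requiring both conditions keeps the p-value's protection against noise at low coverage, while the effect-size term stops tiny but "significant" imbalances at very high coverage from removing true SNPs.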

Mailinnia commented 3 years ago

How do I filter based on the magnitude of the bias?

vibansal commented 3 years ago

This is not yet implemented in longshot, and the output VCF file does not contain the information needed to calculate the magnitude of the bias.