Closed: jkbonfield closed this issue 7 months ago
Also, I'm interested in how you normally do large-scale evaluation for multi-sample calling, as I should probably have done this before making my --indel-cns PR. Guidance is needed here.
Actually, ignore me - looking at this again, I see what's happening is a shift in the FP vs FN ratio. If we compare the trio at QUAL>=50 against the single sample at QUAL>=0, the trio is better. It's simply that everything has a higher confidence, which makes it look like it has a bad FP rate. I'd really need to plot the ROC curve to show the tradeoff as QUAL increases.
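The FP/FN shift as the QUAL cutoff moves can be sketched with a toy sweep (made-up calls and truth labels, purely illustrative - not real data from this evaluation):

```shell
# Each line: QUAL, then whether the call agrees with the truth set.
# Raising the QUAL cutoff trades FPs for FNs, which is why a single
# threshold comparison between callsets can mislead.
cat > calls.txt <<'EOF'
10 FP
30 TP
55 TP
60 FP
80 TP
90 TP
EOF

for cutoff in 0 50; do
    awk -v q="$cutoff" '
        $1 >= q && $2 == "FP" { fp++ }   # kept a wrong call
        $1 >= q && $2 == "TP" { tp++ }   # kept a correct call
        $1 <  q && $2 == "TP" { fn++ }   # filtered out a correct call
        END { printf "QUAL>=%d TP=%d FP=%d FN=%d\n", q, tp+0, fp+0, fn+0 }
    ' calls.txt
done
# QUAL>=0  TP=4 FP=2 FN=0
# QUAL>=50 TP=3 FP=1 FN=1
```

Sweeping the cutoff over many values and plotting FP against TP gives the ROC.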
I'm trying to evaluate the effect of my mpileup changes on multi-sample calling. I have a single-sample evaluation which just compares a call VCF against a truth/benchmark VCF, so I thought I'd take the following strategy:

1. `mpileup | call` with HG00[234]_blah.bam to get a joint.vcf
2. `bcftools +split` on the joint.vcf to generate sample-specific VCFs

Unfortunately this produces thousands of false positives, which turns out to be down to 0/0 calls still being present, plus some mix-ups with 2/3 etc. After a lot of experimentation (and pointlessly writing a hacky script) I discovered that `bcftools norm -m -both` can remove unnecessary ALT alleles, and a quick `grep -v '[.0]/[.0]:'` removes the GT 0/0 or ./. calls.

Original GT distribution after split:
Split after norm and grep:
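For illustration, here's that grep filter applied to a few made-up post-split records (positions, alleles and the sample column are invented for the example; the pattern assumes GT is the first FORMAT field and is unphased):

```shell
# Toy per-sample lines after `bcftools +split`; sample column is GT:PL.
cat > hg002.txt <<'EOF'
chr1 100 A C 0/1:20,0,30
chr1 200 G T 0/0:0,30,200
chr1 300 C G ./.:0,0,0
chr1 400 T A 1/1:90,20,0
EOF

# Drop records where this sample is hom-ref (0/0) or missing (./.).
# Note it would not catch phased genotypes like 0|0.
grep -v '[.0]/[.0]:' hg002.txt
# chr1 100 A C 0/1:20,0,30
# chr1 400 T A 1/1:90,20,0
```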
Compare this, however, to the single-sample call (just HG002 alone, in isolation), also using `norm -m -both` for a fair comparison of those 1/2 calls:

So - we have more variants called for HG002 when it's called in conjunction with HG003 and HG004. That's good, right? Well, it would be if they were true!
Single sample:
Trio and split:
As expected FN has dropped, but FP has gone up considerably, particularly for SNPs. Why would that be? I'll do some `isec` runs soon to explore further, but I'm wondering if this is a known problem.
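As a rough sketch of what that comparison will look like, `comm` on sorted CHROM:POS keys approximates the site-level part of `isec` (toy positions; real `isec` also matches REF/ALT and handles multi-allelics properly):

```shell
# Made-up call sites from the two runs, as sorted CHROM:POS keys.
printf 'chr1:100\nchr1:250\nchr1:400\n' > single.keys
printf 'chr1:100\nchr1:250\nchr1:300\nchr1:500\n' > trio.keys

comm -13 single.keys trio.keys   # trio-only sites: candidate new FPs (or rescued FNs)
# chr1:300
# chr1:500
comm -23 single.keys trio.keys   # single-only sites
# chr1:400
comm -12 single.keys trio.keys   # sites shared by both runs
# chr1:100
# chr1:250
```

Intersecting each class against the truth VCF then tells us whether the trio-only sites are genuine rescues or the extra false positives seen above.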