Observations on multi-samp calling

jkbonfield commented 7 months ago

I'm trying to evaluate the effect of my mpileup changes on multi-sample calling. I have a single sample evaluation which just compares a call VCF against a truth/benchmark VCF, so I thought I'd take the following strategy:

mpileup | call with HG00[234]_blah.bam to get a joint.vcf
bcftools +split on the joint.vcf to generate sample specific vcfs
Use my existing comparison scripts.

Unfortunately this produces thousands of false positives, which turns out to be down to 0/0 calls still being present, and some mix up with 2/3 etc. After a lot of experimentation (and pointlessly writing a hacky script) I discovered that bcftools norm -m -both can remove unnecessary ALT alleles, and a quick grep -v '[.0]/[.0]:' removes the GT 0/0 or ./. calls.

Original GT distribution after split:

Split after norm and grep:

  14468 0/1 
   1735 1/0 
   5728 1/1

Compare this however to the single sample call (so just HG002 alone, in isolation), also using norm -m -both for a fair comparison of those 1/2:

  13773 0/1 
   1576 1/0 
   5763 1/1

So - we have more variants called for HG002 when it's called in conjunction with HG003 and HG004. That's good, right? Well it would be if they were true!

Single sample:

SNP          Q>0 /   Q>=50 / Filtered
SNP   TP   71077 /   70956 /   70956
SNP   FP     766 /     296 /     293
SNP   GT      41 /      35 /      35
SNP   FN     310 /     431 /     431

InDel TP   11732 /   11627 /   11627
InDel FP      85 /      63 /      63
InDel GT     298 /     292 /     292
InDel FN     206 /     311 /     311

Trio and split:

SNP          Q>0 /   Q>=50 / Filtered
SNP   TP   71125 /   71074 /    4899
SNP   FP    1215 /     663 /     216
SNP   GT      56 /      45 /      30
SNP   FN     262 /     313 /   66488

InDel TP   11765 /   11747 /    1163
InDel FP      99 /      84 /      24
InDel GT     299 /     297 /      92
InDel FN     173 /     191 /   10775

As expected FN has removed, but FP has considerably gone up, particularly with SNP. Why would this be? I'll do some isec soon to explore more, but wondering if this is a known problem.

jkbonfield commented 7 months ago

Also, I'm interested in how you normally do large scale evaluation for multi-sample calling, as I should probably have been doing this before making my --indel-cns PR. Guidance is needed here.

jkbonfield commented 7 months ago

Actually, ignore me - looking at this again I see what's happening is a shift in the FP vs FN ratio. If we look at trio with QUAL>=50 vs single sample at QUAL >= 0 then the trio is better. It's simply everything has a higher confidence which makes it look like it has a bad FP rate. I'd really need to plot the ROC to show the tradeoff as QUAL increases.

samtools / bcftools

Observations on multi-samp calling #2120