This is the official development repository for BCFtools. For installation instructions and other documentation, see http://samtools.github.io/bcftools/howtos/install.html
For an alignment that doesn't have an indel but is aligned against reads that do have an indel, the indel quality comes from the BAM quality. However, indelQ has already been assigned by that point, so this change avoids switching to the BAM quality when indelQ is zero, since zero is a special case marking a read that aligns to multiple indel "types" (lengths) with equal score.
This avoids excess AD numbers for poorly chosen alignments.
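
As a rough illustration of the rule described above, here is a minimal sketch. It is not the BCFtools code: the function name `adjust_indel_qual` and its parameters are hypothetical, and it assumes the BAM quality simply replaces indelQ for reads without an indel, which the description does not spell out.

```c
#include <stdbool.h>

/*
 * Hypothetical sketch only; the name, signature and exact arithmetic
 * are assumptions for illustration, not taken from BCFtools.
 *
 * read_has_indel: whether this alignment itself carries an indel
 * indelQ:         indel quality already assigned to the read
 * bam_qual:       quality taken from the BAM record
 */
static int adjust_indel_qual(bool read_has_indel, int indelQ, int bam_qual)
{
    /* Reads that carry an indel keep their assigned indelQ. */
    if (read_has_indel)
        return indelQ;

    /* indelQ == 0 marks a read that aligns equally well to several
     * indel lengths; leave it at zero instead of using the BAM quality. */
    if (indelQ == 0)
        return 0;

    /* Otherwise the quality comes from the BAM quality (assumed here
     * to be a plain replacement). */
    return bam_qual;
}
```

With this sketch, a read without an indel and with indelQ of zero stays at zero whatever its BAM quality, so it would not add support to an allele through a poorly chosen alignment.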
Fixes #2113
Benchmarks before and after on a single sample, HG002. The results are identical, because the change only affects multi-sample evaluation: it changes scores when another sample has an indel but we do not.
The same HG002 sample, but called in the context of HG003 and HG004 and then split apart again.
develop:
This PR:
No change to SNPs, obviously, and an approximate halving of the FP rate. This likely corresponds to the change in AD calculations, which previously gave false counts (for apparently no gain in sensitivity).
Note that in both cases we're still better off not doing multi-sample calling if we want accuracy, which was a surprise.