Closed armish closed 7 years ago
After some experimentation with the -L
parameter, it looks like using defaults for all of them resulted in the best alignment and it doesn't seem to favor soft-clipping over single gaps near the ends of the reads anymore. Below are some stats (extracted via @timodonnell's varlens tool) on how many reads support the single-nucleotide deletion at a particular genomic loci from original, biokepi-realigned, and bwa-mem with different parameters. Note that the bwa mem defaults were able to recall all the original variant supporting reads:
source,contig,interbase_start,interbase_end,allele,count
OriginalNormal,3,52662963,52662964,A,93
OriginalTumor,3,52662963,52662964,A,119
OriginalTumor,3,52662963,52662964,,13
BiokepiRealignedTumor,3,52662963,52662964,A,121
BiokepiRealignedTumor,3,52662963,52662964,,6
BiokepiReliagnedL10,3,52662963,52662964,A,121
BiokepiReliagnedL10,3,52662963,52662964,,5
BwaMemDefaultsTumor,3,52662963,52662964,A,121
BwaMemDefaultsTumor,3,52662963,52662964,,13
Will revisit the way we parametrize things for bwa with a follow-up PR.
These are the defaults we have right now:
https://github.com/hammerlab/biokepi/blob/master/src/bfx_tools/bwa.ml#L8-L9
which is compiled into
bwa mem -O 11 -E 4
by default. However,bwa mem
's defaults are-O 6 -E 1
according to the tool documentation (kudos: @smondet)This discrepancy might discourage
bwa mem
from adding gaps to the alignments since the biokepi ones are relatively higher than the default ones (and these scores are often co-optimized). I am currently experimenting with a few parameter combinations on real patient data to see whether this is actually causing us any trouble and if so, we probably need to convert these back to themem
defaults.