hammerlab / biokepi

Bioinformatics Ketrew Pipelines
Apache License 2.0
27 stars 4 forks source link

Default `bwa mem` penalty parameters in biokepi are #445

Closed armish closed 7 years ago

armish commented 7 years ago

These are the defaults we have right now:

https://github.com/hammerlab/biokepi/blob/master/src/bfx_tools/bwa.ml#L8-L9

(* ... *)
  let default_gap_open_penalty = 11
  let default_gap_extension_penalty = 4
(* ... *)

which is compiled into bwa mem -O 11 -E 4 by default. However, bwa mem's defaults are -O 6 -E 1 according to the tool documentation (kudos: @smondet)

This discrepancy might discourage bwa mem from adding gaps to the alignments since the biokepi ones are relatively higher than the default ones (and these scores are often co-optimized). I am currently experimenting with a few parameter combinations on real patient data to see whether this is actually causing us any trouble and if so, we probably need to convert these back to the mem defaults.

armish commented 7 years ago

After some experimentation with the -L parameter, it looks like using defaults for all of them resulted in the best alignment and it doesn't seem to favor soft-clipping over single gaps near the ends of the reads anymore. Below are some stats (extracted via @timodonnell's varlens tool) on how many reads support the single-nucleotide deletion at a particular genomic loci from original, biokepi-realigned, and bwa-mem with different parameters. Note that the bwa mem defaults were able to recall all the original variant supporting reads:

source,contig,interbase_start,interbase_end,allele,count
OriginalNormal,3,52662963,52662964,A,93
OriginalTumor,3,52662963,52662964,A,119
OriginalTumor,3,52662963,52662964,,13
BiokepiRealignedTumor,3,52662963,52662964,A,121
BiokepiRealignedTumor,3,52662963,52662964,,6
BiokepiReliagnedL10,3,52662963,52662964,A,121
BiokepiReliagnedL10,3,52662963,52662964,,5
BwaMemDefaultsTumor,3,52662963,52662964,A,121
BwaMemDefaultsTumor,3,52662963,52662964,,13

Will revisit the way we parametrize things for bwa with a follow-up PR.