This patch gives a ~10% speedup for bwa mem with the SSE4-based code (AVX2 is untested, but should be even faster).
I noticed that some of the compiler optimization flags were not correct in the original branch; this has been fixed here. I believe the code is compatible with master, as it will not use the fast extend code without the -f flag.