lh3 / bwa

Burrows-Wheeler Aligner for short-read alignment (see minimap2 for long-read alignment)
GNU General Public License v3.0

Rebased intel-extend branch - 10% speedup using SSE4 (AVX2 even faster) #207

Closed: zamaudio closed this pull request 6 years ago

zamaudio commented 6 years ago

This patch gives a ~10% speedup for bwa mem with the SSE4-based code. AVX2 is untested but should be even faster. Some of the compiler optimization flags in the original branch were incorrect, and this branch fixes them. I believe the code is compatible with master, because the fast extend code is only used when the -f flag is given.
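The comment describes a common pattern: an SIMD fast path that is compiled only when the target supports it and selected at run time by an opt-in flag, with the scalar code as the default. The thread does not show the intel-extend code, so the following is only a minimal sketch of that pattern under stated assumptions. The function names (row_max, row_max_scalar, row_max_sse41) and the task of taking the maximum of an int32 score row are hypothetical illustrations, not bwa's API. The use_fast parameter stands in for what the comment says -f does. The SSE4.1 path exists only when the compiler defines __SSE4_1__, for example with -msse4.1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#ifdef __SSE4_1__
#include <smmintrin.h> /* SSE4.1 intrinsics, e.g. _mm_max_epi32 */
#endif

/* Hypothetical scalar reference path: always compiled, always correct. */
static int32_t row_max_scalar(const int32_t *row, size_t n) {
    int32_t best = row[0];
    for (size_t i = 1; i < n; ++i)
        if (row[i] > best) best = row[i];
    return best;
}

#ifdef __SSE4_1__
/* Hypothetical SSE4.1 path: compares four int32 lanes per iteration. */
static int32_t row_max_sse41(const int32_t *row, size_t n) {
    __m128i vmax = _mm_set1_epi32(INT32_MIN);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128i v = _mm_loadu_si128((const __m128i *)(row + i));
        vmax = _mm_max_epi32(vmax, v); /* signed 32-bit max needs SSE4.1 */
    }
    /* Horizontal reduction: fold lanes 2,3 onto 0,1, then lane 1 onto 0. */
    vmax = _mm_max_epi32(vmax, _mm_shuffle_epi32(vmax, _MM_SHUFFLE(1, 0, 3, 2)));
    vmax = _mm_max_epi32(vmax, _mm_shuffle_epi32(vmax, _MM_SHUFFLE(2, 3, 0, 1)));
    int32_t best = _mm_cvtsi128_si32(vmax);
    for (; i < n; ++i) /* scalar tail for the last n % 4 elements */
        if (row[i] > best) best = row[i];
    return best;
}
#endif

/* Dispatcher: the fast path is opt-in (use_fast plays the role of -f)
   and silently falls back to scalar code on builds without SSE4.1. */
int32_t row_max(const int32_t *row, size_t n, int use_fast) {
    assert(row != NULL && n > 0);
#ifdef __SSE4_1__
    if (use_fast) return row_max_sse41(row, n);
#else
    (void)use_fast;
#endif
    return row_max_scalar(row, n);
}
```

With this layout, leaving the flag off means the new code never runs, so default output cannot change. Both paths must still return identical results, which is worth checking whenever the fast path is enabled.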

zamaudio commented 6 years ago

This patch actually slows down bwa mem by 12% :-1:
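The thread gives a "~10% speedup" and a "12%" slowdown, but it does not say whether these figures are reductions in wall-clock time or changes in throughput. For the same pair of timings, the two conventions give different numbers. The following is a small C sketch of both conventions. The function names are hypothetical, and the tests use illustrative timings, not measurements from this pull request.

```c
#include <assert.h>

/* Percent reduction in wall-clock time relative to the baseline:
   positive means the new build is faster, negative means slower. */
double time_reduction_pct(double t_base, double t_new) {
    assert(t_base > 0.0);
    return (t_base - t_new) / t_base * 100.0;
}

/* Percent change in throughput (work per unit time), which equals
   (t_base / t_new - 1) * 100 when both runs do the same work. */
double throughput_gain_pct(double t_base, double t_new) {
    assert(t_base > 0.0 && t_new > 0.0);
    return (t_base / t_new - 1.0) * 100.0;
}
```

For example, a run that drops from 100 s to 90 s is a 10% time reduction but roughly an 11.1% throughput gain. Stating the convention, and the baseline build, makes conflicting reports like the two above easier to reconcile.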