lh3 / bwa

Burrow-Wheeler Aligner for short-read alignment (see minimap2 for long-read alignment)
GNU General Public License v3.0
1.55k stars 556 forks source link

performance increase in alignment #279

Open andreasbuhr opened 4 years ago

andreasbuhr commented 4 years ago

Some changes which result in a performance improvement of about 1.5%

I know it makes the code less readable. Only merge if you like it.

andreasbuhr commented 4 years ago

Hi @BrettDong,

thank you so much for your review. Yes, I do know that (uint64_t)p[0]<<32 | p[1] is not equivalent to *((uint64_t*)p). But you when taking into account the function __occ_aux, things are different. The function __occ_aux counts how many two-bit tuples bits are equal to c in the 64bit integer. The parameter c can only be 0, 1, 2, or 3. The number of two-bit tuples equal to c does not change when reordering the bytes. So __occ_aux((uint64_t)p[0]<<32 | p[1], c) is in fact equivalent to __occ_aux(*((uint64_t*)p), c). And it is significantly faster. So I'd still suggest to merge this pull request.

best regards, Andreas

BrettDong commented 4 years ago

Thank you a lot for your detailed explanation! After I incorporated these three lines of changes my application seem to get stuck in an infinite loop inside BWT but if I restore these three lines the problem immediately disappears. Therefore I thought the problem arose from the equivalence got broken. Now the culprit must lie somewhere else.

andreasbuhr commented 4 years ago

Can you provide instructions how to reproduce this? Like what exact command are you running on what exact input files?

BrettDong commented 4 years ago

Seems the reason of the problem I encountered is the compiler doing aggressive inline and

y = ((c&2)? y : ~y) >> 1 & ((c&1)? y : ~y) & 0x5555555555555555ull;

this line overwrites content in p array.

andreasbuhr commented 4 years ago

The function takes y by value, so it should not change the array. Do you think it is a bug in your compiler?

BrettDong commented 4 years ago

I think I was encountering a compiler bug. So sorry for my impulsive comment above. Your changes are indeed mathematically equivalent and should not break the behaviour of the program.