Open andreasbuhr opened 4 years ago
Hi @BrettDong,
thank you so much for your review. Yes, I do know that (uint64_t)p[0]<<32 | p[1]
is not equivalent to *((uint64_t*)p)
. But you when taking into account the function __occ_aux
, things are different. The function __occ_aux
counts how many two-bit tuples bits are equal to c
in the 64bit integer. The parameter c
can only be 0, 1, 2, or 3. The number of two-bit tuples equal to c does not change when reordering the bytes. So __occ_aux((uint64_t)p[0]<<32 | p[1], c)
is in fact equivalent to __occ_aux(*((uint64_t*)p), c)
. And it is significantly faster. So I'd still suggest to merge this pull request.
best regards, Andreas
Thank you a lot for your detailed explanation! After I incorporated these three lines of changes my application seem to get stuck in an infinite loop inside BWT but if I restore these three lines the problem immediately disappears. Therefore I thought the problem arose from the equivalence got broken. Now the culprit must lie somewhere else.
Can you provide instructions how to reproduce this? Like what exact command are you running on what exact input files?
Seems the reason of the problem I encountered is the compiler doing aggressive inline and
y = ((c&2)? y : ~y) >> 1 & ((c&1)? y : ~y) & 0x5555555555555555ull;
this line overwrites content in p
array.
The function takes y by value, so it should not change the array. Do you think it is a bug in your compiler?
I think I was encountering a compiler bug. So sorry for my impulsive comment above. Your changes are indeed mathematically equivalent and should not break the behaviour of the program.
Some changes which result in a performance improvement of about 1.5%
I know it makes the code less readable. Only merge if you like it.