Switch to faster bit reversal algorithm

There's a faster 64-bit reversal algorithm than what we're doing. It's O(log(bits)) instead of O(bits), using clever bit hacks:

x = ((x >> 32) & 0x00000000ffffffff) | (x << 32);
x = ((x >> 16) & 0x0000ffff0000ffff) | ((x & 0x0000ffff0000ffff) << 16);
x = ((x >> 8) & 0x00ff00ff00ff00ff) | ((x & 0x00ff00ff00ff00ff) << 8);
x = ((x >> 4) & 0x0f0f0f0f0f0f0f0f) | ((x & 0x0f0f0f0f0f0f0f0f) << 4);
x = ((x >> 2) & 0x3333333333333333) | ((x & 0x3333333333333333) << 2);
x = ((x >> 1) & 0x5555555555555555) | ((x & 0x5555555555555555) << 1);

This is slightly different to our case because we want to reverse the bit string based on the size of the FFT. So for a 256 element FFT we want to turn 0x1 into 0x80, not 2^63. So we'll need to shift the result based on the log2 of the FFT size.

Also rerun the benchmarks with this improvement.

liamappelbe / fftea

Switch to faster bit reversal algorithm #7