tskit-dev / tsinfer

Infer a tree sequence from genetic variation data.
GNU General Public License v3.0
56 stars 13 forks source link

Speed up bit unpacking in `generate_ancestors` #816

Open benjeffery opened 1 year ago

benjeffery commented 1 year ago

There are a few ways to go about this for example some CPUs have specific instrcutions for this. After some research the most portable and robust way appears to be multiplying out the bitpacked value such that the bits get put in the right place in a 64bit word, something like:

void unpackbits(const uint8_t *restrict source, size_t len, int8_t *restrict dest) {
    uint64_t MAGIC = 0x8040201008040201ULL;
    uint64_t MASK  = 0x8080808080808080ULL;
    size_t dest_index = 0;
    for (size_t i = 0; i < len; i++) {
        uint64_t t = ((MAGIC*source[i]) & MASK) >> 7;
        *(uint64_t*)&dest[dest_index] = t;
        dest_index += 8;
    }
}

Care needs to be taken, for example, that the length of the dest array is a multiple of 8. This results in a ~25% speed up for 1kg.

jeromekelleher commented 1 year ago

You could imagine doing this with SSE instructions as well, but almost certainly not worth the hassle.