There are a few ways to go about this. Some CPUs, for example, have dedicated instructions for it (such as PDEP in x86's BMI2 extension). After some research, the most portable and robust approach appears to be multiplying out the bit-packed value so that each bit lands in the right place in a 64-bit word.
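A minimal sketch of one common multiply-based variant follows. It assumes a little-endian target and LSB-first bit order within each packed byte, and the function names are hypothetical. Multiplying by `0x0101010101010101` copies the byte into all eight byte lanes of the word, the mask `0x8040201008040201` keeps only bit k in lane k, and adding `0x7F` to every lane sets the lane's top bit exactly when the lane is nonzero; shifting that bit down gives a 0 or 1 per output byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reference version: one bit per iteration, LSB first (assumed bit order). */
static void unpack_bits_naive(const uint8_t *src, size_t n_src, uint8_t *dest) {
    for (size_t i = 0; i < n_src; i++)
        for (int k = 0; k < 8; k++)
            dest[8 * i + k] = (uint8_t)((src[i] >> k) & 1u);
}

/* Multiply-based version: each packed byte becomes one 64-bit word holding
 * eight 0/1 bytes, stored with a single memcpy. dest must hold 8 * n_src bytes.
 * Assumes little-endian, so lane k of the word ends up in dest[8 * i + k]. */
static void unpack_bits_mul(const uint8_t *src, size_t n_src, uint8_t *dest) {
    for (size_t i = 0; i < n_src; i++) {
        uint64_t x = (uint64_t)src[i] * 0x0101010101010101ULL; /* byte copied into all 8 lanes */
        x &= 0x8040201008040201ULL;                            /* lane k keeps only bit k */
        /* Each lane is now 0 or (1 << k); adding 0x7F cannot carry into the next
         * lane and sets bit 7 iff the lane was nonzero. Move that bit to bit 0. */
        x = ((x + 0x7F7F7F7F7F7F7F7FULL) >> 7) & 0x0101010101010101ULL;
        memcpy(dest + 8 * i, &x, sizeof x);
    }
}

/* Compares both versions over all 256 possible input bytes. */
static void check_unpack_bits(void) {
    uint8_t src[256], ref[256 * 8], out[256 * 8];
    for (int v = 0; v < 256; v++)
        src[v] = (uint8_t)v;
    unpack_bits_naive(src, 256, ref);
    unpack_bits_mul(src, 256, out);
    assert(memcmp(ref, out, sizeof ref) == 0);
}
```

Running `check_unpack_bits()` once, for example from a unit test, verifies the constants against the naive loop for every byte value. It would also catch a big-endian target, where the lanes would be stored in reverse order.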
Some care is needed. For example, the length of the dest array must be a multiple of 8, because each step writes eight output bytes at once. This results in a ~25% speed-up for 1kb.
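If the output length can't be guaranteed to be a multiple of 8, one option is to use the 64-bit path for the whole packed bytes and finish the leftover bits with a plain scalar loop, so nothing is written past the end of `dest`. This is a sketch under the same kind of assumptions (little-endian target, LSB-first bit order), not necessarily what was measured above, and `expand_byte` and `unpack_bits_any_len` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Expands one packed byte into eight 0/1 bytes, LSB first:
 * broadcast to all lanes, keep bit k in lane k, normalise nonzero lanes to 1. */
static uint64_t expand_byte(uint8_t b) {
    uint64_t x = (uint64_t)b * 0x0101010101010101ULL;
    x &= 0x8040201008040201ULL;
    return ((x + 0x7F7F7F7F7F7F7F7FULL) >> 7) & 0x0101010101010101ULL;
}

/* Unpacks n_bits bits from src into dest, one byte per bit. dest needs room for
 * exactly n_bits bytes; n_bits does not have to be a multiple of 8. */
static void unpack_bits_any_len(const uint8_t *src, size_t n_bits, uint8_t *dest) {
    size_t full = n_bits / 8;                 /* whole packed bytes: 8-byte stores */
    for (size_t i = 0; i < full; i++) {
        uint64_t x = expand_byte(src[i]);
        memcpy(dest + 8 * i, &x, sizeof x);   /* little-endian assumed */
    }
    size_t rem = n_bits % 8;                  /* leftover bits: scalar tail */
    for (size_t k = 0; k < rem; k++)          /* src[full] is only read if rem > 0 */
        dest[8 * full + k] = (uint8_t)((src[full] >> k) & 1u);
}
```

The scalar tail costs at most seven iterations per call, so for large buffers the 64-bit path still does almost all of the work.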