The default for storing a bool is to use an entire byte. This is inefficient, as 7 bits are wasted per element. In most normal cases, this cannot be felt, but for our sizes, it means that we have to transfer and allocate 8 times as much memory as we actually need. E.g. for our masks at scale 1x, one sample is 3456^3, which results in ~38 GB of memory. Moving to one bit per bool would result in ~4.75 GB, which should shift the bottleneck from data transfer and movement to compute, as the encoding / decoding adds extra computational overhead.
Another key benefit is that the morphological operations will also benefit, as the operations can be performed on entire words at a time, rather than just bytes.
The default for storing a bool is to use an entire byte. This is inefficient, as 7 bits are wasted per element. In most normal cases, this cannot be felt, but for our sizes, it means that we have to transfer and allocate 8 times as much memory as we actually need. E.g. for our masks at scale 1x, one sample is 3456^3, which results in ~38 GB of memory. Moving to one bit per bool would result in ~4.75 GB, which should shift the bottleneck from data transfer and movement to compute, as the encoding / decoding adds extra computational overhead.
Another key benefit is that the morphological operations will also benefit, as the operations can be performed on entire words at a time, rather than just bytes.