Experiments show we can reduce precision down to 32 bits and still get accurate results. This allows us to increase the throughput of atomics in the histogram kernel.
The implementation is somewhat hacky, involving reinterpret casts, although the code changes are not large.
Experiments show we can reduce precision down to 32 bits and still get accurate results. This allows us to increase the throughput of atomics in the histogram kernel.
The implementation is somewhat hacky, involving reinterpret casts, although the code changes are not large.