zama-ai / tfhe-rs

TFHE-rs: A Pure Rust implementation of the TFHE Scheme for Boolean and Integer Arithmetics Over Encrypted Data.

chore(gpu): allocate pinned host memory when possible #1287

Closed agnesLeroy closed 1 week ago

agnesLeroy commented 1 week ago

This speeds up the copy to the GPU

agnesLeroy commented 1 week ago

Do we know pinned memory will bring any advantage? AFAIK it is more expensive than malloc and only really useful if you repeatedly run copies using the pinned pointer, which I don't think happens often in our case.

My intention was to run the benchmarks to see the effect. I've rebased on main and triggered the multi-bit 64-bit ones; we should get the results soon 🙂

agnesLeroy commented 1 week ago

@pdroalves so this doesn't seem to have any effect on performance: grafana link

agnesLeroy commented 1 week ago

I'm running complementary benchmarks. Actually, I think this will have some effect when using multiple GPUs, based on this: https://medium.com/gpgpu/multi-gpu-programming-6768eeb42e2c

agnesLeroy commented 1 week ago

@pdroalves so I've checked single- and multi-GPU performance on this branch compared to main, and there is no improvement. If anything, we seem to lose 1 ms in the multi-GPU benchmarks for all precisions and operations. I guess we don't want to merge this. Single-GPU run: https://github.com/zama-ai/tfhe-rs/actions/runs/9693223295/job/26748315432 Multi-GPU run: https://github.com/zama-ai/tfhe-rs/actions/runs/9695743021 (Benchmarks on main are still ongoing; they haven't been uploaded to Grafana with the fix in the FFT yet.)

pdroalves commented 1 week ago

Yeah, don't think our case benefits from pinned memory.
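
For reference, the trade-off discussed in this thread (a pinned allocation costs more than a plain malloc, and only pays off when the same buffer is reused for many copies) can be sketched as a simple break-even model. This is only an illustration: `HostBufferCost`, `break_even_copies`, and every timing below are hypothetical placeholders, not tfhe-rs code and not measurements from the benchmarks linked above.

```rust
/// Hypothetical cost model for one host buffer used for host-to-device copies.
/// All numbers fed into it are placeholders, not measured values.
#[derive(Clone, Copy, Debug)]
struct HostBufferCost {
    /// One-time cost of allocating the host buffer, in microseconds.
    alloc_us: f64,
    /// Cost of a single host-to-device copy from this buffer, in microseconds.
    copy_us: f64,
}

impl HostBufferCost {
    /// Total cost of allocating once and then copying `copies` times.
    fn total_us(&self, copies: u32) -> f64 {
        self.alloc_us + self.copy_us * f64::from(copies)
    }
}

/// Smallest number of copies for which the pinned buffer is strictly cheaper
/// overall than the pageable one, or `None` if that never happens.
fn break_even_copies(pageable: HostBufferCost, pinned: HostBufferCost) -> Option<u32> {
    let extra_alloc = pinned.alloc_us - pageable.alloc_us;
    let saved_per_copy = pageable.copy_us - pinned.copy_us;

    if extra_alloc < 0.0 {
        // Pinned allocation is already cheaper before any copy is made.
        return Some(0);
    }
    if saved_per_copy <= 0.0 {
        // No per-copy gain, so the extra allocation cost is never recovered.
        return None;
    }
    // Need n * saved_per_copy > extra_alloc, i.e. n > extra_alloc / saved_per_copy.
    Some((extra_alloc / saved_per_copy).floor() as u32 + 1)
}

fn main() {
    // Placeholder timings chosen only to make the arithmetic easy to follow.
    let pageable = HostBufferCost { alloc_us: 5.0, copy_us: 100.0 };
    let pinned = HostBufferCost { alloc_us: 405.0, copy_us: 60.0 };

    match break_even_copies(pageable, pinned) {
        Some(n) => println!("pinned memory pays off after {n} copies from the same buffer"),
        None => println!("pinned memory never pays off with these timings"),
    }
    println!(
        "single copy: pageable {} us vs pinned {} us",
        pageable.total_us(1),
        pinned.total_us(1)
    );
}
```

Under this model, a buffer that is allocated, copied once, and freed never recovers the extra allocation cost, which is consistent with the benchmarks above showing no improvement.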