lightbulb128 / troy-nova

GPU/CUDA implementation of Leveled BFV/CKKS/BGV scheme.

CPU compatibility #9

Closed henshy closed 2 months ago

henshy commented 3 months ago

How can we optimize for CPU compatibility? There are instances where matrix multiplication is fairly quick even without a GPU, so how can we tweak the system to work efficiently with just a CPU?

lightbulb128 commented 3 months ago

This library also supports running directly on the CPU, so perhaps you could just run the matrix multiplication benchmark (bench_matmul) on your host CPU and see how efficient it is? I don't quite follow what you mean by "optimize for CPU compatibility".

henshy commented 3 months ago

Indeed, the code can run on a CPU, but it requires a GPU as a prerequisite, or else the command 'CoeffModulus.create(n, log_qi)' would trigger a 'no GPU' exception. What I'm envisioning is a situation where the absence of a GPU doesn't prevent the code from compiling and running on a CPU, much like how TensorFlow operates adaptively.

lightbulb128 commented 3 months ago

Oh, I see the problem here. I tested all the code on a machine that has a GPU and CUDA support. I should try the code and the unit tests on a CPU-only machine. I will update this when I finish.

henshy commented 3 months ago

Alright, do we have an estimated release date for the CPU version?

lightbulb128 commented 3 months ago

Yes. Released just now. Tests that use a device are skipped when no device is available. Examples use the host if no device is detected. Use troy::utils::device_count() > 0 to determine whether any device is available. Feel free to comment further if you encounter problems, or close this once you are done.
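A minimal sketch of the check described above, assuming only that a device count is available as an integer. The helper choose_backend is hypothetical and not part of troy-nova; in real code the argument would come from troy::utils::device_count().

```cpp
#include <string>

// Hypothetical helper, not part of troy-nova: maps a device count to the
// backend an example should run on. In real code the count would be the
// result of troy::utils::device_count().
inline std::string choose_backend(int device_count) {
    // Any positive count means at least one device is usable;
    // otherwise fall back to the host CPU.
    return device_count > 0 ? "device" : "host";
}
```

An example program could call choose_backend once at startup and branch on the result, so the same binary runs on machines with and without a GPU.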

henshy commented 3 months ago

I am using the Python API and have installed the CUDA toolkit for compilation (my base image is nvidia/cuda:12.1.0-devel-ubuntu18.04). However, I have not installed the CUDA driver. When I run my code, I encounter the following exception: "RuntimeError: [device_count] cudaGetDeviceCount failed: CUDA driver version is insufficient for CUDA runtime version". Is there a way to skip GPU-related operations so that the GPU is never initialized?

lightbulb128 commented 3 months ago

I use device_count() to check whether any device is available. I updated this function just now. If there is still a problem, perhaps you could tweak this function yourself and open a pull request.

https://github.com/lightbulb128/troy-nova/blob/c89a8980c2b266d9fe82f69b03f52b62abaecf5f/src/utils/memory_pool.h#L13-L32

Previously

https://github.com/lightbulb128/troy-nova/blob/00a08467f771dd51c6250a26894f79e80021177b/src/utils/memory_pool.h#L13-L25
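The linked revisions are not reproduced here. As a rough illustration of the idea only, the sketch below shows one way a device-count helper can treat a failed query as "no device" instead of throwing. The name safe_device_count and the query parameter are assumptions made for this example. The query callable mimics the shape of cudaGetDeviceCount(int*), which writes the count through its pointer argument and returns cudaSuccess (0) on success, so the block compiles without the CUDA toolkit.

```cpp
#include <functional>

// Hypothetical wrapper, not the library's actual code.
// `query` stands in for the CUDA runtime call: it writes the device count
// through the pointer and returns 0 on success or a nonzero error code.
inline int safe_device_count(const std::function<int(int*)>& query) {
    int count = 0;
    const int status = query(&count);
    if (status != 0) {
        // Query failed (for example, missing or insufficient driver):
        // report zero devices so callers fall back to the host.
        return 0;
    }
    // Guard against a nonsensical negative count.
    return count > 0 ? count : 0;
}
```

With the CUDA runtime available, the callable could be a lambda that forwards to cudaGetDeviceCount and casts the returned cudaError_t to int. Returning 0 on any error means a machine with the toolkit installed but no driver behaves like a CPU-only machine.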

henshy commented 2 months ago

Thanks, this solves my problem!