henshy closed this 2 months ago
This library also supports running directly on the CPU, so perhaps you could just run the matrix multiplication benchmark (bench_matmul) on your host CPU and see how efficient it is? I don't follow what you mean by "optimize for CPU compatibility".
Indeed, the code can run on a CPU, but it requires a GPU as a prerequisite, or else the command 'CoeffModulus.create(n, log_qi)' would trigger a 'no GPU' exception. What I'm envisioning is a situation where the absence of a GPU doesn't prevent the code from compiling and running on a CPU, much like how TensorFlow operates adaptively.
Oh, I see the problem. I tested all the code on a machine that has a GPU and CUDA support. I should try the code and the unit tests on a CPU-only machine. I will update this when I finish.
Alright, do we have an estimated release date for the CPU version?
Yes. Released just now.
Tests that use the device will be skipped when run. Examples will use the host if no device is detected.
Use troy::utils::device_count() > 0 to determine whether any device is available.
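For Python users, the same check can be sketched as below. This is only an illustration: pick_backend is a hypothetical helper, and the device-count query is passed in as a callable because the exact name of the Python binding is not stated in this thread.

```python
from typing import Callable


def pick_backend(device_count: Callable[[], int]) -> str:
    """Return "device" when at least one GPU is reported, otherwise "host".

    `device_count` is an assumption: any zero-argument callable that
    returns the number of visible devices (for example, a wrapper around
    the library's own device-count function).
    """
    return "device" if device_count() > 0 else "host"


if __name__ == "__main__":
    # Stand-in queries for illustration only.
    print(pick_backend(lambda: 0))  # CPU-only machine
    print(pick_backend(lambda: 1))  # machine with one GPU
```

Passing the query in keeps the decision in one place, so examples and tests can share it.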
Feel free to comment further if you encounter problems, or close this once you are done.
I am using the Python API and have installed the CUDA toolkit for compilation (my base image is FROM nvidia/cuda:12.1.0-devel-ubuntu18.04). However, I have not installed the CUDA driver. When I run my code, I get the following exception: "RuntimeError: [device_count] cudaGetDeviceCount failed: CUDA driver version is insufficient for CUDA runtime version". Is there a way to skip GPU-related operations entirely, without initializing the GPU?
I use device_count() to check whether any device is available. I updated this function just now. If there is still a problem, perhaps you could tweak this function yourself and open a pull request.
Thanks, this solved my problem!
How can we optimize for CPU compatibility? There are instances where matrix multiplication is fairly quick even without a GPU, so how can we tweak the system to work efficiently with just a CPU?