Luidy opened this issue 5 years ago
I want to connect several GPUs to increase the speed of nuFHE. Is this possible?
Technically, yes, although currently you will have to handle data transfer yourself. The GPU the work happens on is determined by the `Thread` object you pass to `nufhe` functions. You will need to create several `reikna` `Thread` objects for the target GPUs. It may look like this:
```python
from reikna.cluda import cuda_api

api = cuda_api()
# For CUDA there is always only one platform; for OpenCL there can be several.
platform = api.get_platforms()[0]
devices = platform.get_devices()
thr1 = api.Thread(devices[0])
thr2 = api.Thread(devices[1])
```
(see the `reikna` docs for details). Now, there are several problems.
First, `Thread` objects (and the underlying GPU contexts) have separate memory pools. (For CUDA, it may be possible to use unified memory; for OpenCL, you can manually create a context and then create `Thread` objects for a single context but different `CommandQueue`s. This will require interaction with PyCUDA or PyOpenCL, respectively. I haven't investigated either variant in detail, so I don't know how well it will work.) This means that a ciphertext created with `thr1` can only be used with other `thr1`-based ciphertexts in a gate. If you want to pass data between GPUs, you will have to do it through the CPU.
Second, if you just create two `Thread` objects in a single OS process, it won't give you much of a speedup: gates are not completely asynchronous and will block for the majority of the execution time. So you will need to run multi-process code and exchange the data between processes.
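A minimal sketch of that multi-process model, with the actual GPU work replaced by a pure-Python stand-in. Everything here is hypothetical (`run_on_gpu`, `parallel_not`, and the NOT-gate placeholder are not part of nufhe); a real worker would create its own `Thread` and nufhe objects for its assigned device.

```python
import multiprocessing as mp

def run_on_gpu(args):
    device_index, bits = args
    # Placeholder for the real per-GPU work: create a Thread for
    # devices[device_index], build the nufhe machinery on it, and
    # evaluate gates on this worker's share of the ciphertext bits.
    return [not b for b in bits]  # stand-in for a NOT gate

def parallel_not(bits, num_gpus=2):
    # Deal the batch of bits out round-robin, one chunk per GPU process.
    chunks = [bits[i::num_gpus] for i in range(num_gpus)]
    with mp.Pool(processes=num_gpus) as pool:
        results = pool.map(run_on_gpu, list(enumerate(chunks)))
    # Re-interleave the per-GPU results back into the original order.
    out = [None] * len(bits)
    for g, res in enumerate(results):
        out[g::num_gpus] = res
    return out

if __name__ == "__main__":
    print(parallel_not([True, False, True, False]))
```

The same worker function would run unchanged under `threading` or MPI; only the pool that dispatches it changes.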
All in all, it is not straightforward at the moment. I will leave this issue open and try to figure out what kind of interface should be exposed to make multi-GPU convenient.
I want to compute 1 ciphertext and 1 encoding text before generating the ciphertext. Is this possible?
I am not sure I understand you. Could you explain in more detail? And, perhaps, open a separate issue - this one will be reserved for multi-GPU.
Also really interested in the evolution of this ticket
Has anyone ever tested this on a VMware vGPU rig?
I am not sure at the moment what level of abstraction would be best. The minimal version would be something like this (this example is already working on my machine; I just need to polish some things in the implementation). Essentially, this means one `nufhe.Context` per thread/process, and the user can choose whatever parallel execution model they want, be it threading, multiprocessing, or MPI. Will that be fine for your purposes (as a start, at least)?
It may be possible to do single-thread multi-GPU, but there are several problems to solve. CUDA and OpenCL use different models for that, and I need to check whether PyCUDA and PyOpenCL actually expose the corresponding APIs (and `reikna` will require an update as well, since it uses the simple multi-GPU model above). Since most computations are batched over ciphertext bits, it probably won't be too hard to split them between GPUs automatically.
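Such an automatic split could start from simple index arithmetic over the batch. A hedged sketch (`split_work` is hypothetical, not a nufhe function):

```python
def split_work(num_bits, num_gpus):
    """Divide a batch of ciphertext bits into near-equal contiguous
    slices, one per GPU; the first (num_bits % num_gpus) slices get
    one extra bit."""
    base, extra = divmod(num_bits, num_gpus)
    slices, start = [], 0
    for g in range(num_gpus):
        size = base + (1 if g < extra else 0)
        slices.append(slice(start, start + size))
        start += size
    return slices
```

Contiguous slices keep each GPU's memory accesses local, which is why they are sketched here instead of a round-robin split.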
Edit: there may be problems if single-kernel bootstrap is not available, which means we'll have 500 kernel calls instead of several, so some internal thread/process pool will be necessary to parallelize that.
I've added a multi-GPU example (`examples/multi_gpu.py`, commit 9539b6563d2e4897c869a68f1bdacc8c163b9059) and some supporting internals.