simonzhang00 / ripser-plusplus

Ripser++: GPU-accelerated computation of Vietoris–Rips persistence barcodes

How to utilize multiple GPUs? #10

Open VietTralala opened 2 years ago

VietTralala commented 2 years ago

Hi, thank you for the awesome library! Everything works fine and was easy to install. However, I have a multi-GPU system on which I want to run Ripser++ on a large distance matrix, and I only see utilization of the first GPU in the system, e.g. with the following code:

import ripserplusplus as rpp_py
import numpy as np

X = np.random.rand(10000, 1000)                    # 10k samples, 1k features
X /= np.linalg.norm(X, axis=1, keepdims=True)      # L2-normalize rows
D = 1 - X @ X.T                                    # cosine distance matrix
np.fill_diagonal(D, 0)
diagram = rpp_py.run("--format distance", D)

How can I use all my GPUs?

simonzhang00 commented 2 years ago

Hi,

Thank you for your interest in using GPUs for PH computation.

The design of our program was meant for one GPU per dataset. So if you want to use multiple GPUs, I would suggest batching (a common approach when computing over many point clouds/distance matrices in machine learning).

I have not tried this out myself, but here is some code to get you started with multi-GPU batching.

You will need to modify the .cu file by calling cudaError_t cudaSetDevice(int device_id) at the beginning of the computation, and write code to pass a device_id integer into the .cu program.

See: blog post on cudaSetDevice()

Do something like this:

On the CUDA side:


// take in a parameter device_id (e.g. from a new command-line argument)
// and select the GPU before any other CUDA calls are made:
cudaError_t err = cudaSetDevice(device_id);
if (err != cudaSuccess) {
    // handle the error, e.g. print cudaGetErrorString(err) and abort
}

... // carry on with the usual code

On the Python side:

import threading

num_devices = 2                      # number of GPUs in the system
# the barrier counts the worker threads plus the main thread
barrier = threading.Barrier(num_devices + 1)

class RipserThread(threading.Thread):
    def __init__(self, thread_ID, data, params):
        threading.Thread.__init__(self)
        self.thread_ID = thread_ID   # use as the GPU device_id
        self.data = data
        self.params = params
    def run(self):
        # run ripser++ with (device_id=self.thread_ID, X=self.data, arguments=self.params)
        barrier.wait()

for batch in dataloader:             # each batch holds num_devices (data, params) pairs
    threads = [RipserThread(i, data, params) for i, (data, params) in enumerate(batch)]
    for t in threads:
        t.start()
    barrier.wait()                   # wait until every worker in the batch has finished
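
An alternative that avoids modifying the .cu file at all is to pin each worker process to a single GPU through the CUDA_VISIBLE_DEVICES environment variable. This is only an untested sketch: it assumes the ripserplusplus extension does not initialize CUDA until run() is called in the worker process, and compute_ph / distance_matrices are placeholder names, not part of the library.

import os
from multiprocessing import get_context

import numpy as np

def compute_ph(args):
    device_id, D = args
    # restrict this worker process to a single GPU before CUDA is initialized
    os.environ["CUDA_VISIBLE_DEVICES"] = str(device_id)
    import ripserplusplus as rpp_py      # import after setting the variable
    return rpp_py.run("--format distance", D)

if __name__ == "__main__":
    num_devices = 2                                   # GPUs to use
    distance_matrices = [np.zeros((100, 100))] * 4    # replace with real distance matrices
    jobs = [(i % num_devices, D) for i, D in enumerate(distance_matrices)]
    # maxtasksperchild=1 gives every job a fresh process, so the env var is applied each time
    with get_context("spawn").Pool(processes=num_devices, maxtasksperchild=1) as pool:
        diagrams = pool.map(compute_ph, jobs)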

If you find time to successfully implement the CUDA-side device switch, please send in a pull request. We always welcome contributors!

VietTralala commented 2 years ago

Thank you for the fast response. If I understand you correctly, you suggest running multiple threads of ripser++, each operating on its own distance matrix? That approach works if one has batches of distance matrices, and it can easily be parallelized with an additional Python library such as ray or dask (rough sketch below). However, I was wondering whether one could use the combined GPU power and RAM of multiple GPUs to compute the PH of a single but large distance matrix.
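
For reference, here is roughly what the batched approach could look like with ray; just a sketch that I have not tested with ripser++. It relies on ray exposing exactly one GPU per task through CUDA_VISIBLE_DEVICES, and compute_ph / distance_matrices are placeholder names.

import ray
import numpy as np

ray.init()                           # discovers the GPUs available on the machine

@ray.remote(num_gpus=1)              # reserve one whole GPU per task
def compute_ph(D):
    # ray sets CUDA_VISIBLE_DEVICES for this worker, so ripser++ only sees its GPU
    import ripserplusplus as rpp_py
    return rpp_py.run("--format distance", D)

distance_matrices = [np.zeros((100, 100))] * 4     # replace with real distance matrices
futures = [compute_ph.remote(D) for D in distance_matrices]
diagrams = ray.get(futures)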

zhangxinyue-123 commented 9 months ago

Hi, I have encountered the same problem! Have you solved the problem of using multiple GPUs to compute a single but large instance? I would be extremely grateful and look forward to your reply. Hope to have further communication with you, thanks! @VietTralala @simonzhang00