rapidsai / cugraph

cuGraph - RAPIDS Graph Analytics Library
https://docs.rapids.ai/api/cugraph/stable/
Apache License 2.0

[QST]: Multi GPU graph algorithm via libcugraph #3005

Closed Manoj-red-hat closed 1 year ago

Manoj-red-hat commented 1 year ago

What is your question?

Is there a way to partition a graph across GPUs and use multiple GPUs to compute the result via libcugraph?

Note: the GPUs can be on a single node or spread across multiple nodes.


BradReesWork commented 1 year ago

@Manoj-red-hat I'm not sure I fully understand your question. We use a 2D partitioning scheme when loading data across multiple GPUs. Are you asking whether there are other partitioning approaches, or do you want to retrieve the partition assignments?
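
For reference, the multi-GPU loading path in Python goes through Dask; a rough sketch (the file path and column names here are placeholders, and exact APIs can vary between releases) looks something like this:

```python
# Rough sketch of a multi-GPU PageRank run with dask-cuda + cuGraph.
# File path and column names are placeholders; exact APIs vary by release.
from dask.distributed import Client
from dask_cuda import LocalCUDACluster
import dask_cudf
import cugraph
import cugraph.dask as dask_cugraph
import cugraph.dask.comms.comms as Comms

# One worker process per GPU on this node; data is spread across them.
cluster = LocalCUDACluster()
client = Client(cluster)
Comms.initialize(p2p=True)

# Distributed read of the edge list; each worker holds a partition.
edges = dask_cudf.read_csv("edges.csv",
                           names=["src", "dst"],
                           dtype=["int32", "int32"])

# cuGraph applies its internal 2D partitioning when the graph is built.
G = cugraph.Graph(directed=True)
G.from_dask_cudf_edgelist(edges, source="src", destination="dst")

# Distributed PageRank; the result comes back as a dask_cudf DataFrame.
pr = dask_cugraph.pagerank(G)
print(pr.compute().head())

Comms.destroy()
client.close()
cluster.close()
```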

Manoj-red-hat commented 1 year ago

@BradReesWork @ChuckHastings

Let me rephrase my question via a distributed PageRank example.

[Execution Environment: Spark]

Single node, (CPU + GPU) x 1 node. Execution flow:
Graph --> convert to COO format (CPU) --> cuGraph (GPU) --> return PageRank (everything via UDF)

On a single node the path is clear; we can integrate it and everything works great.
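
For reference, that single-node path corresponds roughly to this sketch (the edge list and column names are placeholders):

```python
# Rough sketch of the single-GPU path: COO edge list in, PageRank out.
# The edge list and column names are placeholders.
import cudf
import cugraph

# COO edge list (e.g. produced on the CPU side and handed to cuDF).
edges = cudf.DataFrame({"src": [0, 1, 2, 2], "dst": [1, 2, 0, 3]})

G = cugraph.Graph(directed=True)
G.from_cudf_edgelist(edges, source="src", destination="dst")

# Returns a cudf DataFrame with 'vertex' and 'pagerank' columns.
pr = cugraph.pagerank(G)
print(pr.sort_values("pagerank", ascending=False).head())
```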

The issue is with multi-node, (CPU + GPU) x 4 nodes. Execution flow:
Graph --> convert to COO format (CPU) --> partition the graph into 4 parts (GPU) --> assign each partition to a worker node (CPU) --> call cuGraph on each node (GPU) --> return PageRank (CPU) --> combine all PageRank results (CPU)

[Approach 1]

  1. Partition the graph via GPU (in a distributed way)
  2. Compute each partitioned graph on the worker node's GPU (in the single-node way)

[Approach 2]

  1. Submit the graph to a GPU cluster
  2. The GPU cluster takes care of partitioning and the PageRank calculation

Issue: For Python we have Dask, but if we are going to integrate this into Spark via the native GraphX engine we can't use Dask; we would need to create our own resource manager for the GPUs, which is somewhat tricky.

Both approach 1 and approach 2 need some sort of GPU resource manager that can coordinate the GPUs residing on different nodes. But if we are able to do that, it would inherently make cuGraph a part of Spark/TigerGraph or any other graph database.

[Question]

  1. Is the above understanding and analysis correct?
  2. What else can you suggest for running distributed graph algorithms on a cluster of nodes, each with its own GPU?

ChuckHastings commented 1 year ago

I have not explored this problem space particularly well.

Our current efforts focus on Dask for our Python environment and MPI for our C++ environment. These frameworks provide a mechanism for launching resources on multiple GPUs and creating the necessary communication paths between the processes. But as you describe, neither of those sounds like a realistic option for you.

There were some experiments done a few years ago with Legion (https://legion.stanford.edu/overview/) that were very promising, but we have not had much opportunity to work with that environment since the experiments were run. On the surface it appears to be a reasonable analog for dask for C++ users. It appears to be actively supported (including by NVIDIA research). I am not aware of other alternatives, although there are probably some out there.

As you mention, you could also manage that yourself. What you require appears to be fairly minimal, so that's certainly an option. I generally advise against it, though: while what you want currently appears minimal, what you need later almost always grows into something where you'd be better off with a maintained product to help you.

Manoj-red-hat commented 1 year ago

Thanks a lot @ChuckHastings for your explanation.

Now a third approach strikes me, Approach 3:

[diagram: CPU cluster connected to a GPU Dask cluster via Arrow IPC]

Via Arrow IPC, the CPU cluster can talk to the GPU cluster.

Pros:

  1. Easy to set up
  2. No need for another resource manager
  3. Data & computation can be offloaded to the GPU Dask cluster
  4. GPU nodes can be independent of CPU nodes
  5. The plugin can work with any distributed graph database

Cons:

  1. Data translation cost
  2. Ser/De cost (minimized via Arrow)
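
To make the hand-off concrete, a rough sketch of just the Arrow IPC step might look like this (the transport between the clusters is omitted, and the column names are placeholders):

```python
# Rough sketch of the CPU -> GPU hand-off using Arrow IPC only.
# The transport between the clusters (sockets, RPC, Arrow Flight, ...) is
# omitted; column names are placeholders.
import pyarrow as pa
import cudf

# --- CPU side: serialize the edge list into Arrow IPC bytes ---
edges = pa.table({"src": pa.array([0, 1, 2], pa.int32()),
                  "dst": pa.array([1, 2, 0], pa.int32())})
sink = pa.BufferOutputStream()
with pa.ipc.new_stream(sink, edges.schema) as writer:
    writer.write_table(edges)
payload = sink.getvalue().to_pybytes()   # bytes sent to the GPU cluster

# --- GPU side: deserialize and move into cuDF without row-wise conversion ---
reader = pa.ipc.open_stream(pa.BufferReader(payload))
received = reader.read_all()
gdf = cudf.DataFrame.from_arrow(received)  # ready for from_cudf_edgelist / Dask
```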

@ChuckHastings @seunghwak What do you think about the feasibility of this architecture?

ChuckHastings commented 1 year ago

Have you looked at cugraph_service? @rlratzel can offer some thoughts here. We have a service we are building that would let processes connect to a graph server that can create your graph on the GPU, execute analytics and return results.

Barely fleshed out at this point, but issue #3107 might be relevant to you (there will be others, we're just starting to define some work that we've been discussing in this vein).

Manoj-red-hat commented 1 year ago

@ChuckHastings I have already studied cugraph-service; the problem is how to call it from multi-threaded C++ or any other language.

In my opinion, cugraph-service should be language independent, i.e. we should extend cugraph-service via REST or RPC.

If someone needs to cache edges, they just call a REST endpoint to cache the edges and send the payload, then call a REST API to create the graph. If they need to run PageRank on that cached graph via REST/RPC, that can also be achieved.
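
As a purely hypothetical illustration of that flow (none of these endpoints, the server address, or the payload file exist in cugraph-service; they are made up to show the shape of the idea):

```python
# Purely hypothetical sketch of the proposed REST flow -- none of these
# endpoints or the server address exist in cugraph-service today; they only
# illustrate the idea of a language-independent interface.
import requests

BASE = "http://graph-server:9090"                 # hypothetical server address

# 1. Cache edges on the server: send the edge payload (e.g. Arrow IPC bytes).
edge_payload = open("edges.arrow", "rb").read()   # placeholder payload
requests.post(f"{BASE}/edges/cache", data=edge_payload)

# 2. Create a graph from the cached edges.
resp = requests.post(f"{BASE}/graphs", json={"from_cached_edges": True})
graph_id = resp.json()["graph_id"]

# 3. Run PageRank on that cached graph and fetch the result.
result = requests.get(f"{BASE}/graphs/{graph_id}/pagerank").json()
```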

It's just a thought; we can discuss more if you find the idea worthwhile.

ChuckHastings commented 1 year ago

Agree with your observations. cugraph service is just beginning development and exploration. Ultimately the objective is to be callable from any language. The development effort is currently hyper-focused on one use case, but it will be expanded beyond that to support general server capabilities. Discussion like this will help us keep our focus on that big picture.