Closed Manoj-red-hat closed 1 year ago
@Manoj-red-hat I'm not sure I fully understand your question. We use a 2D partitioning scheme when loading data across multiple GPUs. But are you asking if there are other partitioning approaches, or do you want to get the partition assignments?
@BradReesWork @ChuckHastings
Let me rephrase my question via a distributed PageRank example.
[Execution Environment: Spark]

Single node: (CPU + GPU) x 1 node
Execution flow: Graph --> CPU (convert to COO format) --> cuGraph (GPU) --> return PageRank (everything via UDF)
On a single node the path is clear; we can integrate and everything works great.

The issue is with multi-node: (CPU + GPU) x 4 nodes
Execution flow: Graph --> CPU (convert to COO format) --> partition graph in 4 parts (GPU) --> assign each partitioned graph to a worker node (CPU) --> on each node call cuGraph (GPU) --> return PageRank (CPU) --> combine all PageRank results (CPU)
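The "partition graph in 4 parts" step above can be sketched as a simple hash partition of the COO edge list, so each of the 4 workers gets a disjoint slice of edges. This is purely illustrative (names and the `s % num_parts` rule are assumptions, not the cuGraph API, which uses a 2D partitioning scheme):

```python
# Hypothetical sketch: hash-partition a COO edge list by source vertex.
# Each edge (s, d) goes to partition s % num_parts.

def partition_coo(src, dst, num_parts=4):
    """Return num_parts (src_list, dst_list) pairs covering all edges."""
    parts = [([], []) for _ in range(num_parts)]
    for s, d in zip(src, dst):
        p = s % num_parts
        parts[p][0].append(s)
        parts[p][1].append(d)
    return parts

src = [0, 1, 2, 3, 4, 5, 6, 7]
dst = [1, 2, 3, 0, 5, 6, 7, 4]
parts = partition_coo(src, dst)
# every edge lands in exactly one partition
assert sum(len(p[0]) for p in parts) == len(src)
```

Each partition would then be shipped to one worker node for the per-node cuGraph call.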
[Approach 1]
[Approach 2]
Approaches 1 and 2 both need some sort of GPU resource manager that can coordinate among GPUs residing on different nodes. But if we are able to do that, it will inherently make cuGraph a part of Spark/TigerGraph or any other graph database.
[Question]
I have not explored this problem space particularly well.
For our current active efforts we focus on dask for our Python environment and MPI for our C++ environment. These frameworks provide a mechanism for launching resources on multiple GPUs and creating the necessary communication paths between the processes. But as you describe, neither of those sounds like a realistic option for you.
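The coordination those frameworks supply boils down to a scatter/reduce loop: each worker computes partial results from its own edge partition, and a coordinator combines them each iteration. A minimal, framework-free sketch of that pattern for PageRank (purely illustrative; cuGraph's real multi-GPU path does this over dask/MPI with NCCL/UCX, not plain Python lists):

```python
# Each "worker" owns one edge partition and computes partial PageRank
# contributions; a coordinator sums the partials and applies damping.

def pagerank_partitioned(edge_parts, n, iters=20, d=0.85):
    out_deg = [0] * n
    for part in edge_parts:
        for s, _ in part:
            out_deg[s] += 1
    ranks = [1.0 / n] * n
    for _ in range(iters):
        # "scatter": every worker computes contributions from its own edges
        partials = []
        for part in edge_parts:                  # one loop body per worker
            contrib = [0.0] * n
            for s, t in part:
                contrib[t] += ranks[s] / out_deg[s]
            partials.append(contrib)
        # "gather": the coordinator reduces partials into the new ranks
        ranks = [(1 - d) / n + d * sum(p[v] for p in partials)
                 for v in range(n)]
    return ranks

# 4-cycle 0->1->2->3->0, split across two "workers"
parts = [[(0, 1), (1, 2)], [(2, 3), (3, 0)]]
ranks = pagerank_partitioned(parts, 4)
```

On the symmetric 4-cycle every vertex converges to rank 0.25, which makes the combine step easy to check.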
There were some experiments done a few years ago with Legion (https://legion.stanford.edu/overview/) that were very promising, but we have not had much opportunity to work with that environment since the experiments were run. On the surface it appears to be a reasonable analog of dask for C++ users. It appears to be actively supported (including by NVIDIA research). I am not aware of other alternatives, although there are probably some out there.
As you mention, you could also manage that yourself. What you require appears to be fairly minimal, so it's certainly an option. I generally advise against that, though: while what you want currently appears minimal, what you need later almost always grows into something where you'd be better off with a maintained product to help you.
Thanks a lot @ChuckHastings, for your explanation.
Now Approach 3 strikes me: via Arrow IPC, the CPU cluster can talk to the GPU cluster.
Pros:-
Cons:-
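A minimal sketch of the handoff this approach implies: the CPU cluster serializes an edge list and ships it to the GPU cluster over a socket. Arrow IPC would replace the ad-hoc length-prefixed framing used below; the function names and wire format are assumptions for illustration only.

```python
# CPU side packs edges into a length-prefixed binary frame; GPU side
# reads the frame back. Arrow IPC would standardize this framing.
import socket
import struct

def send_edges(sock, edges):
    payload = b"".join(struct.pack("!II", s, d) for s, d in edges)
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def recv_edges(sock):
    (size,) = struct.unpack("!I", sock.recv(4))
    buf = b""
    while len(buf) < size:
        buf += sock.recv(size - len(buf))
    return [struct.unpack("!II", buf[i:i + 8]) for i in range(0, size, 8)]

# simulate the CPU-side sender and GPU-side receiver in one process
cpu_end, gpu_end = socket.socketpair()
send_edges(cpu_end, [(0, 1), (1, 2), (2, 0)])
received = recv_edges(gpu_end)
cpu_end.close()
gpu_end.close()
```

The appeal of Arrow IPC over this hand-rolled framing is that both clusters can read the same buffer without a copy or a language-specific deserializer.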
@ChuckHastings @seunghwak What do you think about the feasibility of this architecture?
Have you looked at cugraph_service? @rlratzel can offer some thoughts here. We have a service we are building that would let processes connect to a graph server that can create your graph on the GPU, execute analytics, and return results.
Barely fleshed out at this point, but issue #3107 might be relevant to you (there will be others, we're just starting to define some work that we've been discussing in this vein).
@ChuckHastings I already studied cugraph_service; the problem is how to call it from multithreaded C++ or any other language.
In my opinion, cugraph_service should be language independent, i.e. we should extend cugraph_service via REST or RPC.
If someone needs to cache edges, they just call a REST endpoint to cache edges and send the payload, then call a REST API to create the graph.
If someone needs to call PageRank on that cached graph, that can also be achieved via REST/RPC.
It's just a thought; we can discuss more if you find this idea worthwhile.
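To make the proposal concrete, here is an in-process mock of the three-endpoint surface described above (cache edges, create graph, run PageRank). The class name, endpoint paths, and the uniform-rank placeholder are all hypothetical; the real cugraph_service API may differ entirely.

```python
# Mock of a language-independent REST/RPC surface. Comments show the
# hypothetical HTTP routes each method would map to.

class MockGraphService:
    def __init__(self):
        self._edge_cache = []
        self._graphs = {}

    def cache_edges(self, edges):          # POST /edges       (payload: edge list)
        self._edge_cache.extend(edges)
        return {"cached": len(self._edge_cache)}

    def create_graph(self, graph_id):      # POST /graphs/<id> (build from cache)
        self._graphs[graph_id] = list(self._edge_cache)
        self._edge_cache.clear()
        return {"graph": graph_id, "num_edges": len(self._graphs[graph_id])}

    def pagerank(self, graph_id):          # GET /graphs/<id>/pagerank
        edges = self._graphs[graph_id]
        nodes = {v for e in edges for v in e}
        # placeholder: uniform ranks; a real server would run cuGraph here
        return {v: 1.0 / len(nodes) for v in nodes}

svc = MockGraphService()
svc.cache_edges([(0, 1), (1, 2)])
svc.cache_edges([(2, 0)])
svc.create_graph("g1")
ranks = svc.pagerank("g1")
```

Because each call is stateless from the client's point of view, the same flow works identically from C++, Java, or any language with an HTTP/RPC client.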
Agree with your observations. cugraph_service is just beginning development and exploration. Ultimately the objective is to be callable from any language. The development effort is currently hyper-focused on one use case, but it will be expanded beyond that to support general server capabilities. Discussion like this will help us keep our focus on that big picture.
What is your question?
Is there any way by which we can partition a graph on the GPU and use multiple GPUs to compute the result via libcugraph?
Note: GPU can be on single node or multi-node