gfokkema opened this pull request 2 years ago.
This looks great, thanks for working on this! To be merged, it would of course need docs and tests. For lack of GPUs, I don't have usable CI for PyCUDA on GitHub, but I do have it on a GitLab instance I run. Mind if I create a user account for you there?
cc @kaushikcfd
Hi, thanks for the feedback! Yes, this PR was meant primarily to pitch the idea and get some early feedback :)
And access to an already working CI setup would be great!
Made an account for you, you should have that info in your email. The site is at https://gitlab.tiker.net/inducer/pycuda.
I did some experiments and tests with this and it seems to work without any errors so far. What would be the next steps to bring this to a future release?
It's clear that this should happen, ideally soon. As it happens, there are now two (draft) versions of this, one here:
https://gitlab.tiker.net/kaushikcfd/pycuda/-/merge_requests/2/diffs
and the other one in this PR. (They got started independently.) @mitkotak, could you comment on your plans with respect to upstreaming your work?
Thanks for your interest in this PR. Right now my estimate is to merge this feature into main in about a month. Most of the wrapper building is done. The purpose of my PR is to broaden the graph creation routes, i.e., to expose the finer-grained graph building routines of the CUDAGraph API alongside the (begin|end)_capture approach. Right now I am handling regression failures, adding more tests, and working on docs. Thanks!
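To make the difference between the two graph creation routes a bit more concrete, here is a toy pure-Python model of what "finer-grained" explicit construction means: each node (standing in for a kernel launch) is added together with an explicit list of the nodes it depends on, and any valid launch order has to respect those edges. With (begin|end)_capture on a single stream, the dependencies are instead inferred from the order in which work is issued, which roughly yields a linear chain. This is only a sketch under those assumptions; `ToyGraph`, `add_kernel_node`, and `launch_order` are hypothetical names and are not PyCUDA's or CUDA's actual graph API.

```python
from collections import deque


class ToyGraph:
    """Toy kernel-dependency graph for illustration only.

    NOT PyCUDA's or CUDA's graph API; all names here are hypothetical.
    Each node stands in for one kernel launch.
    """

    def __init__(self):
        # node name -> list of names of the nodes it depends on
        self.deps = {}

    def add_kernel_node(self, name, dependencies=()):
        # Explicit construction: the caller states every edge up front.
        if name in self.deps:
            raise ValueError(f"duplicate node {name!r}")
        for dep in dependencies:
            if dep not in self.deps:
                raise KeyError(f"unknown dependency {dep!r}")
        self.deps[name] = list(dependencies)
        return name

    def launch_order(self):
        # Kahn's algorithm: a node becomes ready once all its dependencies ran.
        pending = {n: len(ds) for n, ds in self.deps.items()}
        children = {n: [] for n in self.deps}
        for n, ds in self.deps.items():
            for dep in ds:
                children[dep].append(n)
        ready = deque(n for n, count in pending.items() if count == 0)
        order = []
        while ready:
            node = ready.popleft()
            order.append(node)
            for child in children[node]:
                pending[child] -= 1
                if pending[child] == 0:
                    ready.append(child)
        return order


g = ToyGraph()
a = g.add_kernel_node("scale_a")
b = g.add_kernel_node("scale_b")      # no edge to scale_a: could run concurrently
s = g.add_kernel_node("sum", [a, b])  # waits for both inputs
g.add_kernel_node("reduce", [s])
print(g.launch_order())  # ['scale_a', 'scale_b', 'sum', 'reduce']
```

Because a node may only depend on nodes that already exist and names cannot repeat, the toy graph is acyclic by construction. The interesting part is what the edges leave out: there is no edge between scale_a and scale_b, which is exactly the kind of concurrency information an explicitly built graph can express directly.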
Hi there, any updates on the CUDA graph feature?
Thank you very much for the interest! We are still testing the PR to make sure that we don't break any existing functionality, but if you are curious to learn more, you can try it out by cloning the branch:

git clone https://gitlab.tiker.net/kaushikcfd/pycuda.git --branch cudagraph

and then, from inside the cloned directory, installing it with:

pip install -e .

You can get comfortable with the syntax through examples/cudagraph_kernel.py and examples/cudagraph_streamcapture.py, and for the docs you can look for CUDAGraphs in doc/driver.rst. Thanks again for the interest, and apologies for the delay!
Hi @mitkotak, very much looking forward to this feature! Any idea when the PR might be ready?
Hi there!
I wanted to experiment with CUDA Graphs a bit to get a feel for the performance differences between blocking, async and graph execution.
See:
However, while most of the required functionality is available (async execution, specifying a stream, etc.), PyCUDA does not have graph support yet. This PR adds some initial support for launching a kernel pipeline using a CUgraph.
I'd love your comments and feedback; most likely I am not freeing memory correctly, etc., so let me know! All in all, everything seems to work well enough to be useful already :)
A nice bonus is that the CUDA Graph API offers a function to output DOT files; see the picture below and the demo in examples/demo_graph.py. Note that the demo launches the kernel only once. Because building and instantiating a graph has its own overhead, the benefits of the Graph API should only really start to show when kernels are launched repeatedly.
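A rough way to see why a single launch shouldn't be expected to show a benefit, under a deliberately simplified cost model (an assumption, not a measurement): let $T_{\text{setup}}$ be the one-time cost of building and instantiating the graph, $t_{\text{stream}}$ the per-iteration host overhead of launching the pipeline's kernels one by one on a stream, and $t_{\text{graph}}$ the per-iteration overhead of launching the instantiated graph. Kernel execution time is assumed identical in both cases, so it cancels out. Over $n$ iterations, the graph comes out ahead when

$$
T_{\text{setup}} + n\,t_{\text{graph}} < n\,t_{\text{stream}}
\iff
n > \frac{T_{\text{setup}}}{t_{\text{stream}} - t_{\text{graph}}},
\qquad t_{\text{stream}} > t_{\text{graph}}.
$$

For $n = 1$ this is unlikely to hold, since the setup cost typically exceeds the launch overhead saved on a single iteration; the break-even point moves lower the more kernels the pipeline contains, because $t_{\text{stream}}$ grows with the number of individual launches.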