inducer / pycuda

CUDA integration for Python, plus shiny features
http://mathema.tician.de/software/pycuda

[WIP] Add support for CUDA Graphs. #343

Open gfokkema opened 2 years ago

gfokkema commented 2 years ago

Hi there!

I wanted to experiment with CUDA Graphs a bit to get a feel for the performance differences between blocking, async and graph execution.

See:

However, while most of the required functionality is already available (async execution, specifying a stream, etc.), PyCUDA does not support CUDA Graphs yet. This PR adds initial support for launching a kernel pipeline through a CUgraph.

I'd love your comments and feedback. Most likely I am not freeing memory correctly, among other things, so let me know! All in all, everything already seems to work well enough to be useful :)

A nice bonus is that the CUDA Graph API offers a function to output DOT files; see the picture below and the demo in examples/demo_graph.py. Note that the demo launches the kernel only once. Because building and instantiating a graph has its own overhead, the benefits of the Graph API should only really start to show when kernels are launched repeatedly.

[Figure: CUDA Graph]
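The point that graphs pay off only with repeated launches can be made concrete with a simple cost model. Assume a one-time cost C for building and instantiating the graph, and per-launch costs t_stream (ordinary stream launches) and t_graph (graph launch). These are hypothetical quantities you would measure yourself, not numbers from this PR. Graph execution wins once C + n * t_graph < n * t_stream, i.e. once n > C / (t_stream - t_graph). A minimal sketch of that break-even calculation:

```python
import math


def break_even_launches(instantiate_cost: float,
                        stream_launch_cost: float,
                        graph_launch_cost: float) -> int:
    """Smallest launch count n with
    instantiate_cost + n * graph_launch_cost < n * stream_launch_cost.

    All costs are hypothetical inputs in one consistent time unit.
    """
    saving = stream_launch_cost - graph_launch_cost
    if saving <= 0:
        # A graph that is not cheaper per launch never amortizes its setup cost.
        raise ValueError("graph launches must be cheaper than stream launches")
    # Strict inequality n > instantiate_cost / saving: round down, then add one.
    return math.floor(instantiate_cost / saving) + 1


# Made-up example numbers (microseconds), purely for illustration.
print(break_even_launches(100.0, 10.0, 5.0))  # → 21
```

With these made-up numbers, 20 launches only tie (200 vs. 200), and the graph wins from the 21st launch on. A single launch, as in the demo, never amortizes the setup cost.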

inducer commented 2 years ago

This looks great, thanks for working on this! To be merged, it would of course need docs and tests. For lack of GPUs, I don't have usable CI for PyCUDA on GitHub, but I do have it on a GitLab instance I run. Mind if I create a user account for you there?

cc @kaushikcfd

gfokkema commented 2 years ago

Hi, thanks for the feedback! Yes, this PR was meant primarily to pitch the idea and get some early feedback :)

And access to already usable CI would be great!

inducer commented 2 years ago

Made an account for you, you should have that info in your email. The site is at https://gitlab.tiker.net/inducer/pycuda.

mgaedtke commented 2 years ago

I did some experiments and tests with this and it seems to work without any errors so far. What would be the next steps to bring this to a future release?

inducer commented 2 years ago

It's clear that this should happen, ideally soon. As it happens, there are now two (draft) versions of this, one here:

https://gitlab.tiker.net/kaushikcfd/pycuda/-/merge_requests/2/diffs

and the other one in this PR. (They got started independently.) @mitkotak, could you comment on your plans with respect to upstreaming your work?

mitkotak commented 2 years ago

Thanks for your interest in this PR. Right now I estimate that this feature will be merged into main in about a month. Most of the wrapper building is done. The purpose of my PR is to broaden the routes for graph creation, i.e., to expose the finer-grained graph-building routines of the CUDA Graph API alongside the (begin|end)_capture approach. Right now I am handling regression failures, adding more tests, and working on docs. Thanks!
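The difference between stream capture and explicit graph building can be illustrated with a toy, pure-Python model. `ToyGraph`, `add_node`, `launch_order`, `to_dot`, and `capture` are hypothetical names invented for this sketch; they are not PyCUDA's graph API. The sketch assumes that capturing on a single stream turns the issued operations into a linear chain, whereas explicit construction lets the caller state only the dependencies that actually exist, leaving independent kernels free to run concurrently.

```python
from collections import defaultdict, deque


class ToyGraph:
    """Toy DAG of named 'kernel' nodes (hypothetical model, not PyCUDA's API)."""

    def __init__(self):
        self.nodes = []               # node names in insertion order
        self.deps = defaultdict(list)  # node -> list of prerequisite nodes

    def add_node(self, name, dependencies=()):
        # Explicit construction: the caller names each node's prerequisites.
        # Dependencies must already exist, so the graph stays acyclic.
        if name in self.nodes:
            raise ValueError(f"duplicate node {name!r}")
        for d in dependencies:
            if d not in self.nodes:
                raise KeyError(f"unknown dependency {d!r}")
        self.nodes.append(name)
        self.deps[name] = list(dependencies)
        return name

    def launch_order(self):
        # Kahn's algorithm: an order that respects every dependency edge.
        indegree = {n: len(self.deps[n]) for n in self.nodes}
        children = defaultdict(list)
        for n in self.nodes:
            for d in self.deps[n]:
                children[d].append(n)
        ready = deque(n for n in self.nodes if indegree[n] == 0)
        order = []
        while ready:
            n = ready.popleft()
            order.append(n)
            for c in children[n]:
                indegree[c] -= 1
                if indegree[c] == 0:
                    ready.append(c)
        return order

    def to_dot(self):
        # Graphviz DOT text: one statement per node, one per dependency edge.
        lines = ["digraph G {"]
        for n in self.nodes:
            lines.append(f'  "{n}";')
            for d in self.deps[n]:
                lines.append(f'  "{d}" -> "{n}";')
        lines.append("}")
        return "\n".join(lines)


def capture(op_names):
    """Model of single-stream capture: ops issued in order become a chain."""
    g = ToyGraph()
    prev = None
    for op in op_names:
        prev = g.add_node(op, [] if prev is None else [prev])
    return g


# Explicit building: fill_a and fill_b are independent; add needs both.
g = ToyGraph()
a = g.add_node("fill_a")
b = g.add_node("fill_b")
g.add_node("add", dependencies=[a, b])
print(g.launch_order())  # → ['fill_a', 'fill_b', 'add']

# Capturing the same ops on one stream serializes fill_a before fill_b.
print(capture(["fill_a", "fill_b", "add"]).deps["add"])  # → ['fill_b']
```

Both graphs yield the same launch order here, but only the explicitly built one records that fill_a and fill_b could overlap. The `to_dot()` output is in the spirit of the DOT dump mentioned earlier in the thread.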

YanBC commented 2 years ago

Hi there, any updates on the cuda graph feature?

mitkotak commented 2 years ago

Thank you very much for the interest! We are still testing the PR to make sure that we don't break any existing functionality, but if you are curious to learn more, you can try it out by running git clone https://gitlab.tiker.net/kaushikcfd/pycuda.git --branch cudagraph and then installing it with pip install -e . from inside the cloned directory. You can get comfortable with the syntax through examples/cudagraph_kernel.py and examples/cudagraph_streamcapture.py; for the docs, look for CUDAGraphs in doc/driver.rst. Thanks again for the interest, and apologies for the delay!

mgaedtke commented 1 year ago

Hi @mitkotak, very much looking forward to this feature! Any idea when the PR could be ready?