Open ezhulenev opened 1 year ago
What is the status here? Can you confirm that it is done for cuBLAS and not yet done for cuDNN?
It's a work in progress. cuBLAS was done for the current runtime integration, which is based on whole XLA program capture. This issue is about capturing library calls into child graphs.
We will use the cuGraphAddChildGraphNode API to insert CUDA graphs captured from library calls into the "main" command buffers.
We need APIs in StreamExecutor/CommandBuffer that are not too CUDA graph specific, so that they are potentially portable to other command-buffer-like APIs.
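To make the portability point concrete, here is a rough sketch of what a backend-neutral shape could look like. Everything in it is hypothetical and for illustration only: `CommandBuffer`, `AddNested`, `RecordingCommandBuffer` and `CaptureLibraryCall` are made-up names, not the actual StreamExecutor API. The toy backend just records command names in memory; a CUDA backend would presumably implement the nested-insert call on top of cuGraphAddChildGraphNode, while another backend could map it to its own nesting mechanism.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical backend-neutral interface (NOT the real StreamExecutor API).
// Nothing in the signatures mentions CUDA graphs.
class CommandBuffer {
 public:
  virtual ~CommandBuffer() = default;

  // Records a single command, identified here only by a name.
  virtual void AddCommand(const std::string& name) = 0;

  // Embeds a previously recorded command buffer as a nested unit.
  // Assumption: a CUDA backend would implement this with a child graph node.
  virtual void AddNested(const std::string& label,
                         const CommandBuffer& nested) = 0;

  // Returns the recorded command names in submission order.
  virtual std::vector<std::string> Commands() const = 0;
};

// Toy in-memory backend used only to show the call pattern.
class RecordingCommandBuffer : public CommandBuffer {
 public:
  void AddCommand(const std::string& name) override {
    commands_.push_back(name);
  }

  void AddNested(const std::string& label,
                 const CommandBuffer& nested) override {
    // Keep the nesting visible by prefixing each nested command with a label.
    for (const std::string& cmd : nested.Commands()) {
      commands_.push_back(label + "/" + cmd);
    }
  }

  std::vector<std::string> Commands() const override { return commands_; }

 private:
  std::vector<std::string> commands_;
};

// Hypothetical stand-in for "capture a library call": it records the
// kernels the library would launch into a separate command buffer.
inline RecordingCommandBuffer CaptureLibraryCall(
    const std::vector<std::string>& kernels) {
  RecordingCommandBuffer captured;
  for (const std::string& kernel : kernels) captured.AddCommand(kernel);
  return captured;
}
```

With this shape, the caller builds the "main" buffer, captures the library call separately, and then passes the captured buffer to `AddNested` between its own commands. The caller never touches a graph handle, which is the property that would keep the API usable for other command-buffer-like backends.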