microsoft / BitBLAS

BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
MIT License

[CUDA GRAPH] Support Cuda Stream in the Wrap Function #21

Closed: LeiWang1999 closed this 5 months ago

LeiWang1999 commented 5 months ago

Some frameworks may require CUDA Graphs to be enabled for better performance. To support this, this pull request adds a CUDA stream parameter to the CUDA wrap function, so that kernel launches can be issued on a caller-provided stream and captured into a CUDA graph.
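As a rough illustration of the interface change described above (not the code from this PR), the sketch below threads a stream handle through a wrapper and forwards it to the launch call. Graph capture records work submitted to the capturing stream, so a wrapper that always launches on the default stream cannot take part in a capture. Everything here is an assumption made for illustration: `make_wrapper`, `fake_launch`, and `launch_log` are hypothetical names, the integer stream handles are placeholders, and a pure-Python stand-in replaces the real compiled kernel entry point so the example runs without a GPU.

```python
from typing import Callable, List, Tuple

# CUDA's legacy default stream is represented by the null handle 0.
DEFAULT_STREAM = 0

# Hypothetical signature of a compiled kernel entry point:
# it receives the device buffer addresses and the stream to launch on.
LaunchFn = Callable[[Tuple[int, ...], int], None]


def make_wrapper(launch: LaunchFn) -> Callable[..., None]:
    """Build a wrap function that forwards an optional stream to `launch`."""

    def call(*buffers: int, stream: int = DEFAULT_STREAM) -> None:
        # The only change versus a stream-less wrapper: the stream handle is
        # passed through instead of implicitly using the default stream.
        launch(buffers, stream)

    return call


# Stand-in launcher (assumption): records what a real launch would receive.
launch_log: List[Tuple[Tuple[int, ...], int]] = []


def fake_launch(buffers: Tuple[int, ...], stream: int) -> None:
    launch_log.append((buffers, stream))


call = make_wrapper(fake_launch)
call(0x1000, 0x2000)            # existing callers: default stream
call(0x1000, 0x2000, stream=7)  # graph-aware callers: the capturing stream
```

Keeping the stream as an optional keyword with a default of 0 means existing call sites keep working unchanged, while a framework that is capturing a graph can pass the handle of its capture stream.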