huggingface / candle

Minimalist ML framework for Rust

How to Implement New Operators Using CUDA Host Functions Along with Thrust and CUB Libraries #2258

Open chenwanqq opened 1 month ago

chenwanqq commented 1 month ago

As stated, the CUDA code in the candle-kernels crate seems to contain only kernel functions. When I want to implement a new operator (such as nonzero), it seems I can only do the higher-level work in Rust, which means I cannot use Thrust's device_vector or CUB's Flagged APIs. This poses a significant challenge for implementing my algorithms: to implement nonzero with the current approach, for example, it seems I would have to reimplement primitives like exclusive_scan and scatter myself.

I am hoping for a better way to utilize the CUDA ecosystem!

Specifically, I'm interested in how to:

  1. Incorporate host functions into the CUDA code so that libraries like Thrust and CUB can be used (see the sketch below).
  2. Effectively leverage these libraries to implement algorithms and operators that are not natively supported in the current codebase.

Any guidance or best practices for achieving this would be greatly appreciated. (Translated from Chinese with an LLM, so it might read a little bit... formal ^_^)
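For concreteness, here is a minimal sketch of what such a host function could look like. It assumes the code lives in its own .cu translation unit compiled by nvcc, separately from candle-kernels' kernel-only build; the file name, function names, and f32-only signatures are hypothetical, chosen only to illustrate driving Thrust from host code over raw device pointers:

```cuda
// nonzero_thrust.cu -- a hypothetical sketch, not part of candle-kernels.
// Host functions that drive Thrust over raw device pointers supplied by the
// caller; the names and f32-only signatures are made up for illustration.
#include <thrust/device_ptr.h>
#include <thrust/count.h>
#include <thrust/copy.h>
#include <thrust/iterator/counting_iterator.h>
#include <cstdint>

struct IsNonZero {
    __host__ __device__ bool operator()(float x) const { return x != 0.0f; }
};

// Counts the nonzero elements of a device buffer of length n.
extern "C" uint32_t count_nonzero_f32(const float *d_in, uint32_t n) {
    thrust::device_ptr<const float> in(d_in);
    return static_cast<uint32_t>(thrust::count_if(in, in + n, IsNonZero()));
}

// Writes the flat indices of the nonzero elements into d_out, which must be
// large enough to hold count_nonzero_f32(d_in, n) entries.
extern "C" void nonzero_indices_f32(const float *d_in, uint32_t n,
                                    uint32_t *d_out) {
    thrust::device_ptr<const float> in(d_in);
    thrust::device_ptr<uint32_t> out(d_out);
    thrust::counting_iterator<uint32_t> first(0), last(n);
    // copy_if with a stencil: keep index i whenever d_in[i] != 0.
    thrust::copy_if(first, last, in, out, IsNonZero());
}
```

In an actual integration, the caller would pass in buffers obtained from candle's CUDA storage, and d_out would typically be allocated after a first count_nonzero_f32 call so its size is known.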
chenwanqq commented 1 month ago

I have finished a GPU version of nonzero: candle-nonzero. It uses FFI to invoke CUDA functions. I'm still wondering what the best way is to integrate it into this project 🧐
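For reference, here is one possible shape for such an FFI boundary. This is a hypothetical sketch, not the candle-nonzero code: it exposes a C-ABI entry point built on cub::DeviceSelect::Flagged so the Rust side only needs a plain extern "C" declaration; the name nonzero_u32_f32 and its signature are made up for illustration.

```cuda
// nonzero_cub.cu -- a hypothetical sketch, NOT the candle-nonzero code.
// Exposes a C-ABI entry point so the Rust side can bind it with a plain
// extern "C" declaration; the name and signature are made up here.
#include <cub/device/device_select.cuh>
#include <cub/iterator/counting_input_iterator.cuh>
#include <cub/iterator/transform_input_iterator.cuh>
#include <cuda_runtime.h>
#include <cstdint>

struct NonZeroFlag {
    __host__ __device__ bool operator()(float x) const { return x != 0.0f; }
};

// Writes the flat indices of the nonzero elements of d_in into d_out and the
// number of selected indices into d_num_out (all device pointers).
// Returns the cudaError_t as an int so the caller can check for failures.
extern "C" int nonzero_u32_f32(const float *d_in, uint32_t n,
                               uint32_t *d_out, int32_t *d_num_out,
                               cudaStream_t stream) {
    cub::CountingInputIterator<uint32_t> indices(0);
    cub::TransformInputIterator<bool, NonZeroFlag, const float *>
        flags(d_in, NonZeroFlag());

    // CUB's usual two-phase pattern: first query the temp-storage size,
    // then allocate it and run the selection for real.
    void *d_temp = nullptr;
    size_t temp_bytes = 0;
    cudaError_t err = cub::DeviceSelect::Flagged(
        d_temp, temp_bytes, indices, flags, d_out, d_num_out,
        static_cast<int>(n), stream);
    if (err != cudaSuccess) return err;

    err = cudaMalloc(&d_temp, temp_bytes);
    if (err != cudaSuccess) return err;

    err = cub::DeviceSelect::Flagged(
        d_temp, temp_bytes, indices, flags, d_out, d_num_out,
        static_cast<int>(n), stream);
    cudaFree(d_temp);
    return err;
}
```

On the Rust side this would reduce to an extern "C" declaration plus a build.rs step that compiles the .cu file with nvcc (for example via the cc crate with .cuda(true)), which is one possible way to hook host-side Thrust/CUB code into a candle CUDA backend.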