tracel-ai / cubecl

Multi-platform high-performance compute language extension for Rust.
https://burn.dev
Apache License 2.0

Have you considered a minimal-effort CPU runtime? #85

Open PoignardAzur opened 2 months ago

PoignardAzur commented 2 months ago

In your README, you mention wanting to build a Cranelift JIT backend for the CPU.

I can see the appeal of such a backend, but at the same time, there are use-cases where users may really want a CPU runtime for their shaders and don't care that much about performance.

For instance, in Vello, we end up maintaining CPU pseudo-shaders in parallel with our actual WGSL shaders, mostly for testing and as a fallback. Personally, I'd like to push the fallback case even further so we can run Vello on machines without GPUs; in those cases, being able to run anything at all is a win, even with degraded performance. If we could achieve that and get rid of our duplicate CPU shaders, that would be a massive win for us.

Have you considered making a best-effort CPU runtime? One where annotated Rust functions are simply lowered to regular Rust functions, and auto-vectorization is left to the rustc backend? How much effort do you think it would take to implement that runtime?
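
To make the idea concrete, here is a rough sketch of what "lowered to a regular Rust function" could look like. It does not use CubeCL's actual API: `saxpy_kernel` and `launch_cpu` are hypothetical names, and the sketch assumes a kernel can be expressed as a per-index function that a CPU "launch" calls in a plain loop.

```rust
/// Hypothetical kernel body: the work one GPU thread would do for one index.
/// On a GPU, `idx` would come from the thread/block position; here it is
/// just a function argument.
fn saxpy_kernel(idx: usize, a: f32, x: &[f32], y: &mut [f32]) {
    // Same bounds guard a GPU kernel would use for out-of-range threads.
    if idx < x.len() && idx < y.len() {
        y[idx] = a * x[idx] + y[idx];
    }
}

/// Hypothetical CPU "launch": run the kernel body once per logical thread
/// in an ordinary loop. Any vectorization is left to rustc/LLVM.
fn launch_cpu(num_threads: usize, a: f32, x: &[f32], y: &mut [f32]) {
    for idx in 0..num_threads {
        saxpy_kernel(idx, a, x, y);
    }
}
```

Calling `launch_cpu(8, 2.0, &x, &mut y)` with four-element slices would update the four valid elements and skip the other four "threads", the same way the guard would on a GPU. This ignores shared memory, barriers, and everything else that makes a real runtime hard.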

nathanielsimard commented 2 months ago

Making a low-effort CPU runtime would probably be as hard as making a proper CPU runtime. To speed things up, we might generalize our CUDA compiler into a C++ compiler and compile its output using GCC or LLVM. The compiler wouldn't be embedded, but it would be faster to develop.
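
As a very rough illustration of that direction (not how the CubeCL compiler is actually structured), a code generator could share the kernel body between targets and only swap the wrapper: CUDA gets a `__global__` function indexed by thread position, plain C++ gets a loop. The `Target` enum, `emit_kernel`, and the fixed `(float* data, unsigned int n)` signature are all assumptions made up for this sketch.

```rust
/// Hypothetical output targets for a shared code generator.
#[derive(Clone, Copy)]
enum Target {
    Cuda,
    Cpp,
}

/// Wrap an already-generated kernel body (C-like source text that uses
/// `data`, `n`, and `idx`) in a target-specific function.
fn emit_kernel(target: Target, name: &str, body: &str) -> String {
    match target {
        // CUDA: one thread per index, computed from the launch geometry.
        Target::Cuda => format!(
            "extern \"C\" __global__ void {name}(float* data, unsigned int n) {{\n    unsigned int idx = blockIdx.x * blockDim.x + threadIdx.x;\n    if (idx < n) {{ {body} }}\n}}\n"
        ),
        // Plain C++: the same body, driven by a sequential loop that an
        // external compiler such as GCC or Clang could then optimize.
        Target::Cpp => format!(
            "extern \"C\" void {name}(float* data, unsigned int n) {{\n    for (unsigned int idx = 0; idx < n; ++idx) {{ {body} }}\n}}\n"
        ),
    }
}
```

For example, `emit_kernel(Target::Cpp, "scale", "data[idx] *= 2.0f;")` returns C++ source text that would then be handed to an external compiler, which is why the compiler wouldn't be embedded.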