PoignardAzur opened this issue 2 months ago
Making a low-effort CPU runtime would probably be as hard as making a proper one. To speed things up, we might generalize our CUDA compiler into a C++ compiler and compile the emitted C++ with GCC or LLVM. The C++ compiler wouldn't be embedded, but this approach would be faster to develop.
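Roughly, that non-embedded flow might look like the sketch below; the `g++` invocation, flags, temp paths, and the `libloading` crate are illustrative assumptions rather than a concrete design.

```rust
use std::process::Command;

/// Hypothetical helper: write generated C++ to disk, compile it with the host
/// toolchain, and load the resulting shared library at runtime.
fn build_cpu_kernel(cpp_source: &str) -> Result<libloading::Library, Box<dyn std::error::Error>> {
    let src = std::env::temp_dir().join("kernel.cpp");
    let lib = std::env::temp_dir().join("libkernel.so");
    std::fs::write(&src, cpp_source)?;

    // Invoke the system C++ compiler; an LLVM setup would swap in `clang++`.
    let status = Command::new("g++")
        .args(["-O3", "-march=native", "-shared", "-fPIC"])
        .arg(&src)
        .arg("-o")
        .arg(&lib)
        .status()?;
    if !status.success() {
        return Err("host C++ compiler failed".into());
    }

    // SAFETY: we just produced this library ourselves and trust its contents.
    Ok(unsafe { libloading::Library::new(&lib)? })
}
```

Kernels would then be looked up by symbol name and called through function pointers; the obvious cost is a runtime dependency on an external C++ compiler.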
In your README, you mention wanting to build a Cranelift JIT backend for the CPU.
I can see the appeal of such a backend, but at the same time there are use cases where users really want a CPU runtime for their shaders and don't care that much about performance.
For instance, in Vello, we end up maintaining CPU pseudo-shaders in parallel with our actual WGSL shaders, mostly for testing and as a fallback. Personally, I'd like to push the fallback case even further so we can run Vello on machines without GPUs; in those cases, being able to run anything at all is a win, even with degraded performance. If we could achieve that and get rid of our duplicate CPU shaders, it would be a massive win for us.
Have you considered making a best-effort CPU runtime? One where annotated Rust functions are simply lowered to regular Rust functions, and auto-vectorization is left to the rustc backend? How much effort do you think it would take to implement that runtime?
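To make that concrete, the kind of lowering I have in mind is sketched below. The `#[kernel]` attribute and `global_id` intrinsic in the comment are placeholders rather than this project's actual API, and the CPU version is just a hand-written guess at what the macro could expand to.

```rust
// GPU-style form (placeholder syntax, shown as a comment): one invocation per element.
//
// #[kernel]
// fn saxpy(ctx: &Ctx, a: f32, x: &[f32], y: &mut [f32]) {
//     let i = ctx.global_id();
//     y[i] += a * x[i];
// }

// Best-effort CPU lowering: the per-invocation body becomes the body of an
// ordinary loop, and auto-vectorization is left to the rustc/LLVM backend.
pub fn saxpy_cpu(a: f32, x: &[f32], y: &mut [f32]) {
    for (yi, xi) in y.iter_mut().zip(x) {
        *yi += a * xi;
    }
}

fn main() {
    let x = vec![1.0_f32; 1024];
    let mut y = vec![2.0_f32; 1024];
    saxpy_cpu(0.5, &x, &mut y);
    assert!(y.iter().all(|&v| (v - 2.5).abs() < 1e-6));
}
```

Even a naive expansion like this would cover the "run anywhere" case; performance tuning could come later.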