tracel-ai / burn

Burn is a new comprehensive dynamic Deep Learning Framework built using Rust with extreme flexibility, compute efficiency and portability as its primary goals.
https://burn.dev
Apache License 2.0

Deferred tensor allocations to avoid buffer sharing with conflicting visibility in WebGPU #1996

Open nathanielsimard opened 2 weeks ago

nathanielsimard commented 2 weeks ago

Currently, when calling client.empty(bytes), the tensor handle is allocated immediately. This can lead to situations where input handles and output handles share the same memory chunk during kernel execution, which violates WebGPU's aliasing rules (roughly: a buffer range bound as writable storage must not overlap any other binding used by the same dispatch).
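For illustration, here is a minimal self-contained toy (the Chunk and Handle types are made up for this sketch, not Burn's actual memory management) showing how eager sub-allocation can hand an input and an output slices of the same chunk, and in the wgpu runtime presumably the same underlying buffer:

// Toy bump allocator over a single chunk (think: one wgpu buffer), purely
// to illustrate the problem described above.
#[derive(Debug, Clone, Copy)]
struct Handle {
    chunk_id: usize,
    offset: usize,
    size: usize,
}

struct Chunk {
    id: usize,
    capacity: usize,
    cursor: usize,
}

impl Chunk {
    // Analogue of an eager `client.empty(bytes)`: memory is carved out right
    // now, with no knowledge of which handles will later be bound together.
    fn empty(&mut self, bytes: usize) -> Handle {
        assert!(self.cursor + bytes <= self.capacity, "out of memory");
        let handle = Handle { chunk_id: self.id, offset: self.cursor, size: bytes };
        self.cursor += bytes;
        handle
    }
}

fn main() {
    let mut chunk = Chunk { id: 0, capacity: 1024, cursor: 0 };
    let input = chunk.empty(256);
    let output = chunk.empty(256);
    // Same chunk, hence the same buffer: binding `input` as read-only and
    // `output` as writable storage in one launch is the aliasing WebGPU rejects.
    assert_eq!(input.chunk_id, output.chunk_id);
    println!("input = {input:?}, output = {output:?}");
}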

Potential Solution

1) Introduce mapped and unmapped tensor handles.
2) Return an unmapped handle when calling client.empty(bytes).
3) Add a map method in the memory management trait, which acts like reserve but takes an exclusion list.
4) Call that method in the cube runtimes, with an exclusion list for WebGPU (see the sketch below).
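A rough sketch of what steps 1-3 could look like (all type and method names here are illustrative assumptions, not Burn's actual memory-management API):

// Illustrative only: hypothetical handle types and trait methods.

/// Returned by client.empty(bytes): no memory has been reserved yet.
pub struct UnmappedHandle {
    pub size: usize,
}

/// A handle whose backing memory has been reserved in a specific chunk.
pub struct MappedHandle {
    pub chunk_id: usize,
    pub offset: usize,
    pub size: usize,
}

pub trait MemoryManagement {
    /// Existing behavior: reserve memory for `size` bytes right away.
    fn reserve(&mut self, size: usize) -> MappedHandle;

    /// Proposed: like `reserve`, but guaranteed not to place the allocation
    /// in any chunk that already backs one of the excluded handles.
    fn map(&mut self, handle: UnmappedHandle, exclude: &[MappedHandle]) -> MappedHandle;
}

A cube runtime would then call map right before launching a kernel, passing the kernel's input handles as the exclusion list on WebGPU, and presumably an empty list on backends where buffer aliasing is allowed.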

ArthurBrussee commented 2 weeks ago

Briefly looked at implementing this; some thoughts:

nathanielsimard commented 1 week ago
ArthurBrussee commented 1 week ago
let output = client.empty();
// I imagine Backend::zeros() at some point resolves to a kernel, or imagine some other "init" kernel.
client.execute(Kernel::zeros(), [output]);
client.execute(Kernel::MyCustom(), [a, b, c, output]);

You would need to guarantee that the output doesn't reuse the buffers backing a, b, or c, but it gets allocated in the first execute call, where there appear to be no conflicts.

To be fully general you'd pretty much need the full execution graph... that seems like a lot, but this "init" case definitely occurs for me.
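To make that concrete with a hedged toy (again made-up types, not Burn's API): if output is mapped at the zeros launch, the exclusion list at that moment is empty, so nothing prevents it from landing in the chunk that already backs a, b and c, and by the second launch it is too late to move it.

// Hypothetical toy: `map` sub-allocates from the first non-excluded chunk
// with room. It only knows about the handles bound to the current launch.
#[derive(Clone, Copy, Debug)]
struct Mapped { chunk_id: usize, offset: usize }

struct Chunk { capacity: usize, cursor: usize }

struct Allocator { chunks: Vec<Chunk> }

impl Allocator {
    fn map(&mut self, size: usize, exclude: &[Mapped]) -> Mapped {
        for (id, chunk) in self.chunks.iter_mut().enumerate() {
            let excluded = exclude.iter().any(|m| m.chunk_id == id);
            if !excluded && chunk.cursor + size <= chunk.capacity {
                let mapped = Mapped { chunk_id: id, offset: chunk.cursor };
                chunk.cursor += size;
                return mapped;
            }
        }
        // No suitable chunk: create a fresh one.
        self.chunks.push(Chunk { capacity: size.max(1024), cursor: size });
        Mapped { chunk_id: self.chunks.len() - 1, offset: 0 }
    }
}

fn main() {
    let mut alloc = Allocator { chunks: Vec::new() };

    // a, b and c model pre-existing tensors sub-allocated from the same chunk.
    let a = alloc.map(256, &[]);
    let b = alloc.map(256, &[]);
    let c = alloc.map(256, &[]);

    // Launch 1: zeros(output). Only `output` is bound, so the exclusion list
    // is empty and the allocator places it in chunk 0 as well.
    let output = alloc.map(256, &[]);

    // Launch 2: my_custom(a, b, c, output). `output` is already mapped and
    // aliases the chunk backing the inputs, so the WebGPU conflict is back.
    assert_eq!(output.chunk_id, a.chunk_id);
    println!("a = {a:?}, b = {b:?}, c = {c:?}, output = {output:?}");
}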