Some updates:
- Rebase WebNN POC to the version where WebGPU compute shaders work on D3D12
Rebased the WebNN POC to 80.0.3960.0 for WebGPU D3D12 support. There is an issue where TF.js WebGPU crashes due to the lack of read-only storage buffer support. Worked around it by removing the read-only declaration in the TF.js shader preprocessor.
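A rough sketch of that workaround is below; it assumes the preprocessor emits GLSL storage buffers with a `readonly` qualifier, and the function name is hypothetical, only illustrating the kind of source patch applied.

```js
// Minimal sketch of the workaround, not the actual TF.js change: strip the
// `readonly` qualifier from generated storage buffer declarations so the
// shader compiles on a WebGPU/D3D12 build without read-only storage buffers.
function stripReadonlyQualifier(glslSource) {
  // Turns `readonly buffer Foo { ... };` into `buffer Foo { ... };`.
  return glslSource.replace(/\breadonly\s+buffer\b/g, 'buffer');
}
```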
- Get WebGPU-WebNN interop working on D3D12/DML for GPU
Implemented WebGPU-WebNN interop on Windows with the same API as the macOS prototype. The WebGPU D3D12 backend and the WebNN DirectML backend share buffers via ID3D12Resource. The test results (with the above workaround for the TF.js WebGPU backend) are:
WebNN-WebGPU Interop Test
Start
TF.js sets backend as WebGPU
conv2d input dims: [1,100,100,100] and filter dims: [3,3,100,100]
Test1 - conv2d/add/relu (WebGPU): 37.93 ms
Test2 - conv2d (WebNN) -> ArrayBufferView -> add/relu (WebGPU): 27.04 ms
Test3 - conv2d (WebNN) -> WebGPUBuffer -> add/relu (WebGPU): 9.18 ms
Test4 - conv2d/add/relu (WebNN): 7.58 ms
The test platform configuration is
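For reference, the difference between the Test2 and Test3 paths can be sketched as follows. `setOutputGPUBuffer` is a placeholder for the POC's interop method that binds a WebGPUBuffer as the WebNN output; it is not a spec'd API.

```js
// Test2 path: read the WebNN conv2d output back into an ArrayBufferView and
// re-upload it for the WebGPU add/relu step; the extra GPU<->CPU copies add
// the latency seen above.
async function conv2dWithReadback(execution, input, output /* Float32Array */) {
  execution.setInput(0, input);
  execution.setOutput(0, output);                 // CPU-visible readback
  await execution.startCompute();
  return output;                                  // caller re-uploads for add/relu
}

// Test3 path: let WebNN write directly into a WebGPUBuffer backed by the same
// D3D12 resource, so the add/relu compute shader consumes it without a copy.
async function conv2dIntoGPUBuffer(execution, input, webGPUBuffer) {
  execution.setInput(0, input);
  execution.setOutputGPUBuffer(0, webGPUBuffer);  // hypothetical interop method
  await execution.startCompute();
  return webGPUBuffer;                            // bind directly in the compute pass
}
```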
- Get the WebNN/DirectML backend working on VPU
Leveraged the DXCore API to enumerate adapters that expose compute-only devices, e.g. ML accelerators. When a web app compiles a WebNN graph with the low-power preference, the WebNN POC DML backend selects the low-power ML accelerator and then creates a D3D12/DML device and command queue for it. In particular, for our experiment on the AI on the PC devkit, the following sample code compiles and executes a WebNN graph on the VPU.
// Create a WebNN graph that contains conv2d
const graph = await createWebNNConv(filterValue, noBias, noRelu);
const compilation = await graph.createCompilation();
// Compiles WebNN graph for VPU
compilation.setPreference(nn.LOW_POWER);
await compilation.finish();
const execution = await compilation.createExecution();
// input and output are TypedArrays
execution.setInput(0, input);
execution.setOutput(0, output);
// Executes WebNN graph on VPU
await execution.startCompute();
If the compilation preference is sustained-speed, the WebNN DML backend still uses the GPU.
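For comparison, the only change needed for the GPU path is the preference; this is a minimal sketch, and the constant name is assumed to parallel `nn.LOW_POWER` in the POC API.

```js
// Same flow as the sample above, but a sustained-speed preference keeps the
// graph on the GPU in the WebNN DML backend.
compilation.setPreference(nn.SUSTAINED_SPEED);
await compilation.finish();
```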
Per the discussion in the Dec 5 CG call, the next step of the investigation is to run Test3 on a programmable ML accelerator, e.g. a VPU. That means running custom ops in a WebGPU compute shader and sharing buffers with WebNN built-in ops on the ML accelerator. This depends on the answers to the following open questions:
For buffer sharing across GPU and VPU:
- Get WebGPU/D3D12/GPU and WebNN/DML/VPU interop working
As mentioned by @RafaelCintron in the meeting, this usage is not recommended as it could be very slow. If the ML accelerator cannot run custom ops, web apps could still use ArrayBuffer.
A solution to the problem of buffer-sharing is proposed in #482. Can we close this issue?
@huningxin PTAL at https://github.com/webmachinelearning/webnn/issues/688 and consider merging/closing this issue - we can track the latest interop proposal there.
For WebNN interoperability for custom op support, so far we have done the investigation and reported out for WebNN-WASM interop and WebNN-WebGPU interop.
According to the WebNN interop investigation next-steps discussion in the WebML CG call on 3 Oct, the participants were interested in buffer sharing between the GPU and an ML accelerator. Opening this issue to capture the requirement as well as to share the status and data.
The idea is that WebNN allows running expensive ops (e.g. conv2d) on the ML accelerator and sharing the buffer with a WebGPU compute shader that runs custom ops (e.g. add/relu). It can be illustrated by the following code sample.
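The sample below is a sketch of this flow, reusing the POC-style API from the updates above: `createWebNNConv`, `filterValue`, `noBias` and `noRelu` come from that sample, while `setOutputGPUBuffer` and `addReluPipeline` are placeholders rather than spec'd names, and the WebGPU side uses the current compute-pass API.

```js
// Sketch: run the expensive conv2d on the ML accelerator via WebNN, then run
// the custom add/relu in a WebGPU compute shader over the shared buffer.
// setOutputGPUBuffer and addReluPipeline are placeholders, not spec'd names.
async function conv2dOnAcceleratorWithCustomOps(
    nn, device, addReluPipeline, input, sharedBuffer, outputElementCount) {
  // Build and compile the conv2d graph, preferring the low-power accelerator.
  const graph = await createWebNNConv(filterValue, noBias, noRelu);
  const compilation = await graph.createCompilation();
  compilation.setPreference(nn.LOW_POWER);
  await compilation.finish();

  // Execute conv2d with its output bound to a shared WebGPU buffer.
  const execution = await compilation.createExecution();
  execution.setInput(0, input);
  execution.setOutputGPUBuffer(0, sharedBuffer);  // hypothetical interop method
  await execution.startCompute();

  // Run the custom add/relu op in a WebGPU compute pass over the shared buffer.
  const encoder = device.createCommandEncoder();
  const pass = encoder.beginComputePass();
  pass.setPipeline(addReluPipeline);
  pass.setBindGroup(0, device.createBindGroup({
    layout: addReluPipeline.getBindGroupLayout(0),
    entries: [{ binding: 0, resource: { buffer: sharedBuffer } }],
  }));
  // Assume a 64-invocation workgroup in the add/relu shader.
  pass.dispatchWorkgroups(Math.ceil(outputElementCount / 64));
  pass.end();
  device.queue.submit([encoder.finish()]);
}
```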
Per the recommendation from @walrusmcd (thanks!), the investigation will initially target the AI on the PC Devkit. This device has both a GPU and a VPU (as an example of an ML accelerator) that are supported by the D3D12 and DirectML APIs. The Chromium WebNN POC will be enhanced to support the above scenario.
There are some dependencies that need to be worked on:
Currently, we have done the rebase and got basic VPU support working in the WebNN/DML backend. We'll update here once we make progress on WebGPU-WebNN interop on D3D12/DML.
All, please kindly let me know whether I missed anything.