webmachinelearning / webnn

🧠 Web Neural Network API
https://www.w3.org/TR/webnn/

[investigation] buffer sharing between GPU and ML accelerator #33

Closed huningxin closed 5 months ago

huningxin commented 5 years ago

For WebNN interoperability for custom op support, we have so far investigated and reported on WebNN-WASM interop and WebNN-WebGPU interop.

According to the WebNN interop investigation next-steps discussion in the WebML CG call on 3 Oct, the participants were interested in buffer sharing between the GPU and an ML accelerator. I am opening this issue to capture the requirement as well as to share status and data.

The idea is that WebNN allows running expensive ops (e.g. conv2d) on the ML accelerator and sharing buffers with WebGPU compute shaders that run custom ops (e.g. add/relu). This can be illustrated by the following code sample.

// Create a WebNN model containing conv2d
const model = await createWebNNConv(filterValue, noBias, noRelu);
const compilation = await model.createCompilation();
// Compile the WebNN model for the ML accelerator
compilation.setPreference(nn.LOW_POWER);
await compilation.finish();
const execution = await compilation.createExecution();
// input, output, bias are tf.tensor
// Get underlying WebGPUBuffer
const inputBuffer = tf.backend().getBuffer(input.dataId);
const outputBuffer = tf.backend().getBuffer(output.dataId);
// Set WebGPUBuffer as input and output to WebNN execution
execution.setInputGPUBuffer(0, inputBuffer);
execution.setOutputGPUBuffer(0, outputBuffer);
// Execute the WebNN ops on ML accelerator
await execution.startCompute();
// Execute the WebGPU ops on GPU
let addOutput = tf.add(output, bias);
let reluOutput = tf.relu(addOutput);
// Read back result from GPU
let result = await reluOutput.data();

Per recommendation from @walrusmcd (thanks!), the investigation will initially target the AI on the PC Devkit. This device has both a GPU and a VPU (as an example of an ML accelerator), both of which are supported by the D3D12 and DirectML APIs. The Chromium WebNN POC will be enhanced to support the above scenario.

There are some dependencies that need to be worked on:

Currently, we have done the rebase and have basic VPU support working in the WebNN/DML backend. We'll update here once we make progress on WebGPU-WebNN interop on D3D12/DML.

All, please kindly let me know whether I missed anything.

huningxin commented 4 years ago

Some updates:

  • Rebase the WebNN POC to a version where WebGPU compute shaders work on D3D12

Rebased the WebNN POC to 80.0.3960.0 for WebGPU D3D12 support. There was an issue where the TF.js WebGPU backend crashes due to the lack of read-only storage buffer support. Worked around it by removing the read-only declaration in the TF.js shader preprocessor.
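The workaround can be sketched as a small source-to-source patch over the shader source TF.js generates. The function and regex below are illustrative only, not the actual TF.js preprocessor code:

```javascript
// Hypothetical sketch of the workaround: strip the `readonly` qualifier from
// storage buffer declarations before the shader is handed to WebGPU, since
// the Chromium build at the time lacked read-only storage buffer support.
function stripReadonlyQualifier(shaderSource) {
  // Turn e.g. `layout(...) readonly buffer ssbA { ... };`
  // into a plain (read-write) storage buffer declaration.
  return shaderSource.replace(/\breadonly\s+buffer\b/g, 'buffer');
}

const patched = stripReadonlyQualifier(
  'layout(std430, set = 0, binding = 0) readonly buffer ssbA { float A[]; };');
// patched: 'layout(std430, set = 0, binding = 0) buffer ssbA { float A[]; };'
```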

  • Get WebGPU-WebNN interop working on D3D12/DML for the GPU

Implemented WebGPU-WebNN interop on Windows with the same API as the macOS prototype. The D3D12 backend of WebGPU and the DirectML backend of WebNN share buffers via ID3D12Resource. The test results (with the above workaround in the TF.js WebGPU backend) are:

WebNN-WebGPU Interop Test
Start
TF.js sets backend as WebGPU
conv2d input dims: [1,100,100,100] and filter dims: [3,3,100,100]

Test1 - conv2d/add/relu (WebGPU): 37.93 ms
Test2 - conv2d (WebNN) -> ArrayBufferView -> add/relu (WebGPU): 27.04 ms
Test3 - conv2d (WebNN) -> WebGPUBuffer -> add/relu (WebGPU): 9.18 ms
Test4 - conv2d/add/relu (WebNN): 7.58 ms
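For reference, the relative speedups implied by these timings (plain arithmetic on the numbers reported above, rounded to one decimal place):

```javascript
// Timings (ms) copied from the test run above.
const timings = {
  test1WebgpuOnly: 37.93,          // conv2d/add/relu all in WebGPU
  test2ArrayBufferInterop: 27.04,  // WebNN conv2d -> ArrayBufferView -> WebGPU
  test3GpuBufferInterop: 9.18,     // WebNN conv2d -> WebGPUBuffer -> WebGPU
  test4WebnnOnly: 7.58,            // conv2d/add/relu all in WebNN
};

const speedup = (slower, faster) => Math.round((slower / faster) * 10) / 10;

// Sharing a WebGPUBuffer instead of reading back through an ArrayBufferView:
const bufferVsReadback = speedup(timings.test2ArrayBufferInterop,
                                 timings.test3GpuBufferInterop); // ~2.9x
// GPU-buffer interop vs. running everything in WebGPU:
const interopVsWebgpu = speedup(timings.test1WebgpuOnly,
                                timings.test3GpuBufferInterop); // ~4.1x
```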

The test platform configuration is

  • Get the WebNN/DirectML backend working on the VPU

Leveraged the DXCore API to enumerate adapters, including compute-only devices such as ML accelerators. When a web app compiles a WebNN graph with the low-power preference, the WebNN POC DML backend selects the low-power ML accelerator and creates a D3D12/DML device and command queue for it. In particular, in our experiment on the AI on PC devkit, the following sample code compiles and executes a WebNN graph on the VPU.

// Create a WebNN graph containing conv2d
const graph = await createWebNNConv(filterValue, noBias, noRelu);
const compilation = await graph.createCompilation();
// Compile the WebNN graph for the VPU
compilation.setPreference(nn.LOW_POWER);
await compilation.finish();
const execution = await compilation.createExecution();
// input and output are TypedArray
execution.setInput(0, input);
execution.setOutput(0, output);
// Executes WebNN graph on VPU
await execution.startCompute();

If the compilation preference is sustained-speed, the WebNN DML backend still uses the GPU.

huningxin commented 4 years ago

Per the discussion in the Dec 5 CG call, the next step of the investigation is to run Test3 on a programmable ML accelerator, e.g. the VPU. That means running custom ops in WebGPU compute shaders while sharing buffers with WebNN built-in ops on the ML accelerator. This depends on the answers to the following open questions:

For buffer sharing across the GPU and VPU:

  • Get WebGPU/D3D12/GPU and WebNN/DML/VPU interop working

As mentioned by @RafaelCintron in the meeting, this usage is not recommended as it could be very slow. If the ML accelerator cannot run the custom ops, web apps could still fall back to an ArrayBuffer.
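In that ArrayBuffer fallback, the custom ops would simply run on the CPU over the TypedArray output that WebNN wrote into. A minimal sketch, assuming `output` and `bias` are plain Float32Arrays of the same length (broadcasting over the channel dimension is omitted):

```javascript
// Hypothetical CPU fallback for the custom ops when buffer sharing with the
// accelerator is unavailable: WebNN writes the conv2d result into a
// TypedArray, then fused add/relu runs in plain JavaScript.
function addRelu(output, bias) {
  const result = new Float32Array(output.length);
  for (let i = 0; i < output.length; ++i) {
    // bias is assumed elementwise-aligned with output for this sketch.
    result[i] = Math.max(0, output[i] + bias[i]);
  }
  return result;
}

const fused = addRelu(new Float32Array([1, -3, 2]),
                      new Float32Array([0.5, 1, -4]));
// fused -> Float32Array [1.5, 0, 0]
```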

a-sully commented 9 months ago

A solution to the problem of buffer-sharing is proposed in #482. Can we close this issue?

bbernhar commented 5 months ago

@huningxin PTAL at https://github.com/webmachinelearning/webnn/issues/688 and consider merging/closing this issue - we can track the latest interop proposal there.