microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

[WebGL] cannot resolve operator 'DynamicQuantizeLinear' with opsets: ai.onnx v16, ... #13800

Open josephrocca opened 1 year ago

josephrocca commented 1 year ago

Describe the issue

When using the WebGL backend with this model, I get the following error:

cannot resolve operator 'DynamicQuantizeLinear' with opsets: ai.onnx v16, com.microsoft.experimental v1, ai.onnx.preview.training v1, ai.onnx.training v1, com.ms.internal.nhwc v17, org.pytorch.aten v1, com.microsoft.nchwc v1, ai.onnx.ml v3, com.microsoft v1

Note that I had to use opset 16 because PyTorch's ONNX export doesn't support opset 17. Note also that the Wasm backend works fine, as usual. I'm not sure how committed the team is to improving WebGL op support, but I'll just note that it's currently pretty rare that I can get the WebGL backend working, due to missing ops.

Perhaps the team is waiting for WebGPU to land in browsers (probably early next year?) before putting more effort into GPU inference on the web? I'm hoping that with WebGPU, the ONNX Runtime team will be able to "automatically" port their native GPU kernels to WGSL, just as they port the native CPU kernels to Wasm. IIUC, WGSL is specced with ease of porting/transpilation from native GPU shader languages in mind? If a manual rewrite of all the WebGPU kernels is required, then I'm worried that the WebGPU backend will forever have patchy op support compared to the Wasm backend.

To reproduce

https://jsbin.com/daginihoho/edit?html,output
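
The jsbin above boils down to roughly the following (a minimal sketch; the model URL is a placeholder for the quantized model the jsbin loads):

```ts
// Minimal sketch of the failing setup with onnxruntime-web (1.13.x API).
// 'model_quantized.onnx' is a placeholder for the actual model URL.
import * as ort from 'onnxruntime-web';

const session = await ort.InferenceSession.create('model_quantized.onnx', {
  executionProviders: ['webgl'], // throws: cannot resolve 'DynamicQuantizeLinear'
  // executionProviders: ['wasm'], // the same model loads fine on Wasm
});
```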

Urgency

No hard deadlines.

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.13.1

Execution Provider

WebGL

SangbumChoi commented 1 year ago

opset.ts:47 Uncaught (in promise) TypeError: cannot resolve operator 'DynamicQuantizeLinear' with opsets: ai.onnx v13
    at t.resolveOperator (opset.ts:47:1)
    at t.WebGLSessionHandler.resolve (session-handler.ts:81:1)
    at t.Session.initializeOps (session.ts:242:1)
    at session.ts:93:1
    at t.Profiler.event (instrument.ts:337:1)
    at t.Session.initialize (session.ts:89:1)
    at session.ts:71:1

I'm facing the same issue as above.

SangbumChoi commented 1 year ago

@josephrocca It looks like none of the quantization ops in onnxruntime are supported by the WebGL backend. Did you solve this problem? I think we may have to implement the operator ourselves, either in a low-level language or in TS/JS - see the sketch below.
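
For what it's worth, DynamicQuantizeLinear is fully specified by the ONNX spec (uint8 output, qmin = 0, qmax = 255), so a self-written kernel would just need to implement that math. A rough TS sketch, not tested against the runtime:

```ts
// Reference sketch of DynamicQuantizeLinear per the ONNX spec.
// Note: the spec rounds half-to-even; Math.round only approximates that.
function dynamicQuantizeLinear(x: Float32Array): {
  y: Uint8Array; yScale: number; yZeroPoint: number;
} {
  // The quantization range must include 0, per the spec.
  let min = 0, max = 0;
  for (const v of x) {
    if (v < min) min = v;
    if (v > max) max = v;
  }
  const yScale = (max - min) / 255 || 1; // guard against all-zero input
  const yZeroPoint = Math.min(255, Math.max(0, Math.round(-min / yScale)));
  const y = new Uint8Array(x.length);
  for (let i = 0; i < x.length; i++) {
    y[i] = Math.min(255, Math.max(0, Math.round(x[i] / yScale) + yZeroPoint));
  }
  return { y, yScale, yZeroPoint };
}
```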

josephrocca commented 1 year ago

@SangbumChoi I didn't solve it - I'm wondering if it's possible to "dequantize" the model on the client. The main reason I want the model quantized is to reduce the time it takes for the client to download it.
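
If that route worked, the expansion itself is just the DequantizeLinear formula, x = scale * (q - zero_point), so a client-side pass over the downloaded weights would look roughly like this (a sketch; a real model stores a separate scale/zero-point per quantized initializer, which would have to be read out of the graph):

```ts
// Sketch of client-side dequantization: expand uint8 weights back to
// float32 using their scale and zero-point (the DequantizeLinear formula).
function dequantizeLinear(q: Uint8Array, scale: number, zeroPoint: number): Float32Array {
  const out = new Float32Array(q.length);
  for (let i = 0; i < q.length; i++) {
    out[i] = scale * (q[i] - zeroPoint);
  }
  return out;
}
```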

But the problem is that the WebGL backend is just missing a lot of ops compared to the Wasm backend. The unquantized version of my model also has Erf (just like yours, apparently), which the WebGL backend doesn't support either.

Like I said in my first comment on this issue, I hope that WebGPU will solve these compatibility problems by just compiling the native GPU code to WGSL, so that we get ~full op support without putting a lot of burden on the ONNX Runtime Web team.