vitiok123 opened this issue 11 months ago
Can you share the specific onnx file you are using?
In general, the error means that the output is missing somehow. If it is present in the ONNX file and properly connected, there may be an issue in the optimizer.
Hi, you can find the onnx file in this repository (file: yolov8m.onnx): https://github.com/AndreyGermanov/yolov8_onnx_python
I used Python and YOLOv8 to export this file. When exporting, it is possible to pass some arguments; the list of arguments is here: https://docs.ultralytics.com/modes/export/#arguments. Maybe this will help determine whether the problem is in the export settings.
Hm, the file linked does not have all its shapes inferred (nnx prepare is unable to infer all shapes, but that is expected, as shape inference for Conv is not yet supported). After simplifying with onnx-simplifier (see README) there are still issues, as the outputs of some Resize nodes are not inferred yet:
[2023-07-18T18:55:43Z ERROR nnx::info] Node '/model.10/Resize' input '' has unknown shape
[2023-07-18T18:55:43Z ERROR nnx::info] Node '/model.13/Resize' input '' has unknown shape
The issue seems to be that these nodes have no name specified for one of their inputs (this is allowed for optional inputs, as roi is in this case).
This should, however, not pose an issue, since the optimizer moves inputs to attributes for Resize and, in that process, ignores the optional roi input.
So my suggestion would be to try again with the simplified version (obtained using python3 -m onnxsim ./model.onnx ./simplified.onnx).
Wow, cool, thanks a lot! I will try it and give you feedback.
After running python3 -m onnxsim ./model.onnx ./simplified.onnx, these are the statistics:
Simplifying...
Finish! Here is the difference:
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓
┃ ┃ Original Model ┃ Simplified Model ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━┩
│ Add │ 15 │ 14 │
│ Concat │ 19 │ 19 │
│ Constant │ 189 │ 183 │
│ Conv │ 84 │ 84 │
│ Div │ 2 │ 1 │
│ Gather │ 1 │ 0 │
│ MaxPool │ 3 │ 3 │
│ Mul │ 80 │ 78 │
│ Reshape │ 5 │ 5 │
│ Resize │ 2 │ 2 │
│ Shape │ 1 │ 0 │
│ Sigmoid │ 78 │ 78 │
│ Slice │ 2 │ 2 │
│ Softmax │ 1 │ 1 │
│ Split │ 9 │ 9 │
│ Sub │ 2 │ 2 │
│ Transpose │ 1 │ 1 │
│ Model Size │ 99.0MiB │ 98.9MiB │
└────────────┴────────────────┴──────────────────┘
Using the simplified model I get this in the console (info logs):
transferring input split for op Split to i64 attribute (initializer data type: I64): [48, 48]
applying padding optimization to tensor model.2.m.0.cv1.conv.weight: strides data is 82944 bytes before, 110592 bytes after
applying padding optimization to tensor model.2.m.0.cv2.conv.weight: strides data is 82944 bytes before, 110592 bytes after
applying padding optimization to tensor model.2.m.1.cv1.conv.weight: strides data is 82944 bytes before, 110592 bytes after
applying padding optimization to tensor model.2.m.1.cv2.conv.weight: strides data is 82944 bytes before, 110592 bytes after
transferring input split for op Split to i64 attribute (initializer data type: I64): [96, 96]
applying padding optimization to tensor model.4.m.0.cv1.conv.weight: strides data is 331776 bytes before, 442368 bytes after
applying padding optimization to tensor model.4.m.0.cv2.conv.weight: strides data is 331776 bytes before, 442368 bytes after
applying padding optimization to tensor model.4.m.1.cv1.conv.weight: strides data is 331776 bytes before, 442368 bytes after
applying padding optimization to tensor model.4.m.1.cv2.conv.weight: strides data is 331776 bytes before, 442368 bytes after
applying padding optimization to tensor model.4.m.2.cv1.conv.weight: strides data is 331776 bytes before, 442368 bytes after
applying padding optimization to tensor model.4.m.2.cv2.conv.weight: strides data is 331776 bytes before, 442368 bytes after
applying padding optimization to tensor model.4.m.3.cv1.conv.weight: strides data is 331776 bytes before, 442368 bytes after
applying padding optimization to tensor model.4.m.3.cv2.conv.weight: strides data is 331776 bytes before, 442368 bytes after
transferring input split for op Split to i64 attribute (initializer data type: I64): [192, 192]
applying padding optimization to tensor model.6.m.0.cv1.conv.weight: strides data is 1327104 bytes before, 1769472 bytes after
applying padding optimization to tensor model.6.m.0.cv2.conv.weight: strides data is 1327104 bytes before, 1769472 bytes after
applying padding optimization to tensor model.6.m.1.cv1.conv.weight: strides data is 1327104 bytes before, 1769472 bytes after
applying padding optimization to tensor model.6.m.1.cv2.conv.weight: strides data is 1327104 bytes before, 1769472 bytes after
applying padding optimization to tensor model.6.m.2.cv1.conv.weight: strides data is 1327104 bytes before, 1769472 bytes after
applying padding optimization to tensor model.6.m.2.cv2.conv.weight: strides data is 1327104 bytes before, 1769472 bytes after
applying padding optimization to tensor model.6.m.3.cv1.conv.weight: strides data is 1327104 bytes before, 1769472 bytes after
applying padding optimization to tensor model.6.m.3.cv2.conv.weight: strides data is 1327104 bytes before, 1769472 bytes after
transferring input split for op Split to i64 attribute (initializer data type: I64): [288, 288]
applying padding optimization to tensor model.8.m.0.cv1.conv.weight: strides data is 2985984 bytes before, 3981312 bytes after
applying padding optimization to tensor model.8.m.0.cv2.conv.weight: strides data is 2985984 bytes before, 3981312 bytes after
applying padding optimization to tensor model.8.m.1.cv1.conv.weight: strides data is 2985984 bytes before, 3981312 bytes after
applying padding optimization to tensor model.8.m.1.cv2.conv.weight: strides data is 2985984 bytes before, 3981312 bytes after
And after that, this error:
panicked at 'internal error: entered unreachable code', wonnx/src/optimizer.rs:95:67
Stack:
Error
at imports.wbg.__wbg_new_abda76e883ba8a5f (http://localhost:3000/node_modules/@webonnx/wonnx-wasm/wonnx.js?v=a126f01e:481:21)
at http://localhost:3000/node_modules/@webonnx/wonnx-wasm/wonnx_bg.wasm:wasm-function[1080]:0x14a444
at http://localhost:3000/node_modules/@webonnx/wonnx-wasm/wonnx_bg.wasm:wasm-function[2887]:0x18e73a
at http://localhost:3000/node_modules/@webonnx/wonnx-wasm/wonnx_bg.wasm:wasm-function[1666]:0x17502e
at http://localhost:3000/node_modules/@webonnx/wonnx-wasm/wonnx_bg.wasm:wasm-function[1812]:0x17b84f
at http://localhost:3000/node_modules/@webonnx/wonnx-wasm/wonnx_bg.wasm:wasm-function[2232]:0x187b4c
at http://localhost:3000/node_modules/@webonnx/wonnx-wasm/wonnx_bg.wasm:wasm-function[2441]:0x18b798
at http://localhost:3000/node_modules/@webonnx/wonnx-wasm/wonnx_bg.wasm:wasm-function[2273]:0x188936
at http://localhost:3000/node_modules/@webonnx/wonnx-wasm/wonnx_bg.wasm:wasm-function[180]:0x1b83e
at http://localhost:3000/node_modules/@webonnx/wonnx-wasm/wonnx_bg.wasm:wasm-function[189]:0x34dbb
Uncaught (in promise) RuntimeError: unreachable
at wonnx_bg.wasm:0x175068
at wonnx_bg.wasm:0x17b84f
at wonnx_bg.wasm:0x187b4c
at wonnx_bg.wasm:0x18b798
at wonnx_bg.wasm:0x188936
at wonnx_bg.wasm:0x1b83e
at wonnx_bg.wasm:0x34dbb
at wonnx_bg.wasm:0xef85f
at wonnx_bg.wasm:0x3492e
at wonnx_bg.wasm:0xef85f
Good news and bad news:
The above does seem to be a bug in the optimizer: it appears to attempt constant folding on the missing node. I just committed https://github.com/webonnx/wonnx/commit/5d20e966473ad71fcdafba0bf5664a34f07f8a95 to fix that. Now, unfortunately, I get a different issue:
RUST_LOG=wonnx=debug RUST_BACKTRACE=1 cargo run --release -- infer ~/Downloads/yolov8m-simplified-2.onnx
[2023-07-18T19:50:08Z DEBUG wonnx::gpu] sequence tensor onnx::Split_180 (outputs readable=false)
[2023-07-18T19:50:08Z WARN wonnx::gpu] initializers with int64 data type are not supported, converting into int32 initializer
[2023-07-18T19:50:08Z INFO wonnx::gpu] creating buffer: onnx::Split_180 8b
[2023-07-18T19:50:08Z DEBUG wonnx::gpu] sequence op: /model.2/Split_output_0 (Split) (outputs readable=false)
thread 'main' panicked at 'wgpu error: Validation Error
Caused by:
In Device::create_bind_group
note: label = `/model.2/Split_output_0`
Number of bindings in bind group descriptor (4) does not match the number of bindings defined in the bind group layout (3)
It does appear the split input (number 2) is properly transferred to an attribute:
[2023-07-18T19:55:15Z DEBUG wonnx::optimizer] locally_optimized_node_with NodeIdentifier(0x600001c81b40, "/model.2/Split_output_0") op: /model.2/Split (Split)
[2023-07-18T19:55:15Z INFO wonnx::optimizer] transferring input split for op Split to i64 attribute (initializer data type: I64): [48, 48]
So for some reason it thinks there should be four buffers in one place but three in another. In the generated shader code, it has three (as expected: the split input is moved to an attribute earlier):
@group(0) @binding(0)
var<storage, read> input_0: Array;
@group(0) @binding(1)
var<storage, read_write> output_0: Array;
@group(0) @binding(2)
var<storage, read_write> output_1: Array;
Hence, there must still be two inputs in the IR (even after split is moved to an attribute), while only one is ever used by the shader (it expects all other inputs to be moved to attributes), which leads to the error.
This needs some further investigation (I don't have the time for it now), but at least we know where to look.
Cool, good to know. No problem; when it's done, it's done :)
Thanks a lot for your super fast help and answers.
Hi @pixelspark, I've encountered the same error trying to run YOLOv8 via wonnx. Have you had a chance to look into this issue yet?
If you don't have time for that, but could offer some guidance in debugging, that would be very much appreciated too :)
I haven't (and frankly don't have the time), unfortunately.
If I were you, I would start by investigating whether your ONNX file has the same issue with the Split operator, and check how many inputs it has. You might be able to rewrite the ONNX file (using the Python onnx package) into something wonnx accepts. Another possibility would be to tweak the ONNX opset version (perhaps the issue is caused by the different forms Split can take depending on the opset version).
Describe the bug
Hi, I am trying to use your example (https://github.com/webonnx/wonnx-wasm-example). When I use yolov8m.onnx (a COCO dataset model exported to ONNX with YOLOv8), I get this error:
SessionError 'IR error: output node for output /model.0/conv/Conv_output_0 not found'
Expected behavior
Inference runs without the error.