Open SimonBrandner opened 1 month ago
As you correctly pointed out, the issue is with the Gather node.
We check the inputs to make sure they are in accordance with the spec, but when checking the indices we expect a tensor, while your model uses a scalar (a rank-0 tensor). That is still valid according to the spec, but it currently runs into issues with the rest of the codegen for Burn (see issue #1689).
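For context, the ONNX spec defines Gather's output rank as rank(data) + rank(indices) − 1, so rank-0 indices are legal and simply drop one dimension. A minimal numpy sketch of the same semantics (numpy's `take` behaves like Gather here):

```python
import numpy as np

# ONNX Gather: output rank = rank(data) + rank(indices) - 1.
data = np.arange(12).reshape(3, 4)

vec_out = np.take(data, np.array([1]), axis=0)   # rank-1 indices -> rank-2 output
scalar_out = np.take(data, np.array(1), axis=0)  # rank-0 indices -> rank-1 output

print(vec_out.shape)     # (1, 4)
print(scalar_out.shape)  # (4,)
```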
Also, I just realized your model was generated with an older opset (v9), and we might not support all of the ops there. Right now our ops should support the latest definitions at opset 13. Some also have support for older opsets, but not all (e.g., Slice, which I see in your model is opset v1, and we don't parse its attributes).
The ONNX spec is large and our bandwidth for it is a bit limited.
We should probably fix this to support older opsets.
As a workaround, can you convert to a newer opset? There are tools to upgrade the ONNX file.
I tried this and it did not help... That said, from the comments my understanding is that the "Only tensor indices is valid" issue isn't due to an old opset, is it?
This dirty fix seems to work:
diff --git a/crates/burn-import/src/onnx/dim_inference.rs b/crates/burn-import/src/onnx/dim_inference.rs
index 94d66d22..c404829c 100644
--- a/crates/burn-import/src/onnx/dim_inference.rs
+++ b/crates/burn-import/src/onnx/dim_inference.rs
@@ -808,11 +808,16 @@ fn gather_update_outputs(node: &mut Node) {
};
let indices_tensor = match &node.inputs[1].ty {
- ArgType::Tensor(tensor) => tensor,
- _ => panic!("Only tensor indices is valid"),
+ ArgType::Tensor(tensor) => tensor.clone(),
+ ArgType::Scalar(scalar) => TensorType {
+ elem_type: scalar.clone(),
+ dim: 0,
+ shape: None,
+ },
+ _ => panic!("Only tensor or scalar indices are valid"),
};
- if indices_tensor.dim != 1 {
+ if indices_tensor.dim > 1 {
panic!("Gather: indices tensor rank above 1 not supported")
}
but then I run into this:
ERROR burn_import::logger: PANIC => panicked at /home/simon/GIT/Rust/burn/crates/burn-import/src/onnx/op_configuration.rs:302:9:
Flatten: input tensor must have at least 2 dimensions (got 0)
(btw, is there a Discord server/Matrix space/whatever which would be more suitable for debugging this?)
As far as I can tell, according to the docs, the input tensor really must have at least 2 dimensions, which makes me wonder how I am getting a 0-dim one...
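One way a 0-dim tensor can arise here: exporters commonly emit Shape followed by Gather with a scalar index to pick out a single dimension, and that Gather output is rank 0 until it is unsqueezed. A numpy sketch (the input shape is illustrative):

```python
import numpy as np

x = np.zeros((1, 3, 640, 640), dtype=np.float32)  # illustrative input

shape = np.array(x.shape, dtype=np.int64)  # Shape op: rank-1 tensor [1, 3, 640, 640]
batch = np.take(shape, np.array(0))        # Gather with a scalar index -> rank 0

print(batch.ndim)  # 0 -- the kind of 0-dim value that trips Flatten's
                   # "at least 2 dimensions" check if it isn't unsqueezed first
```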
> (btw, is there a Discord server/Matrix space/whatever which would be more suitable for debugging this?)
Absolutely! Come join us here.
> I tried this and it did not help... That said from the comments my understanding is that the "Only tensor indices is valid" issue isn't due to an old opset, is it?
Yeah, it's not just because of the opset. Right now the first errors you're seeing are due to the rank-0 tensor. When I checked your model last time, the gather node used a single value to gather, which was then unsqueezed to a 1-dim tensor.
We currently don't support 0-dim tensors, so even if we try to work around it by returning a 1-dim tensor, I don't think the rest of the codegen will match. We would have to add support for that.
Finally, you do have some ops that use an older opset and may not be supported (as pointed out in my last comment, some attributes which don't exist in newer opsets are not parsed).
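A language-neutral sketch of the rank inference this would require (my own illustration, not the crate's actual API; `gather_output_rank` is a hypothetical helper):

```python
def gather_output_rank(data_rank: int, indices_rank: int) -> int:
    """ONNX Gather output rank: q + r - 1 (q = indices rank, r = data rank)."""
    if data_rank < 1:
        raise ValueError("Gather: data must have rank >= 1")
    return data_rank + indices_rank - 1

print(gather_output_rank(2, 1))  # 2: vector indices preserve the rank
print(gather_output_rank(2, 0))  # 1: scalar indices drop one dimension
print(gather_output_rank(1, 0))  # 0: Shape -> Gather(scalar) yields a 0-dim value
```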
This part needs to be fixed:
We need to support Gather, Unsqueeze, and Concat operations on Shape
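In numpy terms, the pattern referred to here is the dynamic-reshape idiom old exporters emit, where each step operates on a shape tensor rather than on data. A sketch with an illustrative shape:

```python
import numpy as np

x = np.zeros((2, 3, 8, 8), dtype=np.float32)

shape = np.array(x.shape, dtype=np.int64)  # Shape
batch = np.take(shape, np.array(0))        # Gather (scalar index, rank 0)
batch_1d = np.expand_dims(batch, 0)        # Unsqueeze back to rank 1
target = np.concatenate([batch_1d, np.array([-1], dtype=np.int64)])  # Concat
y = x.reshape(tuple(target))               # Reshape to (batch, -1)

print(y.shape)  # (2, 192)
```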
Used onnx-simplifier (https://github.com/daquexian/onnx-simplifier), which removes lots of these crazy Gather/Reshape operations that were due to the lacking capabilities of older ONNX opset versions.
Here is the updated version:
Output of the tool:
[burn]% onnxsim ~/Downloads/face_detector_640.onnx ./face_detector_opt.onnx
Simplifying...
Finish! Here is the difference:
┏━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓
┃ ┃ Original Model ┃ Simplified Model ┃
┡━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━┩
│ Add │ 3 │ 3 │
│ BatchNormalization │ 35 │ 0 │
│ Concat │ 13 │ 5 │
│ Constant │ 240 │ 111 │
│ Conv │ 52 │ 52 │
│ Div │ 2 │ 1 │
│ Exp │ 1 │ 1 │
│ Gather │ 8 │ 0 │
│ Mul │ 4 │ 4 │
│ Relu │ 37 │ 37 │
│ Reshape │ 8 │ 8 │
│ Shape │ 8 │ 0 │
│ Slice │ 6 │ 4 │
│ Softmax │ 1 │ 1 │
│ Sub │ 1 │ 1 │
│ Transpose │ 8 │ 8 │
│ Unsqueeze │ 24 │ 0 │
│ Model Size │ 1.5MiB │ 1.3MiB │
└────────────────────┴────────────────┴──────────────────┘
Now, there is still a slight issue with the Slice operation, which does not support attributes. I am working on a fix.
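As an aside on why onnxsim can delete every Shape, Gather, and Unsqueeze node above: once the input shape is fixed, the whole shape-computing subgraph evaluates to a constant at simplification time, so only the final Reshape target needs to survive. A numpy sketch of the fold:

```python
import numpy as np

static_shape = (1, 3, 640, 640)  # illustrative fixed input shape

shape = np.array(static_shape, dtype=np.int64)      # Shape -> constant
n = np.expand_dims(np.take(shape, np.array(0)), 0)  # Gather+Unsqueeze -> constant
target = np.concatenate([n, np.array([-1], dtype=np.int64)])  # Concat -> constant

print(target.tolist())  # [1, -1] -- a single constant replaces four ops
```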
Ha, that's what I suggested on Discord. Glad it cleared up the redundant stuff 🙂
Submitted a PR for the Slice issue: https://github.com/tracel-ai/burn/pull/1989
But I stumbled upon another bug: https://github.com/tracel-ai/burn/issues/1983, which I am working to fix.
Overall, I think I am very close to fixing it. The good news is that all ops seem to be working.
I got the face-onnx model building; however, there is a run-time error that was probably introduced recently when we were refactoring tensors. @laggui did some work on the Tensor random initialization. Maybe he knows what's going on.
I am attaching all my artifacts here in my repository, which will be hosted temporarily:
https://github.com/antimora/face-onnx
Please continue debugging. I think the bug is in Burn's core.
I have filed the model initialization bugs here: https://github.com/tracel-ai/burn/issues/2043
Hi @SimonBrandner,
We have a working ONNX! Thanks to @laggui for fixing the latest bugs.
Please check out: https://github.com/antimora/face-onnx
Your original ONNX file went through a series of cleanups using Python scripts. We will have a more permanent fix in the upcoming #2008 PR. For now, this should unblock you.
Please see the ONNX inference examples (https://github.com/tracel-ai/burn/tree/main/examples/onnx-inference and https://github.com/tracel-ai/models/tree/main/mobilenetv2-burn) for how to embed your weights or specify the location of your weights file. In my example, they are generated into the target directory.
Describe the bug
Error during importing the detector model:
To Reproduce
src/model/detector.onnx
build.rs
What I've figured out so far
Attachment: face_detector_640.onnx.zip