robertknight closed this 1 year ago
With a small tweak to the model conversion script that forces bilinear resizing instead of the unsupported cubic mode, this model actually works with examples/detr.rs without any changes, i.e. just specify the converted YOLOS model where the DETR model path would normally be specified:
diff --git a/tools/convert-onnx.py b/tools/convert-onnx.py
index ba64c40..7500b5f 100755
--- a/tools/convert-onnx.py
+++ b/tools/convert-onnx.py
@@ -719,7 +719,8 @@ def op_node_from_onnx_operator(
case "Resize":
attrs = sg.ResizeAttrsT()
- attrs.mode = op_reader.get_enum_attr("mode", sg.ResizeMode, "nearest")
+ attrs.mode = sg.ResizeMode.Linear
+ # attrs.mode = op_reader.get_enum_attr("mode", sg.ResizeMode, "nearest")
op_reader.check_attr("antialias", "int", 0)
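The diff above hard-codes linear resizing for every Resize node. A gentler variant, closer to the "fall back with a warning" approach mentioned in the issue description, might look like the sketch below. The function name `resolve_resize_mode` and the `SUPPORTED_RESIZE_MODES` set are hypothetical and not part of convert-onnx.py; this only illustrates the fallback logic.

```python
import warnings

# Hypothetical set of modes the runtime can execute. "cubic" is the mode
# reported as unsupported in this thread.
SUPPORTED_RESIZE_MODES = {"nearest", "linear"}


def resolve_resize_mode(requested: str = "nearest") -> str:
    """Return a supported resize mode, falling back to "linear" for "cubic"."""
    if requested in SUPPORTED_RESIZE_MODES:
        return requested
    if requested == "cubic":
        # Bilinear output differs slightly from cubic, so make the
        # substitution visible instead of applying it silently.
        warnings.warn("Resize mode 'cubic' is unsupported; falling back to 'linear'")
        return "linear"
    raise ValueError(f"Unknown resize mode: {requested!r}")
```

Unlike the hard-coded override, this keeps "nearest" intact for models that ask for it.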
With the same image sizing settings, though, the YOLOS model runs much more slowly than DETR (~6-7s, after a change to parallelize softmax). The model does work, and runs much faster, if the input image size is reduced to e.g. min/max sizes of (480, 800), though I haven't tested how that affects accuracy.
diff --git a/examples/detr.rs b/examples/detr.rs
index b312871..2386f55 100644
--- a/examples/detr.rs
+++ b/examples/detr.rs
@@ -251,8 +251,8 @@ fn main() -> Result<(), Box<dyn Error>> {
image.insert_dim(0); // Add batch dim
// Resize image if it is not in the range of supported sizes.
- let min_size = 800;
- let max_size = 1333;
+ let min_size = 480;
+ let max_size = 800;
let (rescaled_width, rescaled_height) =
rescaled_size((image_width, image_height), min_size, max_size);
if rescaled_width != image_width || rescaled_height != image_height {
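To see what the two settings do to a standard 640x480 input, here is a rough Python sketch of a DETR-style sizing rule: scale the short side to `min_size`, then cap the long side at `max_size`. This is an assumption about how `rescaled_size` behaves, not a port of the Rust code in examples/detr.rs.

```python
def rescaled_size(size, min_size, max_size):
    """Scale (width, height) so the short side equals min_size,
    unless that would push the long side past max_size."""
    width, height = size
    short_side = min(width, height)
    long_side = max(width, height)

    scale = min_size / short_side
    if long_side * scale > max_size:
        # Capping the long side takes priority over reaching min_size.
        scale = max_size / long_side

    return round(width * scale), round(height * scale)


# Sizes for a 640x480 image under the old and new settings.
old = rescaled_size((640, 480), 800, 1333)
new = rescaled_size((640, 480), 480, 800)
```

Under this rule the old settings upscale a 640x480 image to 1067x800, while the new settings leave it at 640x480, so the model processes roughly 2.8 times fewer pixels.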
Model source: https://huggingface.co/hustvl/yolos-tiny
Export to ONNX with:
Model inputs are 640x480 RGB images (standard COCO). Model outputs are:
- logits of shape [batch, box, class] (where class is one of the 92 COCO classes)
- pred_boxes of shape [batch, box, coords] where coords appears to be [center_x, center_y, width, height]
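A minimal sketch of how one row of each output could be decoded, in plain Python. It assumes the box coordinates are normalized to the [0, 1] range relative to the image size, which the description above does not confirm, and the helper names are made up for illustration.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def best_class(logits):
    """Return (class_index, probability) for one row of the logits output."""
    probs = softmax(logits)
    index = max(range(len(probs)), key=probs.__getitem__)
    return index, probs[index]


def decode_box(coords, image_width, image_height):
    """Convert [center_x, center_y, width, height] to pixel corner coordinates.

    Assumes coords are normalized to [0, 1]; returns (left, top, right, bottom).
    """
    cx, cy, w, h = coords
    return (
        (cx - w / 2) * image_width,
        (cy - h / 2) * image_height,
        (cx + w / 2) * image_width,
        (cy + h / 2) * image_height,
    )
```

For example, `decode_box((0.5, 0.5, 0.5, 0.5), 640, 480)` gives a box centered in the image that covers half of each dimension.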
Issues that need to be resolved to run this model:
- [1, 1, 192] to a target shape of [batch, 1, 1]. The output shape should be [batch, 1, 192] (https://github.com/robertknight/wasnn/pull/8)
- [10, 1, 1] x [1, 1, 10] => [10, 1, 10] (https://github.com/robertknight/wasnn/pull/8)
- scales and sizes instead of setting the input name to an empty value as per the ONNX spec.
- cubic value for mode attr (Worked around for now by falling back to bilinear resizing with a warning in the model conversion script.)
- floor value for nearest_mode attr, or ignore this attr if mode is not nearest
- MatMul handled the non-contiguous tensor. This could have been an issue with my Slice changes, or a problem with MatMul and non-contiguous inputs (see https://github.com/robertknight/wasnn/commit/701e8334a76b6291e702ee1696ad4dd2c5453ea1)
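The two shape examples in the list above are consistent with NumPy-style bidirectional broadcasting, where a dimension of size 1 on either side stretches to match the other. A small Python sketch of that rule follows; the function name is illustrative and this is not the wasnn implementation.

```python
def broadcast_shapes(a, b):
    """Compute the NumPy-style broadcast of two shapes.

    Shapes are right-aligned; in each position the sizes must match
    or one of them must be 1.
    """
    ndim = max(len(a), len(b))
    a = [1] * (ndim - len(a)) + list(a)
    b = [1] * (ndim - len(b)) + list(b)

    out = []
    for x, y in zip(a, b):
        if x == y or y == 1:
            out.append(x)
        elif x == 1:
            out.append(y)
        else:
            raise ValueError(f"Cannot broadcast {a} with {b}")
    return out


batch = 4  # arbitrary batch size for illustration
expanded = broadcast_shapes([1, 1, 192], [batch, 1, 1])
product = broadcast_shapes([10, 1, 1], [1, 1, 10])
```

Here `expanded` is `[4, 1, 192]` and `product` is `[10, 1, 10]`, matching the expected shapes in the list.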