Support YOLOS (tiny)

Model source: https://huggingface.co/hustvl/yolos-tiny

Export to ONNX with:

python -m transformers.onnx --model=hustvl/yolos-tiny models/yolos-tiny --feature object-detection

Model inputs are 640x480 RGB images (standard COCO). Model outputs are:

logits of shape [batch, box, class] (where class is one of the 92 COCO classes)
pred_boxes of shape [batch, box, coords] where coords appears to be [center_x, center_y, width, height]
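
To make the output layout concrete, here is a minimal sketch of decoding a single detection in plain Python. It assumes the coords are normalized to [0, 1] relative to the image size, as in DETR; I have not confirmed that for this export. `decode_detection` is a hypothetical helper, not part of the library, and it leaves out score thresholding and filtering.

```python
def decode_detection(logits_row, box, image_width, image_height):
    """Return (class_index, [x_min, y_min, x_max, y_max]) for one predicted box.

    logits_row: per-class scores for one box (one row of `logits`).
    box: [center_x, center_y, width, height] (one row of `pred_boxes`),
         ASSUMED to be normalized to the [0, 1] range.
    """
    # The predicted class is the index of the largest logit.
    class_index = max(range(len(logits_row)), key=lambda i: logits_row[i])

    # Scale the normalized box to pixels.
    center_x = box[0] * image_width
    center_y = box[1] * image_height
    width = box[2] * image_width
    height = box[3] * image_height

    # Convert from center/size form to corner form.
    corners = [
        center_x - width / 2,
        center_y - height / 2,
        center_x + width / 2,
        center_y + height / 2,
    ]
    return class_index, corners
```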

Issues that need to be resolved to run this model:

[x] Expand op: Allow target shape to be smaller than input. The model broadcasts a constant of shape [1, 1, 192] to a target shape of [batch, 1, 1]. The output shape should be [batch, 1, 192] (https://github.com/robertknight/wasnn/pull/8)
[x] All binary ops: Binary ops currently pick the shape of one of the two inputs as the broadcast target shape. However, when broadcasting, NumPy and PyTorch pick, for each dimension, the size that is not 1, e.g. [10, 1, 1] x [1, 1, 10] => [10, 1, 10] (https://github.com/robertknight/wasnn/pull/8)
[x] Div op: Support int tensors (4feafca1f31f613cf8697df56fb3245032132b08)
[x] Resize op: The exported model uses empty tensors to represent missing optional inputs for scales and sizes instead of setting the input name to an empty value as per the ONNX spec.
[x] Resize op: Support cubic value for mode attr (Worked around for now by falling back to bilinear resizing, with a warning, in the model conversion script.)
[x] Resize op: Support floor value for nearest_mode attr, or ignore this attr if mode is not nearest
[x] Slice op: Support negative starts/ends for in-place slicing, or fall back to copying
[x] MatMul op: With a quickly hacked-together version of negative starts/ends support for in-place slicing, I ran into an error when MatMul handled the non-contiguous tensor. This could have been an issue with my Slice changes, or a problem with MatMul and non-contiguous inputs (see https://github.com/robertknight/wasnn/commit/701e8334a76b6291e702ee1696ad4dd2c5453ea1)
[x] Build a test app and verify that it produces sensible results with this model (The existing DETR example works with this model out of the box, although I did add the ability to shrink the inputs to get faster inference)
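
For reference, the broadcasting rule behind the Expand and binary op items can be sketched in a few lines of plain Python. This is the NumPy-style rule as I understand it, not the code from the linked PR, and `broadcast_shape` is a hypothetical name.

```python
def broadcast_shape(a, b):
    """Compute the NumPy-style broadcast of two shapes.

    Shapes are aligned from the trailing dimension. For each dimension,
    the sizes must match or one of them must be 1, and the output takes
    the size that is not 1.
    """
    ndim = max(len(a), len(b))
    # Left-pad the shorter shape with 1s so both have the same rank.
    a = [1] * (ndim - len(a)) + list(a)
    b = [1] * (ndim - len(b)) + list(b)

    out = []
    for size_a, size_b in zip(a, b):
        if size_a == size_b or size_b == 1:
            out.append(size_a)
        elif size_a == 1:
            out.append(size_b)
        else:
            raise ValueError(f"cannot broadcast {a} with {b}")
    return out
```

With this rule, `broadcast_shape([10, 1, 1], [1, 1, 10])` gives `[10, 1, 10]`, and expanding a `[1, 1, 192]` constant to a target of `[batch, 1, 1]` gives `[batch, 1, 192]`, which matches the two cases described in the list.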

With a small tweak to the model conversion script to force the use of bilinear resizing instead of the unsupported cubic mode, this model actually works with examples/detr.rs without any changes, i.e. just specify the converted YOLOS model where the DETR model path would normally be specified:

diff --git a/tools/convert-onnx.py b/tools/convert-onnx.py
index ba64c40..7500b5f 100755
--- a/tools/convert-onnx.py
+++ b/tools/convert-onnx.py
@@ -719,7 +719,8 @@ def op_node_from_onnx_operator(

         case "Resize":
             attrs = sg.ResizeAttrsT()
-            attrs.mode = op_reader.get_enum_attr("mode", sg.ResizeMode, "nearest")
+            attrs.mode = sg.ResizeMode.Linear
+            # attrs.mode = op_reader.get_enum_attr("mode", sg.ResizeMode, "nearest")

             op_reader.check_attr("antialias", "int", 0)
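
The diff above hard-codes linear mode for every Resize node. A gentler version of the same workaround, falling back only when the mode is cubic and warning about it, could look roughly like the sketch below. `resolve_resize_mode` and `SUPPORTED_RESIZE_MODES` are hypothetical names for illustration and do not exist in tools/convert-onnx.py.

```python
import warnings

# Hypothetical set of Resize modes the runtime can execute.
# "cubic" is deliberately absent, since it is the unsupported one.
SUPPORTED_RESIZE_MODES = {"nearest", "linear"}


def resolve_resize_mode(onnx_mode):
    """Map an ONNX Resize `mode` attribute to a mode the runtime supports."""
    if onnx_mode in SUPPORTED_RESIZE_MODES:
        return onnx_mode
    if onnx_mode == "cubic":
        # Fall back to (bi)linear resizing, but tell the user about it,
        # because the outputs will differ slightly from the original model.
        warnings.warn(
            "Resize mode 'cubic' is not supported; falling back to 'linear'",
            stacklevel=2,
        )
        return "linear"
    raise ValueError(f"unknown Resize mode: {onnx_mode!r}")
```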

The YOLOS model runs much more slowly than DETR with the same image sizing settings, though (~6-7s after a change to parallelize softmax). The model still works, and runs much faster, if the input image size is reduced, e.g. to min/max sizes of (480, 800), though I haven't tested how that affects accuracy.

diff --git a/examples/detr.rs b/examples/detr.rs
index b312871..2386f55 100644
--- a/examples/detr.rs
+++ b/examples/detr.rs
@@ -251,8 +251,8 @@ fn main() -> Result<(), Box<dyn Error>> {
     image.insert_dim(0); // Add batch dim

     // Resize image if it is not in the range of supported sizes.
-    let min_size = 800;
-    let max_size = 1333;
+    let min_size = 480;
+    let max_size = 800;
     let (rescaled_width, rescaled_height) =
         rescaled_size((image_width, image_height), min_size, max_size);
     if rescaled_width != image_width || rescaled_height != image_height {
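
To give a rough idea of why the smaller limits help, here is a sketch of a DETR-style resizing rule in Python: scale the shorter side to min_size, unless that would push the longer side past max_size. This is an assumption about the general approach, not a port of `rescaled_size` from examples/detr.rs, which may behave differently; `detr_style_rescaled_size` is a hypothetical name.

```python
def detr_style_rescaled_size(size, min_size, max_size):
    """Return (width, height) after a DETR-style resize.

    The shorter side is scaled to `min_size`, unless that would make the
    longer side exceed `max_size`, in which case the longer side is scaled
    to `max_size` instead. Aspect ratio is preserved up to rounding.
    """
    width, height = size
    scale = min_size / min(width, height)
    if max(width, height) * scale > max_size:
        scale = max_size / max(width, height)
    return int(round(width * scale)), int(round(height * scale))


def pixel_count(size):
    """Number of pixels in a (width, height) size."""
    return size[0] * size[1]
```

Under this rule, a 640x480 image is upscaled to 1067x800 with limits of (800, 1333), but stays at 640x480 with limits of (480, 800), so the model sees roughly 2.8x fewer pixels in the second case.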

robertknight/rten, issue #7: Support YOLOS (tiny)