robertknight / rten

ONNX neural network inference engine

Support YOLOS (tiny) #7

Closed: robertknight closed this issue 1 year ago

robertknight commented 1 year ago

Model source: https://huggingface.co/hustvl/yolos-tiny

Export to ONNX with:

python -m transformers.onnx --model=hustvl/yolos-tiny models/yolos-tiny --feature object-detection

Model inputs are 640x480 RGB images (standard COCO). Model outputs are:

Issues that need to be resolved to run this model:

robertknight commented 1 year ago

With a small tweak to the model conversion script that forces bilinear resizing instead of the unsupported cubic mode, this model works with examples/detr.rs without any changes. That is, just specify the converted YOLOS model where the DETR model path would normally be specified:

diff --git a/tools/convert-onnx.py b/tools/convert-onnx.py
index ba64c40..7500b5f 100755
--- a/tools/convert-onnx.py
+++ b/tools/convert-onnx.py
@@ -719,7 +719,8 @@ def op_node_from_onnx_operator(

         case "Resize":
             attrs = sg.ResizeAttrsT()
-            attrs.mode = op_reader.get_enum_attr("mode", sg.ResizeMode, "nearest")
+            attrs.mode = sg.ResizeMode.Linear
+            # attrs.mode = op_reader.get_enum_attr("mode", sg.ResizeMode, "nearest")

             op_reader.check_attr("antialias", "int", 0)
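Note that the patch sets `sg.ResizeMode.Linear` on every `Resize` node, not only the ones that originally asked for cubic mode. For reference, here is a minimal, hypothetical sketch of what linear (bilinear) resizing computes for a single 2D channel. It assumes ONNX's default `half_pixel` coordinate transformation and clamps sample coordinates to the image edges; it is not rten's implementation, and `resize_bilinear` is an illustrative name.

```rust
/// Hypothetical sketch: bilinear resize of one row-major `in_h x in_w` channel
/// to `out_h x out_w`, using half_pixel coordinate mapping and edge clamping.
fn resize_bilinear(src: &[f32], in_h: usize, in_w: usize, out_h: usize, out_w: usize) -> Vec<f32> {
    assert!(in_h > 0 && in_w > 0, "input must be non-empty");
    assert_eq!(src.len(), in_h * in_w, "src must hold in_h * in_w values");
    let scale_y = out_h as f32 / in_h as f32;
    let scale_x = out_w as f32 / in_w as f32;
    let mut out = Vec::with_capacity(out_h * out_w);
    for oy in 0..out_h {
        // half_pixel: map the centre of output row `oy` back into input coordinates.
        let sy = ((oy as f32 + 0.5) / scale_y - 0.5).clamp(0.0, (in_h - 1) as f32);
        let y0 = sy.floor() as usize;
        let y1 = (y0 + 1).min(in_h - 1);
        let wy = sy - y0 as f32; // vertical interpolation weight
        for ox in 0..out_w {
            let sx = ((ox as f32 + 0.5) / scale_x - 0.5).clamp(0.0, (in_w - 1) as f32);
            let x0 = sx.floor() as usize;
            let x1 = (x0 + 1).min(in_w - 1);
            let wx = sx - x0 as f32; // horizontal interpolation weight
            // Interpolate along x on the two neighbouring rows, then along y.
            let top = src[y0 * in_w + x0] * (1.0 - wx) + src[y0 * in_w + x1] * wx;
            let bottom = src[y1 * in_w + x0] * (1.0 - wx) + src[y1 * in_w + x1] * wx;
            out.push(top * (1.0 - wy) + bottom * wy);
        }
    }
    out
}

fn main() {
    // Upscale a 1x2 "image" to 1x4.
    let out = resize_bilinear(&[0.0, 10.0], 1, 2, 1, 4);
    println!("{:?}", out); // prints [0.0, 2.5, 7.5, 10.0]
}
```

Cubic mode would instead weight a 4x4 neighbourhood, so outputs of the patched model can differ slightly from the original.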

However, the YOLOS model runs much more slowly than DETR with the same image sizing settings (~6-7s, measured after a change to parallelize softmax). It still works, and runs much faster, if the input image size is reduced, e.g. to min/max sizes of (480, 800), though I haven't tested how that affects accuracy.
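To see what the min/max settings do, here is a hypothetical sketch of DETR-style resizing, modeled on the rule in DETR's reference preprocessing (`get_size_with_aspect_ratio`): scale the shorter side to `min_size` unless that would push the longer side past `max_size`. It is not the `rescaled_size` function from examples/detr.rs, whose exact behaviour may differ. It also counts image patches, assuming YOLOS-tiny uses 16x16 patches; `rescaled_size_sketch` and `patch_count` are illustrative names.

```rust
/// Hypothetical DETR-style resize rule: returns (width, height).
fn rescaled_size_sketch(size: (usize, usize), min_size: usize, max_size: usize) -> (usize, usize) {
    let (w, h) = size;
    assert!(w > 0 && h > 0, "image dimensions must be non-zero");
    let short = w.min(h) as f64;
    let long = w.max(h) as f64;
    // Start by scaling the shorter side to `min_size`...
    let mut target = min_size as f64;
    // ...but shrink that target if the longer side would exceed `max_size`.
    if long / short * target > max_size as f64 {
        target = (max_size as f64 * short / long).round();
    }
    // Scale the longer side by the same factor, truncating to an integer.
    if w < h {
        (target as usize, (target * h as f64 / w as f64) as usize)
    } else {
        ((target * w as f64 / h as f64) as usize, target as usize)
    }
}

/// Number of non-overlapping `patch x patch` tiles (patch size 16 is an assumption).
fn patch_count((w, h): (usize, usize), patch: usize) -> usize {
    (w / patch) * (h / patch)
}

fn main() {
    for (min_size, max_size) in [(800, 1333), (480, 800)] {
        let (w, h) = rescaled_size_sketch((640, 480), min_size, max_size);
        println!("min={min_size} max={max_size}: {w}x{h}, {} patches", patch_count((w, h), 16));
    }
    // prints:
    // min=800 max=1333: 1066x800, 3300 patches
    // min=480 max=800: 640x480, 1200 patches
}
```

Under these assumptions, a 640x480 image is upscaled to 1066x800 with the original settings but left at its native size with (480, 800). Since self-attention cost grows with the square of the token count, going from ~3300 to ~1200 patch tokens is a plausible explanation for the speedup.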

diff --git a/examples/detr.rs b/examples/detr.rs
index b312871..2386f55 100644
--- a/examples/detr.rs
+++ b/examples/detr.rs
@@ -251,8 +251,8 @@ fn main() -> Result<(), Box<dyn Error>> {
     image.insert_dim(0); // Add batch dim

     // Resize image if it is not in the range of supported sizes.
-    let min_size = 800;
-    let max_size = 1333;
+    let min_size = 480;
+    let max_size = 800;
     let (rescaled_width, rescaled_height) =
         rescaled_size((image_width, image_height), min_size, max_size);
     if rescaled_width != image_width || rescaled_height != image_height {