sonos / tract

Tiny, no-nonsense, self-contained, Tensorflow and ONNX inference
Other
2.24k stars 214 forks source link

Crash in NNEF export when using Symbol in input fact shape #710

Closed tgolsson closed 2 years ago

tgolsson commented 2 years ago

Hello!

I'm trying to convert some of our stack to use NNEF as a shipped format instead of raw ONNX (for performance reasons at load time based on #393); but I'm running into a slight issue with batching. My expectation was that if I'm running .to_typed() I have to set input facts, and if I want to use dynamic batching that is my point to set a Symbol as the dim. However; this causes the NNEF serialization to crash on an unwrap -- see callstack below.

Is the intention that I'll fix my batch size (e.g. = 1) before NNEF conversion and then update it with symbols again after? Am I doing too much work on the model before NNEF conversion? It seems to me like I have to to all the way to a TypedModel before I can do the conversion.

These are my input shapes:

input0: [Sym(Symbol('N', 0)), Val(280)]
input1: [Sym(Symbol('N', 0)), Val(30), Val(30), Val(2)]
input2: [Sym(Symbol('N', 0)), Val(24)]

Callstack:

thread 'test_to_nnef_complex' panicked at 'called `Option::unwrap()` on a `None` value', <snip>/tract-nnef-0.16.6/src/ops/nnef/ser.rs:16:60
stack backtrace:
...
   3: core::option::Option<T>::unwrap
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/option.rs:746:21
   4: tract_nnef::ops::nnef::ser::source
             at <snip>/tract-nnef-0.16.6/src/ops/nnef/ser.rs:16:32
   5: tract_nnef::ops::nnef::tract_nnef::{{closure}}
             at <snip>/tract-nnef-0.16.6/src/ops/nnef/mod.rs:12:17
   6: core::ops::function::FnOnce::call_once
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/ops/function.rs:227:5
   7: tract_nnef::registry::Registry::serialize
             at <snip>/tract-nnef-0.16.6/src/registry.rs:145:35
   8: tract_nnef::ser::IntoAst::node
             at <snip>/tract-nnef-0.16.6/src/ser.rs:216:36
   9: tract_nnef::ser::IntoAst::translate
             at <snip>/tract-nnef-0.16.6/src/ser.rs:96:13
  10: tract_nnef::ser::to_proto_model
             at <snip>/tract-nnef-0.16.6/src/ser.rs:7:5
  11: tract_nnef::framework::Nnef::write_to_tar
             at <snip>/tract-nnef-0.16.6/src/framework.rs:46:27
  12: tract_nnef::framework::Nnef::write
             at <snip>/tract-nnef-0.16.6/src/framework.rs:41:9
  13: <snip>::to_nnef
             at ./src/lib.rs:72:5
  14: to_nnef::test_to_nnef_complex
             at ./tests/to-nnef.rs:19:18
  15: to_nnef::test_to_nnef_complex::{{closure}}
             at ./tests/to-nnef.rs:13:1
  16: core::ops::function::FnOnce::call_once
kali commented 2 years ago

Hey, thanks for the report. As you may have noticed, symbols in NNEF/tract-opl are brand new. We ourselves have about half a model working with it at this point, so it's not a big surprise you run into issues.

The NNEF/tract-opl dumping is only meant to work after decluttering, so on a TypedModel, be fore calling "codegen", and independently of a TypedPlan.

I can actually reproduce your issue with:

 use tract_nnef::internal::*;

  fn main() -> TractResult<()> {
      let n = Symbol::new('n');
      let mut model = TypedModel::default();
      let source = model.add_source("foo", TypedFact::dt_shape(DatumType::F32, &[n]))?;
      model.set_output_outlets(&[source])?;
      model.declutter()?;

      tract_nnef::nnef().write_to_dir(&model, "foo")?;

      Ok(())
  }

I'll be looking into a fix.

tgolsson commented 2 years ago

Thank you @kali! Yeah, I meant TypedModel, not TypedPlan.

kali commented 2 years ago

I think I have fixed it, and the fixed is merged on main. Would be nice if you can confirm.

tgolsson commented 2 years ago

@kali It now gets past the export stage; thank you! Will continue working on loading/inferring and see if more comes up there. Thanks for the quick fix.