hotg-ai / rune

Rune provides containers to encapsulate and deploy edgeML pipelines and applications.
Apache License 2.0

Add support for inference on ONNX and TensorFlow models #370

Open · Michael-F-Bryan opened 2 years ago

Michael-F-Bryan commented 2 years ago

I think almost everyone on the HOTG team has expressed a desire to use more ML frameworks at some point, in particular ONNX and TensorFlow. However, I was reluctant to use bindings that go through the frameworks' official C++ implementations after seeing how much trouble we had integrating TensorFlow Lite.

When I was playing around with hotg-ai/wasi-nn-experiment I came across tract, a pure Rust implementation of TensorFlow and ONNX inference. It was able to cross-compile to aarch64-linux-android and wasm32-unknown-unknown without any extra work.
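For concreteness, the tract entry point for ONNX looks roughly like this (a minimal sketch based on tract's published examples; `model.onnx` and the 1x3x224x224 input shape are placeholders, and the exact API differs between tract releases):

```rust
use tract_onnx::prelude::*;

fn main() -> TractResult<()> {
    // Load the ONNX model, declare the input shape, and optimize the graph.
    let model = tract_onnx::onnx()
        .model_for_path("model.onnx")?
        .with_input_fact(
            0,
            InferenceFact::dt_shape(f32::datum_type(), tvec!(1, 3, 224, 224)),
        )?
        .into_optimized()?
        .into_runnable()?;

    // Run inference on a dummy all-zeros input tensor.
    let input: Tensor = tract_ndarray::Array4::<f32>::zeros((1, 3, 224, 224)).into();
    let outputs = model.run(tvec!(input.into()))?;

    // Each output can be viewed as an ndarray for post-processing.
    let scores = outputs[0].to_array_view::<f32>()?;
    println!("first output has {} values", scores.len());
    Ok(())
}
```

The sibling tract_tensorflow crate exposes a matching `tensorflow()` entry point for (1.x) TensorFlow graphs.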

By using tract instead of the reference implementations we'll be giving up some performance, reliability, and features (e.g. missing model ops) in exchange for long-term maintainability and reduced build complexity. @f0rodo may want to comment on this trade-off, but from an engineering perspective I think it's worth it.

The things we'll need to support new model types:

saidinesh5 commented 2 years ago

@Michael-F-Bryan long-term maintainability will be more problematic, though. tract does NOT implement all the operators that TF Lite / ONNX provide. Even ONNX support is not 100%, and it's a moving target, so whenever a user's model doesn't work, we get the bug reports (and the maintenance burden) instead of upstream TensorFlow/ONNX. tract's statement on TensorFlow 2 support is basically:

Additionally, the complexity of TensorFlow 2 makes it very unlikely that direct support will ever exist in tract. Many TensorFlow 2 nets can be converted to ONNX and loaded in tract.
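(For reference, the usual conversion path there is the tf2onnx converter; a sketch, with the paths as placeholders:)

```sh
# Convert a TF2 SavedModel to ONNX, then hand the .onnx file to tract.
python -m tf2onnx.convert --saved-model path/to/saved_model --output model.onnx
```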

So this way we'd be going backwards on the actual user-facing features we support (TF 1.x only, an incomplete ONNX feature set, etc.).

That being said, tract could make a good starting point for us to try out wasi-nn, especially if we want to target the microcontroller world (librunecoral is a no-go there). Eventually I'd like even librunecoral to support wasi-nn, but let's see how much time / resources we can allocate for that. We still have to kill the old C++-based RuneVM.
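(For context, the guest side of wasi-nn boils down to a small load / set_input / compute / get_output API. A rough sketch against the early `wasi-nn` Rust crate follows; the raw calls are unsafe, names and encodings vary between spec revisions, and the buffers, shape, and encoding below are all placeholder assumptions:)

```rust
// Rough sketch against the early `wasi-nn` crate (0.1.x).
fn infer(model_bytes: &[u8], input_bytes: &[u8], output: &mut [f32]) {
    unsafe {
        // Load the model and create an execution context on the host.
        let graph = wasi_nn::load(
            &[model_bytes],
            wasi_nn::GRAPH_ENCODING_OPENVINO, // more encodings were added to the spec later
            wasi_nn::EXECUTION_TARGET_CPU,
        )
        .unwrap();
        let ctx = wasi_nn::init_execution_context(graph).unwrap();

        // Bind the input tensor, run inference, and copy the result back out.
        wasi_nn::set_input(
            ctx,
            0,
            wasi_nn::Tensor {
                dimensions: &[1, 3, 224, 224], // placeholder shape
                r#type: wasi_nn::TENSOR_TYPE_F32,
                data: input_bytes,
            },
        )
        .unwrap();
        wasi_nn::compute(ctx).unwrap();
        wasi_nn::get_output(
            ctx,
            0,
            output.as_mut_ptr() as *mut u8,
            (output.len() * std::mem::size_of::<f32>()) as u32,
        )
        .unwrap();
    }
}
```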

Personally, as long as we get zero-copy pipelines and can use the appropriate hardware acceleration for the use cases we want to support (e.g. if we want to use Rune for some kind of video processing we need TPU/GPU acceleration there, but for purely text/audio-based models we can get away without it), any framework works for me.