hotg-ai / rune

Rune provides containers to encapsulate and deploy edgeML pipelines and applications.
Apache License 2.0

Add support for inference on ONNX and TensorFlow models #370

Open · Michael-F-Bryan opened 2 years ago

Michael-F-Bryan commented 2 years ago

I think almost everyone on the HOTG team has expressed a desire to use more ML frameworks at some point, in particular ONNX and TensorFlow. However, I was reluctant to use bindings that go through the frameworks' official C++ implementations after seeing how much trouble we had integrating TensorFlow Lite.

When I was playing around with hotg-ai/wasi-nn-experiment I came across tract, a pure Rust implementation of TensorFlow and ONNX inference. It was able to cross-compile to aarch64-linux-android and wasm32-unknown-unknown without any extra work.
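For concreteness, the tract entry point for ONNX looks roughly like this (a minimal sketch based on tract's published examples; `model.onnx` and the 1x3x224x224 input shape are placeholders, and the exact API differs between tract releases):

```rust
use tract_onnx::prelude::*;

fn main() -> TractResult<()> {
    // Load the ONNX model, declare the input shape, and optimize the graph.
    let model = tract_onnx::onnx()
        .model_for_path("model.onnx")?
        .with_input_fact(
            0,
            InferenceFact::dt_shape(f32::datum_type(), tvec!(1, 3, 224, 224)),
        )?
        .into_optimized()?
        .into_runnable()?;

    // Run inference on a dummy all-zeros input tensor.
    let input: Tensor = tract_ndarray::Array4::<f32>::zeros((1, 3, 224, 224)).into();
    let outputs = model.run(tvec!(input.into()))?;

    // Each output can be viewed as an ndarray for post-processing.
    let scores = outputs[0].to_array_view::<f32>()?;
    println!("first output has {} values", scores.len());
    Ok(())
}
```

The sibling tract_tensorflow crate exposes a matching `tensorflow()` entry point for (1.x) TensorFlow graphs.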

By using tract instead of the reference implementations we'll be giving up some performance, reliability, and features (e.g. missing model ops) in exchange for long-term maintainability and reduced build complexity. @f0rodo may want to comment on this trade-off, but from an engineering perspective I think it's worth it.

The things we'll need to support new model types:

saidinesh5 commented 2 years ago

@Michael-F-Bryan long-term maintainability will be more problematic, though. tract does NOT implement all the operators that TF Lite / ONNX provide. Even ONNX support is not 100%, and it's a moving target, so whenever a user's model doesn't work, we get the bug reports (and the maintenance burden) instead of upstream TensorFlow/ONNX. tract's statement on TensorFlow 2 support is basically:

Additionally, the complexity of TensorFlow 2 makes it very unlikely that direct support will ever exist in tract. Many TensorFlow 2 nets can be converted to ONNX and loaded in tract.
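(For reference, the usual conversion path there is the tf2onnx converter; a sketch, with the paths as placeholders:)

```sh
# Convert a TF2 SavedModel to ONNX, then hand the .onnx file to tract.
python -m tf2onnx.convert --saved-model path/to/saved_model --output model.onnx
```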

So this way we'd be going backwards on the actual user-facing features we support (TF 1.x only, an incomplete ONNX feature set, etc.).

That being said, tract could make a good starting point for us to try out wasi-nn, especially if we want to target the microcontroller world (librunecoral is a no-go there). Eventually I'd like even librunecoral to support wasi-nn, but let's see how much time / resources we can allocate for that. We still have to kill the old C++-based RuneVM.
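(For context, the guest side of wasi-nn boils down to a small load / set_input / compute / get_output API. A rough sketch against the early `wasi-nn` Rust crate follows; the raw calls are unsafe, names and encodings vary between spec revisions, and the buffers, shape, and encoding below are all placeholder assumptions:)

```rust
// Rough sketch against the early `wasi-nn` crate (0.1.x).
fn infer(model_bytes: &[u8], input_bytes: &[u8], output: &mut [f32]) {
    unsafe {
        // Load the model and create an execution context on the host.
        let graph = wasi_nn::load(
            &[model_bytes],
            wasi_nn::GRAPH_ENCODING_OPENVINO, // more encodings were added to the spec later
            wasi_nn::EXECUTION_TARGET_CPU,
        )
        .unwrap();
        let ctx = wasi_nn::init_execution_context(graph).unwrap();

        // Bind the input tensor, run inference, and copy the result back out.
        wasi_nn::set_input(
            ctx,
            0,
            wasi_nn::Tensor {
                dimensions: &[1, 3, 224, 224], // placeholder shape
                r#type: wasi_nn::TENSOR_TYPE_F32,
                data: input_bytes,
            },
        )
        .unwrap();
        wasi_nn::compute(ctx).unwrap();
        wasi_nn::get_output(
            ctx,
            0,
            output.as_mut_ptr() as *mut u8,
            (output.len() * std::mem::size_of::<f32>()) as u32,
        )
        .unwrap();
    }
}
```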

Personally, as long as we get zero-copy pipelines and can use the appropriate hardware acceleration for the use cases we want to support (e.g. if we want to use Rune for some kind of video processing we need TPU/GPU acceleration there, but for purely text/audio-based models we can get away without it), any framework works for me.