
Why a Rust wake word engine? #5

Open secretsauceai opened 2 years ago

secretsauceai commented 2 years ago

You have mentioned interest in creating a Rust wake word engine that can run Precise models. I am curious about the advantages of this.

Blockers

@sheosi is working on this.

sheosi commented 2 years ago

Let's answer some questions 😄️.

Now, all of the answers depend on which "inference engine" (library for running the models) we use. There are two available: TFLite and Tract (by Sonos; their own benchmarks say that it is faster than TFLite on older/less powerful devices like the Raspi 0). Running the model with one or the other is just a matter of whether we can get Tract to load the model.
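To make "load the model" concrete, here's a minimal sketch of what that step looks like with Tract, assuming an ONNX export of the model and the tract-onnx crate; the file name and input shape are placeholders, and the exact API varies a bit between Tract versions:

```rust
use tract_onnx::prelude::*;

fn main() -> TractResult<()> {
    // Loading and analysing the graph is the step that fails
    // if the model uses an operation Tract does not implement.
    let model = tract_onnx::onnx()
        .model_for_path("model.onnx")?
        // Placeholder input shape: 1 window of 29 MFCC frames x 13 coefficients.
        .with_input_fact(0, InferenceFact::dt_shape(f32::datum_type(), tvec![1, 29, 13]))?
        .into_optimized()?
        .into_runnable()?;

    // Run it once on a dummy all-zero input.
    let input: Tensor = tract_ndarray::Array3::<f32>::zeros((1, 29, 13)).into();
    let result = model.run(tvec![input.into()])?;
    println!("{:?}", result[0]);
    Ok(())
}
```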

Why can Tract be faster on old hardware? Optimization: TFLite has only committed to optimizing for newer processors (ARMv7), while Tract says it is optimized for ARM VFP (which means ARMv6).

Will this be easier to deploy onto Android devices?

AFAIK, yes. Rust compiles to Android quite nicely and needs almost no runtime dependencies, which means not much bloat in the resulting APK.

Would this have any advantage for other devices (i.e. Raspi)?

It should. With Tract, older and less powerful devices (like the Raspi 0) will be faster. And regardless of which library we end up using, a pure Rust wake-word Hermes component would mean smaller RAM usage (one less Python runtime to keep in memory).
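For context on what such a component does on the wire, here's a rough sketch (not project code) of publishing a Hermes-style hotword-detected message with the rumqttc crate; the broker address, topic layout, and payload fields follow the Snips/Rhasspy Hermes convention and should be treated as assumptions:

```rust
use rumqttc::{Client, MqttOptions, QoS};

fn main() {
    // Connect to the local MQTT broker (host/port are assumptions).
    let options = MqttOptions::new("wakeword-rs", "localhost", 1883);
    let (mut client, mut connection) = Client::new(options, 10);

    // Hermes convention: detections go to hermes/hotword/<wakewordId>/detected.
    let payload = r#"{"siteId": "default", "modelId": "hey-mycroft"}"#;
    client
        .publish("hermes/hotword/default/detected", QoS::AtLeastOnce, false, payload)
        .unwrap();

    // Poll the event loop briefly so the publish is actually flushed out.
    for event in connection.iter().take(3) {
        if event.is_err() {
            break;
        }
    }
}
```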

Do you really think this will increase the speed of the engine noticeably (as the biggest bottleneck is running the model itself, and that is in TF, which is coded in C++)?

If we get to use Tract we might see pretty good improvements on low-end hardware; there might also be some really small improvements from having practically no runtime.

What are the steps involved to get an engine such as Precise working on an Android device?

Using Rust, you would need to set up cross-compilation and have an application that makes use of the library. Fortunately this has been done already. An example of Flutter + Rust on Android and iOS: https://github.com/shekohex/flutterust. The template itself is pretty complicated, mixing Flutter (for cross-platform GUI), Rust, Kotlin (for Android-specific code) and Swift (for iOS-specific code). Running it, however, is just a matter of `cargo make` and `flutter run`. I have a private Lily app with this exact same setup.
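For the cross-compilation part itself, the setup is roughly the following (a sketch; the NDK path and API level are placeholders that depend on your install):

```sh
# One-time setup: add the Android targets to the Rust toolchain.
rustup target add aarch64-linux-android armv7-linux-androideabi

# Tell cargo which linker to use, e.g. in .cargo/config.toml
# (path and API level are placeholders):
#   [target.aarch64-linux-android]
#   linker = ".../ndk/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android30-clang"

# Build the library for the phone.
cargo build --target aarch64-linux-android --release
```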

What are the greatest difficulties in implementing an engine such as Precise in Rust and deploying it on an Android phone?

If we consider the app part "solved", the worst part is making sure it compiles properly on Android, as there's no proper way of verifying that ahead of time. Pure Rust code should be no biggie; code interfacing with C++ libraries, on the other hand, should work, but some problems might arise (a *.so not being included in the APK, linking problems, ...).

One extra problem I'm seeing is that, on the TFLite path, the conversion from TFLite tensors to Rust's ndarray (Rust's "numpy") has not been done yet.
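For the record, that missing glue would look roughly like this, assuming the tflite crate (tflite-rs) and ndarray; the file name, tensor index, and output shape are placeholders:

```rust
use ndarray::ArrayView2;
use tflite::ops::builtin::BuiltinOpResolver;
use tflite::{FlatBufferModel, InterpreterBuilder};

fn main() {
    // Build an interpreter from a .tflite flatbuffer (file name is a placeholder).
    let model = FlatBufferModel::build_from_file("hey-mycroft.tflite").expect("load model");
    let builder = InterpreterBuilder::new(model, BuiltinOpResolver::default()).expect("builder");
    let mut interpreter = builder.build().expect("interpreter");
    interpreter.allocate_tensors().expect("allocate tensors");

    // Borrow the output tensor's raw buffer...
    let output_index = interpreter.outputs()[0];
    let data: &[f32] = interpreter.tensor_data(output_index).expect("output tensor");

    // ...and view it as an ndarray without copying (placeholder 1x1 shape).
    let view = ArrayView2::from_shape((1, 1), data).expect("shape mismatch");
    println!("{:?}", view);
}
```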

secretsauceai commented 2 years ago

A very well-written breakdown, thanks @sheosi!

To follow up, you wrote that there were some issues with Tract and ONNX in this repo; specifically, you said that during conversion of the TF2 model to ONNX there is a Loop node which is not supported.

Could you expand on that with ONNX?

Also, what is the state of Tract's TF and TFLite support?

sheosi commented 2 years ago

Hey thanks, @secretsauceai 😄️

Is this a specific issue you found via GitHub issues for ONNX, or Stack Overflow?

Not really. This is from my own experience. I got the ONNX model by transforming the TF model with `python3 -m tf2onnx.convert --graphdef hey-mycroft.pb --output model.onnx --inputs 'net_input:0' --outputs 'net_output:0'`, using Python's tf2onnx package. When loading that into the tract version branch of precise-rs, what you get is:

thread 'tests::test_positive' panicked at 'called `Result::unwrap()` on an `Err` value: ModelError(Failed analyse for node #37 "generic_loop_Loop__27" Unimplemented(Loop)

This means there's an operation that is not implemented: Loop. You can go to Tract's webpage, where they list all the supported operations, and see that indeed that one isn't implemented.

Is this a total 'dead end' as you said before?

For me it is, but someone might be able to make it work easily. It's only a matter of implementing all the missing nodes (if there are any more than Loop), which they say is an easy task, but since I have no idea about ML I can't do that.

Is there any blocker in using a TF2 or TFLite model directly with Tract?

It can read TF1 models and supports a subset of the operations they allow. However, loading Precise's TF1 model is a no-go either, since Tract doesn't support TensorArrayV3; here's the issue about that.
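(For the record, the failing attempt looks roughly like this, assuming the tract-tensorflow crate; it's the graph analysis step that trips over TensorArrayV3:)

```rust
use tract_tensorflow::prelude::*;

fn main() -> TractResult<()> {
    // Analysis of the TF1 graph fails here, because Tract
    // has no implementation of the TensorArrayV3 operation.
    let _model = tract_tensorflow::tensorflow()
        .model_for_path("hey-mycroft.pb")?
        .into_optimized()?;
    Ok(())
}
```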

There's no support for reading TF2 models (though I guess you could implement that yourself, but I see no major advantage to it).

When it comes to TFLite, Tract supports the whole set of operations 🎉️, but it doesn't support reading the file itself. We could write that part ourselves, and it's a route that could be taken in the future to profit from Tract's good points (faster on low-end hardware, independence from TF, pure Rust, ...).

secretsauceai commented 2 years ago

Great follow-up. Maybe someone will be able to further the Tract implementation in the future, but until then we will need to find another way. With that in mind, regarding using TF2 or TFLite:

skewballfox commented 2 years ago

> It's only a matter of implementing all the missing nodes (if there are any more than Loop), which they say is an easy task, but since I have no idea about ML I can't do that.

Apparently someone has attempted implementing Loop before, but it was complicated enough that they put the attempt on ice.

Looking at the available options, if tflite-rs turns out to be currently unusable, I think the easiest is to create something to read the TFLite models in Rust.

Apparently they are implemented as FlatBuffers. A couple of links I found which may be helpful to anyone who tries to tackle this:

Honestly, the documentation for the whole thing seems rather opaque.
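Still, as a rough starting point, the route would be: compile TFLite's schema.fbs with `flatc --rust` and walk the generated structs. A sketch, where the module and its contents are hypothetical output of that generation step (the field names subgraphs, operators, and opcode_index mirror TFLite's schema.fbs), not an existing crate:

```rust
// Hypothetical module produced by `flatc --rust -o src/ schema.fbs`.
mod schema_generated;
use schema_generated::tflite::root_as_model;

fn main() {
    let buf = std::fs::read("hey-mycroft.tflite").expect("read model file");
    // FlatBuffers are parsed in place: no copy of the tensor data is made.
    let model = root_as_model(&buf).expect("not a valid TFLite flatbuffer");

    // Walk the graph: subgraphs hold the tensors and operators.
    if let Some(subgraphs) = model.subgraphs() {
        for subgraph in subgraphs {
            if let Some(ops) = subgraph.operators() {
                println!("subgraph has {} operators", ops.len());
            }
        }
    }
}
```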