Audio event classification sample

khanhlvg commented 4 years ago

Goal: Build a sample app for YAMNet, a pretrained model that can recognize many different kinds of audio events.
References:
- Pretrained model
- TFLite sound classification Android sample
- iOS sample coming soon

sayakpaul commented 4 years ago

@khanhlvg does this indicate we can now convert an audio classification model trained using Teachable Machine?

farmaker47 commented 4 years ago

@khanhlvg I am definitely interested in that! I've started reading all the information...

farmaker47 commented 4 years ago

I made a Colab notebook with executable code blocks and the ability to listen to and try different audio files: https://colab.research.google.com/drive/1tM9LIcFr5PCggzUqM0BXr07Zokil-Bye?usp=sharing

You can try your own audio files, as there is a code block that transforms them to the required format! Try some from here: https://freesound.org/home/
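For anyone who wants to see what "the required format" involves without opening the notebook, here is a rough sketch. YAMNet takes a mono waveform sampled at 16 kHz with float values in [-1.0, 1.0]. This is not the notebook's code: the function names are made up for illustration, it assumes a 16-bit PCM WAV input, and the linear-interpolation resampler is a crude stand-in for a proper resampling library.

```python
import struct
import wave

TARGET_RATE = 16000  # YAMNet expects 16 kHz mono samples in [-1.0, 1.0]


def pcm16_to_mono_floats(frames, channels):
    """Decode little-endian 16-bit PCM, average the channels, scale to [-1, 1]."""
    count = len(frames) // 2
    samples = struct.unpack("<%dh" % count, frames[: count * 2])
    mono = []
    for i in range(0, count - channels + 1, channels):
        mono.append(sum(samples[i:i + channels]) / channels / 32768.0)
    return mono


def resample_linear(samples, src_rate, dst_rate=TARGET_RATE):
    """Resample by linear interpolation (crude: no anti-aliasing filter)."""
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)
    out_len = int(len(samples) * dst_rate / src_rate)
    out = []
    for n in range(out_len):
        pos = n * src_rate / dst_rate
        i = min(int(pos), len(samples) - 2)
        frac = min(pos - i, 1.0)  # clamp so we never extrapolate past the end
        out.append(samples[i] * (1.0 - frac) + samples[i + 1] * frac)
    return out


def load_wav_for_yamnet(path):
    """Hypothetical helper: read a 16-bit PCM WAV and return 16 kHz mono floats."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames = wav.readframes(wav.getnframes())
        mono = pcm16_to_mono_floats(frames, wav.getnchannels())
        return resample_linear(mono, wav.getframerate())
```

Calling `load_wav_for_yamnet("my_sound.wav")` (any path of your own) returns a plain list of floats. For real use you would probably want a resampler with proper filtering.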

sayakpaul commented 4 years ago

@farmaker47 do you want to create a separate repository for the project? Also, feel free to send a PR by including the link in the README of this repository. We can then scope this project :D

farmaker47 commented 4 years ago

Yes sure!

I will create one when I start working on the Android part!

farmaker47 commented 4 years ago

Hey @khanhlvg and @sayakpaul, I'm pasting here some significant results from the benchmark tool for the YAMNet TFLite model.

[Image: yamnet_pixel_cpu_plus_hexagon]

farmaker47 commented 4 years ago

I'd like your opinion, @khanhlvg @sayakpaul: I get a TFLite error for this model when I try to use the GPU, and the benchmark tool returns no results on any phone configuration. The model's TF Hub page includes a code snippet showing how to use it. Do you think this model is not meant for GPU, and if so, why?

sayakpaul commented 4 years ago

@farmaker47 it could be that the model, or at least some parts of it, is not yet supported by the TFLite GPU delegate. For example, if I recall correctly, the argmax() operation is not yet supported.

So, a good homework exercise for you would be to load the TFLite model in Netron and check which of its ops are not supported by the delegate.

khanhlvg commented 4 years ago

@farmaker47 @sayakpaul It's great to see the progress on this and sorry for the delayed response from my side.

I think there's a good chance that the model isn't compatible with the GPU delegate. To find out, you can run the benchmark tool with the YAMNet model and check the output/error message it prints.

farmaker47 commented 3 years ago

Updated the Colab notebook with inference on the TFLite model using the interpreter.

The procedure of collecting sound, transforming it to the correct form, and inferring with the interpreter seems similar to what we did with the SPICE model. So what is left now is to combine everything inside the Android application!

Inference is extremely fast on CPU, and I think we will get great results on mobile.

farmaker47 commented 3 years ago

I'm uploading the first outputs from our model when the same .wav file is run through inference:

Colab notebook TensorFlow model: 9.9126965e-01 2.6559830e-04 1.5967786e-03 ..........
Colab notebook TF Lite model: 9.91269529e-01 2.65538692e-04 1.59677863e-03 ..........
Mobile phone TFLite file: 0.9912695, 2.655983E-4, 0.0015967786 ..........

Clearly the outputs match (up to tiny floating-point differences) and everything seems to work fine!!!
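If you want to check the agreement numerically rather than by eye, a minimal sketch looks like this. It uses only the three leading scores quoted above (the rest were elided), and the tolerance is an assumed threshold picked for illustration, not something taken from this thread.

```python
# Leading scores copied from the comment above (remaining values were elided).
tf_scores = [9.9126965e-01, 2.6559830e-04, 1.5967786e-03]
tflite_scores = [9.91269529e-01, 2.65538692e-04, 1.59677863e-03]
mobile_scores = [0.9912695, 2.655983e-4, 0.0015967786]

TOLERANCE = 1e-6  # assumed threshold, not taken from the thread


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two score lists."""
    if len(a) != len(b):
        raise ValueError("score lists must have the same length")
    return max(abs(x - y) for x, y in zip(a, b))


for name, scores in [("TF Lite (Colab)", tflite_scores),
                     ("TF Lite (phone)", mobile_scores)]:
    diff = max_abs_diff(tf_scores, scores)
    print(f"{name}: max |diff| = {diff:.2e}, within tolerance: {diff <= TOLERANCE}")
```

With full output arrays you would compare all 521 class scores for every frame in the same way.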

I should also mention that the log messages from the TensorFlow library in Android Studio are now more helpful. Here is an example of what happens if I intentionally change the output size of the float array: E/EXCEPTION: java.lang.IllegalArgumentException: Cannot copy from a TensorFlowLite tensor (Identity) with shape [8, 521] to a Java object with shape [4, 521]. The log message above points out the error and the output node name (Identity).

The inference time for a 2-second audio file (benchmarked on CPU with only 2 threads) is 100 ms... pretty fast, don't you think??!! :) :)

khanhlvg commented 3 years ago

Thanks George for the update! It looks like you're making good progress :)

> The inference time for a 2-second audio file (benchmarked on CPU with only 2 threads) is 100 ms... pretty fast, don't you think??!! :) :)

Indeed! Classification models aren't computationally expensive.

farmaker47 commented 3 years ago

Clearly the Android app outputs the same labels as Colab. I think the project is going to finish really fast! [Image: Capture]
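For context on how labels come out of the scores: the model output has shape [frames, 521], i.e. one score per AudioSet class for each audio frame (the [8, 521] in the log message earlier). One common way to get clip-level labels is to average the scores over frames and take the top classes. The sketch below is only an illustration of that idea, not the app's actual code; `top_classes` is a made-up name and the 3-class example is toy data.

```python
def top_classes(frame_scores, labels, k=3):
    """Average per-frame scores and return the k best (label, mean score) pairs."""
    if not frame_scores:
        raise ValueError("need at least one frame of scores")
    num_classes = len(frame_scores[0])
    if len(labels) != num_classes:
        raise ValueError("labels must have one entry per class")
    num_frames = len(frame_scores)
    # Mean score of each class across all frames.
    means = [sum(frame[c] for frame in frame_scores) / num_frames
             for c in range(num_classes)]
    # Class indices sorted from highest to lowest mean score.
    ranked = sorted(range(num_classes), key=lambda c: means[c], reverse=True)
    return [(labels[c], means[c]) for c in ranked[:k]]


# Toy data: 2 frames x 3 classes (the real model has 521 classes).
toy_labels = ["Speech", "Music", "Silence"]
toy_scores = [[0.9, 0.1, 0.0],
              [0.7, 0.2, 0.1]]
print(top_classes(toy_scores, toy_labels, k=2))
```

On the phone, the same averaging would run over the float array copied out of the interpreter.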

sayakpaul commented 3 years ago

@farmaker47 from our previous discussions, I think that the only thing that is remaining here is the blog post. Right?

farmaker47 commented 3 years ago

Yes Sayak. I have started writing the blog post.

sayakpaul commented 3 years ago

@farmaker47 can we close this issue now as you have already released the blog posts?

farmaker47 commented 3 years ago

Yeah sure!

On Sat, Dec 12, 2020, 4:10 PM, Sayak Paul <notifications@github.com> wrote:

> @farmaker47 can we close this issue now as you have already released the blog posts?


ml-gde / e2e-tflite-tutorials

Audio event classification sample #32