
ludwig-ai / ludwig

Low-code framework for building custom LLMs, neural networks, and other AI models

http://ludwig.ai

Apache License 2.0


Predictions on single audio data points from memory using python api #1163

Closed by Peetee06 1 month ago

Peetee06 commented 3 years ago

Hi,

I want to build an application that reads audio from a microphone stream and makes live predictions on 1-second chunks, classifying each chunk into one of 2 categories with a pre-trained Ludwig model. As far as I know, Ludwig can only predict on audio data that is saved as files on the filesystem. For live predictions it would be much faster to feed the model Python objects or WAV-encoded bytes directly and have it predict the class one chunk at a time. Is there a way to do that with Ludwig with as little overhead as possible?

Best regards, Peter
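Capturing the stream happens outside Ludwig, but the "1-second chunks" step can be sketched independently. Below is a minimal sketch, assuming the microphone library delivers raw 16-bit mono PCM at 16 kHz in arbitrarily sized reads (sample rate, width and channel count are assumptions; adjust them to your device). The function name `one_second_chunks` is hypothetical.

```python
from typing import Iterable, Iterator


def one_second_chunks(
    byte_stream: Iterable[bytes],
    sample_rate: int = 16000,  # assumed sample rate in Hz
    sample_width: int = 2,     # assumed bytes per sample (16-bit PCM)
    channels: int = 1,         # assumed mono input
) -> Iterator[bytes]:
    """Regroup arbitrarily sized raw PCM reads into fixed 1-second chunks.

    Any trailing audio shorter than one second is kept in the buffer and
    never yielded.
    """
    chunk_size = sample_rate * sample_width * channels  # bytes in one second
    buffer = bytearray()
    for block in byte_stream:
        buffer.extend(block)
        # Emit as many complete 1-second chunks as the buffer holds.
        while len(buffer) >= chunk_size:
            yield bytes(buffer[:chunk_size])
            del buffer[:chunk_size]
```

Buffering in bytes rather than samples keeps the function independent of any audio library; each yielded chunk can then be handed to whatever prediction path is available.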

w4nderlust commented 3 years ago

@Peetee06 unfortunately this isn't possible right now, but I completely agree that it's a much-needed feature. We have a similar issue for image features, which behave the same way (both are currently read from files). We'll make this a priority once we finish the internal refactoring we are doing for v0.4.

In the meantime, as you have already figured out, the workaround is to save the content to an audio file (for example in a temporary directory) and let Ludwig load it back, with the obvious overhead this entails.
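A minimal sketch of that temp-file workaround follows. It uses only the standard `wave` and `tempfile` modules to write each chunk, assuming raw 16-bit mono PCM at 16 kHz (assumed values). The helper names `write_chunk_to_wav` and `predict_chunk` are hypothetical. The column name `audio_path` is also a placeholder: it must match the name of the audio input feature in your model's config. `predict_chunk` assumes the Ludwig 0.3+ Python API, where `LudwigModel.predict(dataset=...)` accepts a pandas DataFrame and returns a `(predictions, output_directory)` tuple. Check this against your installed version.

```python
import os
import tempfile
import wave


def write_chunk_to_wav(pcm_bytes, directory, sample_rate=16000,
                       sample_width=2, channels=1):
    """Write raw PCM bytes to a new WAV file in `directory` and return its path."""
    fd, path = tempfile.mkstemp(suffix=".wav", dir=directory)
    os.close(fd)  # wave.open reopens the file by name
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_bytes)
    return path


def predict_chunk(model, pcm_bytes, column="audio_path"):
    """Hypothetical helper: round-trip one chunk through a temp file for Ludwig.

    `model` is assumed to be a loaded ludwig.api.LudwigModel; `column` is a
    placeholder that must equal the audio input feature name used in training.
    """
    import pandas as pd  # imported lazily; only needed when actually predicting

    with tempfile.TemporaryDirectory() as tmp_dir:
        path = write_chunk_to_wav(pcm_bytes, tmp_dir)
        df = pd.DataFrame({column: [path]})
        # Ludwig reads the file during this call, before the directory is removed.
        predictions, _ = model.predict(dataset=df)
    return predictions
```

In use, load the model once, for example with `LudwigModel.load("path/to/model")` (placeholder path), and call `predict_chunk` for every 1-second chunk. The file write and read on each call is the overhead mentioned above. Pointing the temporary directory at a RAM-backed filesystem (such as `/dev/shm` on many Linux systems) may reduce it, but does not remove Ludwig's per-call preprocessing.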

Peetee06 commented 3 years ago

@w4nderlust looking forward to the feature. I'll go with the workaround you suggested in the meantime.

Thanks for the great work on Ludwig! :)