Closed: jandieg closed this issue 11 months ago
Hi @jandieg! Thank you for bringing this to our attention. It seems that the audio file you're using has a sample rate of 48 kHz. While we take a deeper look at why our system cannot process this audio, I was able to confirm that it works as expected if you downsample the file to a 16 kHz sample rate. I've attached the downsampled file for your reference: audio_new.zip
To unblock yourself for the time being, please use a sample rate of 16 kHz.
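For anyone who wants to try the 16 kHz suggestion locally, here is a minimal sketch that shells out to ffmpeg. It assumes ffmpeg is installed and on your PATH. The helper names `downsample_cmd` and `downsample` are made up for illustration and are not part of any Wit tooling. Which codec ffmpeg picks for the output file depends on how your ffmpeg was built, so treat this as a starting point only.

```python
import subprocess


def downsample_cmd(src: str, dst: str, rate_hz: int = 16000) -> list[str]:
    """Build an ffmpeg command line that resamples src into dst.

    -y  overwrites dst if it already exists
    -i  names the input file
    -ar sets the output audio sample rate in Hz
    """
    return ["ffmpeg", "-y", "-i", src, "-ar", str(rate_hz), dst]


def downsample(src: str, dst: str, rate_hz: int = 16000) -> None:
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits non-zero."""
    subprocess.run(downsample_cmd(src, dst, rate_hz), check=True)
```

Usage would be something like `downsample("audio.ogg", "audio_16k.ogg")`, with both file names being placeholders.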
So this is how WhatsApp collects audio. Aren't you going to look at this issue? WhatsApp is quite popular, and an important point of entry into the conversational space. If so, I am wondering why this is being closed.
Hey @jandieg! We are definitely looking into this issue on our end. Our system should be able to accept a variety of audio formats. We should have a fix for this soon.
@jandieg We found that the issue is caused not by the sample rate but by the ogg file using the Opus codec. Our API currently assumes that ogg files use only the Vorbis codec. As a short-term workaround, if you provide the content type as audio/opus, it should work. In the long term, we should be able to infer the metadata from the audio file itself.
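Until the API infers this on its own, a client could sniff the codec and set the content type itself. Below is a rough sketch, not an official client. It relies on the Ogg container layout: the first page starts with `OggS`, has a 27-byte fixed header followed by a segment table, and its payload begins with `OpusHead` for Opus or `\x01vorbis` for Vorbis. The function names, the `audio/ogg` fallback for non-Opus files, and the `https://api.wit.ai/speech` URL with a bearer token are assumptions on my part, so check them against the API docs.

```python
import urllib.request

OGG_FIXED_HEADER_LEN = 27  # bytes before the segment table in an Ogg page


def sniff_ogg_codec(data: bytes) -> str:
    """Return 'opus', 'vorbis', or 'unknown' based on the first Ogg page."""
    if len(data) < OGG_FIXED_HEADER_LEN or data[:4] != b"OggS":
        return "unknown"
    n_segments = data[26]  # length of the segment table that follows
    payload = data[OGG_FIXED_HEADER_LEN + n_segments:]
    if payload.startswith(b"OpusHead"):
        return "opus"
    if payload.startswith(b"\x01vorbis"):
        return "vorbis"
    return "unknown"


def content_type_for(data: bytes) -> str:
    """Pick the workaround content type for Opus, else assume audio/ogg."""
    return "audio/opus" if sniff_ogg_codec(data) == "opus" else "audio/ogg"


def build_speech_request(
    data: bytes,
    token: str,
    url: str = "https://api.wit.ai/speech",  # assumed endpoint URL
) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the audio bytes."""
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": content_type_for(data),
    }
    return urllib.request.Request(url, data=data, headers=headers, method="POST")
```

To send it, pass the returned request to `urllib.request.urlopen`. For a WhatsApp voice note, `content_type_for` should come back as `audio/opus` if the file really is Ogg Opus.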
Thank you. I started using Whisper for the time being. Any pros/cons you could highlight around Wit vs. Whisper?
Question
What is the current behavior? POST to /speech with a short ogg file (captured from WhatsApp) saying "send justification" returns:
{ "entities": {}, "intents": [], "text": "", "traits": {} }
The same words, "send justification", sent to /message do return the correct intent.
What is the expected behavior? Produce the same intent as the text sent to /message.
If applicable, what is the App ID where you are experiencing this issue? If you do not provide this, we cannot help. 582793603989068
Attached audio sample. audio.zip