Hey Luca! The frames you receive from the AudioStream are raw signed 16-bit PCM. Looking at Deepgram's docs, they do support linear16. I've used Deepgram before; you can connect to their websocket and send the frames you receive from LiveKit directly. (Also, don't forget to use the right sample rate.)
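Roughly, the wiring looks like this. This is a minimal sketch, assuming the plain `websockets` package (not Deepgram's SDK) and an `rtc.AudioStream` that yields frame events; older SDK versions yield the frames directly, newer `websockets` releases call the header kwarg `additional_headers`, and the query parameters must match your actual sample rate and channel count:

```python
import asyncio

import websockets  # assumption: the plain `websockets` package, not Deepgram's SDK
from livekit import rtc

# linear16 = raw signed 16-bit PCM, which is what AudioStream yields.
# sample_rate/channels must match the frames you actually receive.
DEEPGRAM_URL = (
    "wss://api.deepgram.com/v1/listen"
    "?encoding=linear16&sample_rate=48000&channels=1"
)


async def transcribe_track(track: rtc.Track, api_key: str) -> None:
    # Deepgram authenticates the websocket with an "Authorization: Token <key>" header.
    async with websockets.connect(
        DEEPGRAM_URL, extra_headers={"Authorization": f"Token {api_key}"}
    ) as ws:

        async def send_audio() -> None:
            # Frames are already decoded PCM, so the raw bytes can be
            # forwarded as-is; no Opus handling needed.
            async for event in rtc.AudioStream(track):
                await ws.send(bytes(event.frame.data))

        async def recv_transcripts() -> None:
            # Deepgram streams back JSON messages with interim/final transcripts.
            async for message in ws:
                print(message)

        await asyncio.gather(send_audio(), recv_transcripts())
```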
Awesome, it works 😀
One last question to improve my understanding: rtc.RemoteTrackPublication.mime_type yields audio/opus. When is the audio converted to signed PCM?
The mime_type is the codec used to transmit the media to the recipient. On the receiving side, libwebrtc decodes it immediately, so the frames you get from the AudioStream are already raw PCM.
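You can verify this by looking at the frames themselves. A small sketch, assuming the same event-yielding AudioStream API as above:

```python
from livekit import rtc


async def inspect_first_frame(track: rtc.Track) -> None:
    # Despite the publication advertising audio/opus, the frames below are
    # already decoded to signed 16-bit PCM by libwebrtc.
    async for event in rtc.AudioStream(track):
        frame = event.frame
        pcm = bytes(frame.data)
        # int16 PCM: 2 bytes per sample, per channel.
        assert len(pcm) == 2 * frame.num_channels * frame.samples_per_channel
        print(frame.sample_rate, frame.num_channels, frame.samples_per_channel)
        break
```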
Hello, thanks for this project :)
I'd like to transcribe an audio track with Deepgram, but I have some issues.
The application
The client is built with LiveKit React Components (i.e. LiveKitRoom and AudioConference), with RED (redundant audio) encoding disabled when a client joins a room, as described in the docs.
The server uses this Python SDK and was implemented starting from the Whisper example. In my case, the "whisper_task" has been replaced by a "deepgram_task"; see this gist.
Issue
I think I'm not getting how an AudioFrame (the rtc.AudioFrame class) encodes its data. I'm completely new to audio streaming, and that may be the cause of the issue. I know that the audio format is Opus, but:
In other words, what does
bytes(frame.data)
return? Is it the Opus packet? I'm not able to inspect the packet using a packet inspector.

Thank you in advance for any help you may give,
Luca