livekit / python-sdks

LiveKit real-time and server SDKs for Python
https://docs.livekit.io
Apache License 2.0

Audio frames streaming and OPUS packets #96

Closed JustNello closed 8 months ago

JustNello commented 8 months ago

Hello, thanks for this project :)

I'd like to transcribe an audio track with Deepgram, but I have some issues.

The application: The client is made of LiveKit React Components (i.e. LiveKitRoom and AudioConference), with REDundant encoding disabled when a client joins a room, as described in the docs.

The server uses this Python SDK and was implemented starting from the Whisper example; in my case, the "whisper_task" has been replaced by a "deepgram_task", in this gist.

Issue: I don't think I understand how the AudioFrame (from rtc.AudioFrame) encodes data. I'm new to audio streaming altogether, and that may be the cause of the issue. I know that the audio format is OPUS, but:

In other words, what does bytes(frame.data) return? Is it the OPUS packet? I'm not able to inspect it with a packet inspector.

Thank you in advance for any help you may give, Luca

theomonnom commented 8 months ago

Hey Luca! The frames you receive from the AudioStream are raw signed PCM. Looking at Deepgram's docs, they do support linear16. I've used Deepgram before; I think you can just connect to their websocket and send the frames you receive from LiveKit directly. (Also, don't forget to use the right sample rate.)
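For reference, a minimal sketch of what such a deepgram_task could look like, assuming the AudioStream async-iterator API and Deepgram's live-streaming websocket with linear16 query parameters. DEEPGRAM_URL, DEEPGRAM_API_KEY, and the exact shape of the iterated items are assumptions — check the LiveKit and Deepgram docs for your versions:

```python
import asyncio
import json

import websockets  # pip install websockets

from livekit import rtc

# Hypothetical values: check Deepgram's live-streaming docs for the exact
# query parameters, and make sure sample_rate matches the frames you receive.
DEEPGRAM_URL = (
    "wss://api.deepgram.com/v1/listen"
    "?encoding=linear16&sample_rate=48000&channels=1"
)
DEEPGRAM_API_KEY = "your-deepgram-api-key"


async def deepgram_task(audio_stream: rtc.AudioStream) -> None:
    """Forward raw PCM frames from a LiveKit AudioStream to Deepgram."""
    async with websockets.connect(
        DEEPGRAM_URL,
        # newer `websockets` releases name this argument `additional_headers`
        extra_headers={"Authorization": f"Token {DEEPGRAM_API_KEY}"},
    ) as ws:

        async def print_transcripts() -> None:
            # Deepgram replies with JSON transcription results.
            async for message in ws:
                print(json.loads(message))

        recv_task = asyncio.create_task(print_transcripts())

        # Depending on the SDK version, iterating the stream yields either
        # AudioFrame objects directly or events carrying a `.frame` attribute.
        async for event in audio_stream:
            frame = getattr(event, "frame", event)
            # frame.data is already raw signed 16-bit PCM: send it as-is.
            await ws.send(bytes(frame.data))

        recv_task.cancel()
```

The sample rate in the URL has to match frame.sample_rate (48 kHz is common for WebRTC audio, but verify it on the frames you actually receive).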

JustNello commented 8 months ago

Awesome, it works 😀

One last question to improve my understanding: rtc.RemoteTrackPublication.mime_type yields audio/opus. When is the audio converted to signed PCM?

theomonnom commented 8 months ago

The mime_type represents the codec used while the media is transmitted to the recipient. Once the media is received, libwebrtc decodes it immediately, so the frames you consume are already PCM.
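To make the distinction concrete, a small sketch assuming the track_subscribed callback signature used in the SDK examples (treat the exact names as assumptions): the publication reports the transport codec, while the frames you iterate are already decoded.

```python
import asyncio

from livekit import rtc


def on_track_subscribed(
    track: rtc.Track,
    publication: rtc.RemoteTrackPublication,
    participant: rtc.RemoteParticipant,
) -> None:
    # The publication's mime_type describes the codec used on the wire.
    print(publication.mime_type)  # e.g. "audio/opus"

    if track.kind == rtc.TrackKind.KIND_AUDIO:
        # The frames coming out of an AudioStream are already decoded PCM.
        asyncio.create_task(inspect_frames(rtc.AudioStream(track)))


async def inspect_frames(audio_stream: rtc.AudioStream) -> None:
    async for event in audio_stream:
        frame = getattr(event, "frame", event)
        print(frame.sample_rate, frame.num_channels, len(frame.data))
        break
```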