livekit / python-sdks

LiveKit real-time and server SDKs for Python
https://docs.livekit.io
Apache License 2.0

Is there any way to broadcast audio streams to peers with livekit-rtc? #206

Closed xiaokang00010 closed 3 weeks ago

xiaokang00010 commented 3 weeks ago

Problem:

I'm using livekit-rtc to build a voice assistant with Google Gemini 1.5. After getting the TTS result, I found that there is no API for broadcasting an audio file to peers directly. So I used libav to read the audio stream, converted the decoded frames into livekit.rtc.AudioFrame objects, and sent them to peers with source.capture_frame. The result is not acceptable, though: the loop can finish broadcasting about 2000 frames in 1 second. I don't know how source.capture_frame works because there is no documentation for this library; all I have are a few examples in the repository. I added delay compensation, but it still performs poorly.

Is there any way to broadcast remote audio streams? Or, if this approach is reasonable, how should it be implemented properly?

My implementation:

async def broadcastAudioLoop(self, source: livekit.rtc.AudioSource, frequency: int):
        while True:
            # print('broadcasting audio...')
            if self.fetchBroadcastMission() is None:
                await source.capture_frame(self.generateEmptyAudioFrame())
                # print('done2')
            else:
                frame: Optional[av.AudioFrame] = None
                start = time.time()
                count = 0
                for frame in self.currentBroadcastMission.decode(audio=0):
                    # print(frame.sample_rate, frame.rate, frame.samples, frame.time_base, frame.dts, frame.pts, frame.time, len(
                    #     frame.layout.channels), [i for i in frame.side_data.keys()])
                    try:
                        livekitFrame = livekit.rtc.AudioFrame(
                            frame.to_ndarray().tobytes(),
                            frame.sample_rate,
                            num_channels=len(frame.layout.channels),
                            samples_per_channel=frame.sample_rate // 100).remix_and_resample(webFrontend.config.LIVEKIT_SAMPLE_RATE, 1)
                    except Exception:
                        # if there's problem with the frame, skip it and continue to the next one.
                        print('Error processing frame, skipping it.')
                        continue
                    future_base = time.time()
                    future = future_base + 1/(frame.sample_rate // 1000)
                    if (future - start > 0) and (future - start < 1):
                        count += 1
                        print(f'processed {count} frames in {future - start} seconds.')
                    else:
                        # delay compensation
                        if (frame.sample_rate // 1000) - count > 5:
                            print('Too aggressive! using 5 frames to compensate')
                            future -= 1 / 1000
                        elif (frame.sample_rate // 1000) - count < -5:
                            print('Too negatively aggressive! using 5 frames to compensate')
                            future += 1 / 1000
                        else:
                            print('Normal compensation')
                            future -= ((frame.sample_rate // 1000) - count) / 1000
                    while time.time() < future:
                        await source.capture_frame(livekitFrame)
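
A note on the conversion step in the code above: it passes samples_per_channel=frame.sample_rate // 100, which assumes every decoded frame is exactly 10 ms long. Decoders usually emit frames whose sample count depends on the codec, so the byte length and the declared sample count probably disagree, which may be why frames end up in the except branch. The following is a minimal sketch, not the library's method, of re-slicing decoded PCM into fixed 10 ms chunks. It assumes the audio has already been converted to interleaved 16-bit PCM (for example with a resampler configured for the s16 format). FrameChunker and its methods are hypothetical names introduced only for illustration.

```python
from typing import List

BYTES_PER_SAMPLE = 2  # assumption: 16-bit signed, interleaved PCM


class FrameChunker:
    """Hypothetical helper: buffer PCM bytes and emit fixed-duration chunks."""

    def __init__(self, sample_rate: int, num_channels: int, frame_ms: int = 10) -> None:
        # Samples per channel in one chunk, e.g. 480 at 48 kHz for 10 ms.
        self.samples_per_channel = sample_rate * frame_ms // 1000
        self.chunk_bytes = self.samples_per_channel * num_channels * BYTES_PER_SAMPLE
        self._buffer = bytearray()

    def push(self, pcm: bytes) -> List[bytes]:
        """Append decoded bytes and return every complete chunk now available."""
        self._buffer.extend(pcm)
        chunks = []
        while len(self._buffer) >= self.chunk_bytes:
            chunks.append(bytes(self._buffer[: self.chunk_bytes]))
            del self._buffer[: self.chunk_bytes]
        return chunks

    def flush(self) -> List[bytes]:
        """Pad the leftover tail with silence so the final chunk is full-size."""
        if not self._buffer:
            return []
        padding = self.chunk_bytes - len(self._buffer)
        tail = bytes(self._buffer) + b"\x00" * padding
        self._buffer.clear()
        return [tail]
```

Each returned chunk has exactly samples_per_channel samples per channel, so the data length and the sample count handed to the frame constructor would match. Whatever decoded frame size libav produces, the leftover bytes simply wait in the buffer for the next push.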

https://github.com/livekit/python-sdks/assets/47490867/a9698f05-59ae-4ef0-b424-dae7e9030f0e
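
On the pacing problem: in the implementation above, the inner `while time.time() < future` loop appears to resend the same frame repeatedly until the deadline passes, which would explain thousands of frames going out per second. A simpler approach is to send each frame exactly once and sleep until that frame's deadline, computed from a fixed start time so timing errors do not accumulate. The sketch below shows this under stated assumptions: send_paced is a hypothetical helper, and capture is a hypothetical async callback that in real code might build a livekit.rtc.AudioFrame from the chunk and await source.capture_frame. Whether the SDK also buffers or paces frames internally is not verified here.

```python
import asyncio
import time
from typing import Awaitable, Callable, Iterable


async def send_paced(
    chunks: Iterable[bytes],
    capture: Callable[[bytes], Awaitable[None]],
    frame_seconds: float = 0.010,
) -> int:
    """Send each chunk once, no faster than real time. Returns the count sent."""
    start = time.monotonic()  # monotonic clock: unaffected by wall-clock changes
    sent = 0
    for chunk in chunks:
        await capture(chunk)  # hypothetical callback supplied by the caller
        sent += 1
        # Deadline for the next chunk is measured from the start time,
        # so a late or early wakeup is corrected on the following iteration.
        delay = start + sent * frame_seconds - time.monotonic()
        if delay > 0:
            await asyncio.sleep(delay)
    return sent
```

With 10 ms chunks this sends roughly 100 frames per second regardless of how quickly the decoder produces data. If capture itself takes longer than one frame duration, the computed delay becomes negative and the loop continues without sleeping until it has caught up.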