I'm using livekit-rtc to set up a voice assistant with Google Gemini 1.5. When I got the TTS result, I found there is no API for broadcasting an audio file to peers directly. So I used libav to read the audio stream, convert each frame into a livekit.rtc.AudioFrame, and send it to peers with source.capture_frame. However, the result is not even acceptable: it finishes broadcasting about 2000 frames in one second. I don't know how source.capture_frame works because there is no documentation for this library; all I found were a few examples in the repository. I added delay compensation, but the performance is still poor.
Is there any way to broadcast remote audio streams? Or, following this train of thought, how can I implement this properly?
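For context on the numbers (assuming the 48 kHz sample rate used in the SDK examples): with samples_per_channel = sample_rate // 100, each AudioFrame carries 10 ms of audio, so real-time playback is 100 frames per second, and 2000 frames in one second is roughly 20x faster than real time. A quick sanity check of that arithmetic:

```python
sample_rate = 48_000
samples_per_channel = sample_rate // 100       # 480 samples per frame

# Duration of one frame in milliseconds.
frame_ms = 1000 * samples_per_channel / sample_rate
print(frame_ms)            # 10.0

# Real-time pacing: how many frames per second should be sent.
frames_per_second = sample_rate / samples_per_channel
print(frames_per_second)   # 100.0
```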
My implementation:
async def broadcastAudioLoop(self, source: livekit.rtc.AudioSource, frequency: int):
    while True:
        if self.fetchBroadcastMission() is None:
            await source.capture_frame(self.generateEmptyAudioFrame())
        else:
            frame: Optional[av.AudioFrame] = None
            start = time.time()
            count = 0
            for frame in self.currentBroadcastMission.decode(audio=0):
                try:
                    livekitFrame = livekit.rtc.AudioFrame(
                        frame.to_ndarray().tobytes(),
                        frame.sample_rate,
                        num_channels=len(frame.layout.channels),
                        samples_per_channel=frame.sample_rate // 100,
                    ).remix_and_resample(webFrontend.config.LIVEKIT_SAMPLE_RATE, 1)
                except Exception:
                    # If there is a problem with the frame, skip it and continue.
                    print('Error processing frame, skipping it.')
                    continue
                future_base = time.time()
                future = future_base + 1 / (frame.sample_rate // 1000)
                if (future - start > 0) and (future - start < 1):
                    count += 1
                    print(f'processed {count} frames in {future - start} seconds.')
                else:
                    # delay compensation
                    if (frame.sample_rate // 1000) - count > 5:
                        print('Too aggressive! Using 5 frames to compensate.')
                        future -= 1 / 1000
                    elif (frame.sample_rate // 1000) - count < -5:
                        print('Too negatively aggressive! Using 5 frames to compensate.')
                        future += 1 / 1000
                    else:
                        print('Normal compensation')
                        future -= ((frame.sample_rate // 1000) - count) / 1000
                while time.time() < future:
                    await source.capture_frame(livekitFrame)
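For comparison, the pacing problem can be isolated from livekit entirely: each 10 ms frame should be sent exactly once, and the loop should sleep until the next frame's deadline instead of re-sending the same frame while the clock has not caught up. A minimal sketch of that idea, with the livekit and av calls replaced by a stub (send_frame and paced_broadcast are hypothetical names, not SDK API):

```python
import asyncio
import time

FRAME_DURATION = 0.010  # 10 ms per frame, i.e. samples_per_channel = sample_rate // 100

async def paced_broadcast(frames, send_frame):
    """Send each decoded frame exactly once, paced to real time."""
    next_deadline = time.monotonic()
    for frame in frames:
        await send_frame(frame)            # e.g. source.capture_frame(livekitFrame)
        next_deadline += FRAME_DURATION    # the next frame is due 10 ms later
        delay = next_deadline - time.monotonic()
        if delay > 0:
            await asyncio.sleep(delay)     # wait out the remainder; never busy-loop

sent = []

async def fake_send(frame):
    sent.append(frame)

start = time.monotonic()
asyncio.run(paced_broadcast(range(20), fake_send))
elapsed = time.monotonic() - start
print(f"sent {len(sent)} frames in {elapsed:.3f}s")  # ~0.2 s for 20 frames
```

Accumulating an absolute deadline (rather than sleeping a fixed amount after each send) keeps small scheduling jitters from drifting over a long stream.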
A recording of the result:
https://github.com/livekit/python-sdks/assets/47490867/a9698f05-59ae-4ef0-b424-dae7e9030f0e