jitsi / jigasi

Jigasi: a server-side application acting as a gateway to Jitsi Meet conferences. Currently allows regular SIP clients to join meetings and provides transcription capabilities.
Apache License 2.0

Feature Request: Ability to configure BUFFER_SIZE for transcription #525

Open miro-ku opened 8 months ago

miro-ku commented 8 months ago

Hey everyone! Firstly, thanks for your work and a great product!

I have a feature request: the ability to configure BUFFER_SIZE in the transcriber service, which is currently fixed at 500 ms. The use case is the following: I don't need live captions in meetings, but I do need transcriptions. I'm trying to use Jitsi Skynet for transcription with Faster-Whisper. Since I only need the resulting transcription and not live captions, transcribing the input stream in 500 ms chunks doesn't look optimal. I assume that increasing the buffer size would reduce the workload on the Whisper service, which is very desirable. Correct me if I'm wrong.

Thanks in advance

rpurdel commented 8 months ago

Hi, I will work on this when I have some time on my hands. By the way, for the Skynet transcriber the buffer size is ~1.2 seconds, because the calculations in the participant class assume that the audio uses a 48k sampling rate everywhere, but Skynet requires 16k. See https://github.com/jitsi/jigasi/blob/71b8a9198606835517371a4ed540d03fa93af662/src/main/java/org/jitsi/jigasi/transcription/WhisperAudioSilenceCaptureDevice.java#L62 and https://github.com/jitsi/jigasi/blob/71b8a9198606835517371a4ed540d03fa93af662/src/main/java/org/jitsi/jigasi/transcription/Participant.java#L49
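
To illustrate why a buffer sized for one sampling rate lasts longer at a lower one, here is a minimal sketch of the arithmetic. It assumes 16-bit mono PCM, and the class `BufferDuration` and its methods are hypothetical helpers, not Jigasi code. It does not reproduce the constants in the linked files, so its numbers are illustrative and need not match the ~1.2 seconds quoted above.

```java
/** Hypothetical helper for PCM buffer arithmetic; not part of Jigasi. */
final class BufferDuration {
    private BufferDuration() {}

    /** Milliseconds of PCM audio that fit in a buffer of the given size. */
    static long millis(int bufferBytes, int sampleRateHz, int bytesPerSample, int channels) {
        long bytesPerSecond = (long) sampleRateHz * bytesPerSample * channels;
        return bufferBytes * 1000L / bytesPerSecond;
    }

    /** Buffer size in bytes needed to hold the given duration of PCM audio. */
    static int bytesFor(long durationMillis, int sampleRateHz, int bytesPerSample, int channels) {
        long bytesPerSecond = (long) sampleRateHz * bytesPerSample * channels;
        return (int) (bytesPerSecond * durationMillis / 1000L);
    }

    public static void main(String[] args) {
        // Assumption: 16-bit (2 bytes per sample) mono audio.
        int sizedFor48k = bytesFor(500, 48_000, 2, 1);
        System.out.println(sizedFor48k);                        // 48000
        System.out.println(millis(sizedFor48k, 48_000, 2, 1));  // 500
        // The same byte count holds three times as much audio at 16 kHz.
        System.out.println(millis(sizedFor48k, 16_000, 2, 1));  // 1500
    }
}
```

The point of the sketch is only that a byte count computed for 48 kHz covers a longer span of audio when the stream is actually 16 kHz; the exact duration depends on the real constants and audio format.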

miro-ku commented 8 months ago

Hi, @rpurdel thanks!

Yeah, you're right, the buffer is bigger for Whisper, but it is still too small and processed too frequently. I've confirmed a much lower workload on Skynet by modifying the demo to use a 5-second buffer.
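
As a rough illustration of the effect, the sketch below counts how many chunks would be sent for one minute of audio at different buffer durations. The class `ChunkCount` is a hypothetical helper, and the chunk count is only a proxy for request frequency, not a measurement of Skynet's actual load.

```java
/** Hypothetical helper that counts chunks per stretch of audio; not part of Jigasi. */
final class ChunkCount {
    private ChunkCount() {}

    /** Number of chunks needed to cover audioMillis, rounding the last partial chunk up. */
    static long chunksFor(long audioMillis, long bufferMillis) {
        if (bufferMillis <= 0) {
            throw new IllegalArgumentException("bufferMillis must be positive");
        }
        return (audioMillis + bufferMillis - 1) / bufferMillis;
    }

    public static void main(String[] args) {
        long oneMinute = 60_000;
        System.out.println(chunksFor(oneMinute, 500));   // 120
        System.out.println(chunksFor(oneMinute, 1_200)); // 50
        System.out.println(chunksFor(oneMinute, 5_000)); // 12
    }
}
```

Under these assumptions, moving from a 500 ms buffer to a 5-second buffer cuts the number of chunks per minute by a factor of ten.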