Closed by danobot 3 months ago
Transcribro already does this. What makes you think it doesn't?
It transcribes as soon as end of speech is detected: https://github.com/soupslurpr/Transcribro/blob/fbcb5fb3042d6b223fed90e1b462514bc7abc676/app/src/main/kotlin/dev/soupslurpr/transcribro/recognitionservice/MainRecognitionService.kt#L359
My solution can clip out empty audio chunks, reducing the size of the recording sent to the model. This may improve performance.
The VAD is set to trigger "end" after 3 seconds of no speech. Silence shorter than that must be kept to preserve proper punctuation, such as commas instead of periods.
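The end-of-speech logic described above can be sketched roughly as follows. This is an illustrative example, not Transcribro's actual code; the class name, the 3-second threshold constant, and the ~300 ms chunk size are assumptions taken from this thread:

```kotlin
// Hypothetical sketch: end-of-speech fires only after a silence window longer
// than the threshold (3 s here), so short pauses -- like those around commas --
// stay inside one utterance and the punctuation model sees them.
class EndOfSpeechDetector(
    private val silenceThresholdMs: Long = 3000, // assumed: the 3 s mentioned above
    private val chunkMs: Long = 300              // assumed: ~300 ms VAD frames
) {
    private var silentMs = 0L

    /** Feed one VAD decision per chunk; returns true once end-of-speech is reached. */
    fun onChunk(isSpeech: Boolean): Boolean {
        silentMs = if (isSpeech) 0L else silentMs + chunkMs
        return silentMs >= silenceThresholdMs
    }
}
```

With this design a 600 ms pause (two silent chunks) never triggers "end", while ten consecutive silent chunks (3000 ms) do.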
no worries
I will run with my own fork for now. I will rename the project, remove any references to Transcribro to avoid confusion, and link back to this project with proper credit in the README. Let me know if there is anything else I should do.
Alright. To properly attribute Transcribro's source code license you can add it to the Credits screen and keep the original license in the root of the project (the file can be renamed to something like LICENSE.Transcribro.txt).
This isn't legal advice and is only for educational purposes.
ok will do, thanks for developing this great project
Would you be interested in merging an improvement to the way audio is recorded, voice activity is analysed, and transcription is queued?

Current state: recording starts when voice activity is detected and stops when it ends.

Proposed state: each audio chunk (~300ms) is analysed for voice activity; if it contains speech, it is added to a recording. Once a specified number of silent chunks is detected, the recording is added to an audio processing queue, and a separate thread processes queue items and performs transcription.

This allows for shorter recordings, since we can effectively filter out silent audio chunks and send only audio that contains actual speech to be transcribed. It also decouples recording from transcribing, increasing reliability.
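The proposed pipeline could be sketched like this. All names here are illustrative (nothing below is from Transcribro's codebase), and the silent-chunk count, queue capacity, and `transcribe` callback are assumptions standing in for the real VAD and model integration:

```kotlin
import java.util.concurrent.ArrayBlockingQueue
import kotlin.concurrent.thread

// Hypothetical sketch of the proposal: speech chunks are accumulated, silent
// chunks are dropped, and each finished recording is handed off to a separate
// transcription thread via a blocking queue.
class ChunkedRecorder(
    private val maxSilentChunks: Int = 10,              // assumed "specified number of silent chunks"
    private val transcribe: (List<ShortArray>) -> Unit  // stand-in for the model call
) {
    private val queue = ArrayBlockingQueue<List<ShortArray>>(8)
    private val current = mutableListOf<ShortArray>()
    private var silentRun = 0

    // Consumer thread: decouples transcription from recording.
    private val worker = thread(isDaemon = true) {
        while (true) transcribe(queue.take())
    }

    /** Called once per ~300 ms chunk with the VAD's speech decision. */
    fun onChunk(chunk: ShortArray, isSpeech: Boolean) {
        if (isSpeech) {
            silentRun = 0
            current += chunk // keep only chunks that contain speech
        } else if (++silentRun == maxSilentChunks && current.isNotEmpty()) {
            queue.put(current.toList()) // enqueue the finished recording
            current.clear()
        }
    }
}
```

One design note on this sketch: because the queue hand-off is the only coupling point, a slow transcription pass never blocks recording until the queue itself fills, which is the reliability gain described above.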