Closed by danobot 3 months ago
Transcribro already does this. What makes you think it doesn't?
It transcribes as soon as end of speech is detected: https://github.com/soupslurpr/Transcribro/blob/fbcb5fb3042d6b223fed90e1b462514bc7abc676/app/src/main/kotlin/dev/soupslurpr/transcribro/recognitionservice/MainRecognitionService.kt#L359
My solution can clip out empty audio chunks, reducing the size of the recording sent to the model. This may improve performance.
The VAD is set to trigger "end" after 3 seconds of no speech. Silence shorter than that must be kept to preserve proper punctuation, such as commas instead of periods.
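The end-of-speech logic described above can be sketched roughly as follows. This is an illustrative example, not Transcribro's actual code; the class name, the 3-second threshold constant, and the ~300 ms chunk size are assumptions taken from this thread:

```kotlin
// Hypothetical sketch: end-of-speech fires only after a silence window longer
// than the threshold (3 s here), so short pauses -- like those around commas --
// stay inside one utterance and the punctuation model sees them.
class EndOfSpeechDetector(
    private val silenceThresholdMs: Long = 3000, // assumed: the 3 s mentioned above
    private val chunkMs: Long = 300              // assumed: ~300 ms VAD frames
) {
    private var silentMs = 0L

    /** Feed one VAD decision per chunk; returns true once end-of-speech is reached. */
    fun onChunk(isSpeech: Boolean): Boolean {
        silentMs = if (isSpeech) 0L else silentMs + chunkMs
        return silentMs >= silenceThresholdMs
    }
}
```

With this design a 600 ms pause (two silent chunks) never triggers "end", while ten consecutive silent chunks (3000 ms) do.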
no worries
I will run with my own fork for now. I will rename the project, remove any references to Transcribro to avoid confusion, and link back to this project with proper credit in the README. Let me know if there is anything else I should do.
Alright. To properly attribute Transcribro's source code license you can add it to the Credits screen and keep the original license in the root of the project (the file can be renamed to something like LICENSE.Transcribro.txt).
This isn't legal advice and is only for educational purposes.
ok will do, thanks for developing this great project
Would you be interested in merging an improvement to the way audio is recorded, voice activity is analysed, and transcription is queued?

Current state: recording starts when voice activity is detected and stops when it ends.

Proposed state: each audio chunk (~300ms) is analysed for voice activity; if it contains speech, it is added to a recording. Once a specified number of silent chunks is detected, the recording is added to an audio processing queue, and a separate thread processes queue items and performs transcription.

This allows for shorter recordings, since we can effectively filter out silent audio chunks and send only audio that contains actual speech to be transcribed. It also decouples recording from transcribing, increasing reliability.
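The proposed pipeline could be sketched like this. All names here are illustrative (nothing below is from Transcribro's codebase), and the silent-chunk count, queue capacity, and `transcribe` callback are assumptions standing in for the real VAD and model integration:

```kotlin
import java.util.concurrent.ArrayBlockingQueue
import kotlin.concurrent.thread

// Hypothetical sketch of the proposal: speech chunks are accumulated, silent
// chunks are dropped, and each finished recording is handed off to a separate
// transcription thread via a blocking queue.
class ChunkedRecorder(
    private val maxSilentChunks: Int = 10,              // assumed "specified number of silent chunks"
    private val transcribe: (List<ShortArray>) -> Unit  // stand-in for the model call
) {
    private val queue = ArrayBlockingQueue<List<ShortArray>>(8)
    private val current = mutableListOf<ShortArray>()
    private var silentRun = 0

    // Consumer thread: decouples transcription from recording.
    private val worker = thread(isDaemon = true) {
        while (true) transcribe(queue.take())
    }

    /** Called once per ~300 ms chunk with the VAD's speech decision. */
    fun onChunk(chunk: ShortArray, isSpeech: Boolean) {
        if (isSpeech) {
            silentRun = 0
            current += chunk // keep only chunks that contain speech
        } else if (++silentRun == maxSilentChunks && current.isNotEmpty()) {
            queue.put(current.toList()) // enqueue the finished recording
            current.clear()
        }
    }
}
```

One design note on this sketch: because the queue hand-off is the only coupling point, a slow transcription pass never blocks recording until the queue itself fills, which is the reliability gain described above.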