Hello, thank you for the report!
First, I updated the documentation, since it was not clear. (ae1dbd721c3c5a9473c32a7c0918d6a0517ef9d3)
Currently, 30-second speech segments are needed before Whisper analysis runs. This means that 8 consecutive intervals of 3.75 seconds must each be judged by VAD to contain speech. I will improve this behavior (#13).
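The gating described above can be sketched roughly as follows. This is an illustrative sketch of the idea, not whispering's actual implementation; the function name `should_transcribe` is hypothetical:

```python
# Illustrative sketch: audio is split into fixed VAD intervals, and a
# Whisper-sized segment is only analyzed once enough consecutive
# intervals have been judged to contain speech.
INTERVAL_SEC = 3.75          # length of one VAD interval
SEGMENT_SEC = 30.0           # segment length Whisper expects
INTERVALS_PER_SEGMENT = int(SEGMENT_SEC / INTERVAL_SEC)  # 8 intervals

def should_transcribe(vad_decisions):
    """Return True when the last 8 intervals all contained speech."""
    recent = vad_decisions[-INTERVALS_PER_SEGMENT:]
    return len(recent) == INTERVALS_PER_SEGMENT and all(recent)

print(should_transcribe([True] * 8))            # segment goes to Whisper
print(should_transcribe([True] * 7 + [False]))  # still waiting
```

This is why a single quiet interval can delay transcription of the whole segment.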
Please use a larger number for `-n` if 3.75 seconds is too short for VAD analysis.
```shell
# VAD for every 7.5 seconds
whispering --language en --model tiny -n 40
```
If you still have questions, please feel free to reopen this!
I will also add a VAD threshold option (86f38c6) in the next release.
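To illustrate what such a threshold typically controls, here is a minimal energy-based sketch; the function `is_speech` and its `threshold` parameter are hypothetical, not whispering's API, which may use a different VAD backend entirely:

```python
import math

def is_speech(samples, threshold=0.01):
    """Hypothetical check: an interval counts as speech when its RMS
    amplitude exceeds the threshold. Raising the threshold makes the
    VAD stricter; lowering it helps with quiet microphones."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return rms > threshold

quiet = [0.001] * 100   # near-silence
loud = [0.5] * 100      # clear signal
print(is_speech(quiet))  # False
print(is_speech(loud))   # True
```

A tunable threshold like this would let users with quiet microphones lower the bar for an interval to count as speech.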
Hey, thanks for making this! I was looking around for something that did live STT and this seems to work well!
Reading through the code, I'm very confused by the `allow_padding` variable. I couldn't get the code to work at all without `--allow-padding`. Maybe document what this code is doing? https://github.com/shirayu/whispering/blob/91231811e76a4c0580d469154397e892a9c6b0b7/whispering/transcriber.py#L264-L272
Additionally, and maybe this is because my mic isn't loud enough, the VAD didn't seem to work super well. I got it working for a bit at the start of recording when I had `--allow-padding`, but then it seemed to report 'No speech' no matter how loudly I spoke. I'll have to try adjusting my mic volume to see if I can fix that.

Logs

Here's a section of logging:
Environment