shirayu / whispering

Streaming transcriber with whisper
MIT License

Set proper value to ``-n`` #3

Closed shirayu closed 2 years ago

shirayu commented 2 years ago

Too small a value for -n produces no response, while too large a value consumes memory. Set a proper default for -n and raise a warning for values that are too small.
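A minimal sketch of the proposed warning. The names MIN_N and check_n are illustrative, not taken from the whisper_streaming source; the value 160 comes from the pad_or_trim observation later in this thread.

```python
import warnings

# Hypothetical minimum buffer count below which the transcription window
# never fills; 160 follows from the pad_or_trim discussion in this thread.
MIN_N = 160

def check_n(n: int) -> int:
    """Warn when -n is likely too small to produce any transcription."""
    if n < MIN_N:
        warnings.warn(
            f"-n {n} may be too small to produce any output; "
            f"consider a value of at least {MIN_N}",
            stacklevel=2,
        )
    return n
```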

fantinuoli commented 2 years ago

I tried several -n values. In all cases but one, nothing was output to the console. In only one try, with -n 10, I got something transcribed. The first result was in the wrong language; the second time it was the right transcription. I was not able to replicate it, though, i.e. I normally do not get any transcription.

[2022-09-23 20:29:26,345] transcriber._deal_timestamp DEBUG -> Length of consecutive: 0
0.00->2.00  無限に
[2022-09-23 20:29:26,347] transcriber._deal_timestamp DEBUG -> Length of buffer: 0
[2022-09-23 20:29:26,347] transcriber.transcribe DEBUG -> Last rest_start=None
[2022-09-23 20:29:26,349] cli.transcribe_from_mic DEBUG -> Segment: 1
[2022-09-23 20:29:26,353] transcriber.transcribe DEBUG -> seek=0, timestamp=2.0, rest_start=None
[2022-09-23 20:29:32,840] transcriber.transcribe DEBUG -> Result: temperature=0.00, no_speech_prob=0.24, avg_logprob=-0.80
[2022-09-23 20:29:32,840] transcriber._deal_timestamp DEBUG -> Length of consecutive: 0
2.00->4.00   It is okay.
[2022-09-23 20:29:32,840] transcriber._deal_timestamp DEBUG -> Length of buffer: 0
[2022-09-23 20:29:32,840] transcriber.transcribe DEBUG -> Last rest_start=None
[2022-09-23 20:29:32,843] cli.transcribe_from_mic DEBUG -> Segment: 2
[2022-09-23 20:29:32,846] transcriber.transcribe DEBUG -> seek=0, timestamp=4.0, rest_start=None
shirayu commented 2 years ago

@fantinuoli Did you set a proper value for --language? If the language is English, you need to set --language en, like this:

poetry run whisper_streaming --language en --model base -n 20

I added an instruction about that to the README. (e9e286d)

shirayu commented 2 years ago

I also found a bug with --language! I fixed it in 9cd80ab.

shirayu commented 2 years ago

pad_or_trim returns a tensor of torch.Size([1, 80, 3000]). While the user is still speaking, padding is not expected.

With -n 160, the input already fills torch.Size([1, 80, 3000]) without padding. So 160 or larger is expected.
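The 160 figure can be reproduced with a little arithmetic. The constants below are Whisper's audio parameters (16 kHz sample rate, fixed 30 s window, mel hop length 160); the per-block sample count is inferred from this thread, not read from the whisper_streaming source.

```python
# Sketch of the buffer arithmetic behind the "-n >= 160" recommendation.
SAMPLE_RATE = 16_000                      # Hz, Whisper's expected input rate
CHUNK_SECONDS = 30                        # Whisper's fixed context window
N_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS   # 480_000 samples per window
HOP_LENGTH = 160                          # mel spectrogram hop length
N_FRAMES = N_SAMPLES // HOP_LENGTH        # 3000, the last dim of [1, 80, 3000]

# If -n 160 fills the window exactly, each mic block must hold
# 480_000 / 160 = 3000 samples (0.1875 s of audio) -- an inference,
# not a value taken from the whisper_streaming source.
block_samples = N_SAMPLES // 160

def n_for_full_window(block_samples: int) -> int:
    """Smallest -n for which the accumulated buffer reaches the 30 s
    window, so that pad_or_trim no longer needs to pad."""
    return -(-N_SAMPLES // block_samples)  # ceiling division
```

For 3000-sample blocks this gives n_for_full_window(3000) == 160, matching the observation above.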