shirayu / whispering

Streaming transcriber with whisper
MIT License

Whispering not outputting any text #45

Closed joer33304 closed 2 years ago

joer33304 commented 2 years ago

Description

After starting, the debug log prints out actions, but there is no text output. I set --output test.txt and it creates an empty file. I assume the default should print to the console in real time (after the set delay).

What am I doing wrong?

Logs (Optional)

whispering --language en --model tiny --debug
[2022-10-26 12:11:22,087] cli.get_wshiper:219 DEBUG -> WhisperConfig: model_name='tiny' device='cpu' language='en' fp16=True
[2022-10-26 12:11:22,478] transcriber._set_dtype:35 WARNING -> FP16 is not supported on CPU; using FP32 instead
Using cache found in C:\Users\Joe/.cache\torch\hub\snakers4_silero-vad_master
[2022-10-26 12:11:22,878] cli.get_context:232 DEBUG -> Context: protocol_version=6002 timestamp=0.0 buffer_tokens=[] buffer_mel=None nosoeech_skip_count=None temperatures=[0.0, 0.2, 0.4, 0.6, 0.8, 1.0] patience=None compression_ratio_threshold=2.4 logprob_threshold=-1.0 no_captions_threshold=0.6 best_of=5 beam_size=5 no_speech_threshold=0.6 buffer_threshold=0.5 vad_threshold=0.5 max_nospeech_skip=16 data_type='float32'
[2022-10-26 12:11:22,879] cli.transcribe_from_mic:56 INFO -> Ready to transcribe
[2022-10-26 12:11:22,890] cli.transcribe_from_mic:67 DEBUG -> Audio #: 0, The rest of queue: 0
[2022-10-26 12:11:26,761] cli.transcribe_from_mic:82 DEBUG -> Got. The rest of queue: 0
Analyzing[2022-10-26 12:11:26,762] transcriber.transcribe:235 DEBUG -> 60000
[2022-10-26 12:11:26,933] vad.call:56 DEBUG -> VAD: 0.9772128462791443 (threshold=0.5)
[2022-10-26 12:11:26,936] transcriber.transcribe:266 DEBUG -> Incoming new_mel.shape: torch.Size([80, 375])
[2022-10-26 12:11:26,937] transcriber.transcribe:273 DEBUG -> mel.shape: torch.Size([80, 375])
[2022-10-26 12:11:26,938] transcriber.transcribe:277 DEBUG -> seek: 0
[2022-10-26 12:11:26,939] transcriber.transcribe:282 DEBUG -> mel.shape (375) - seek (0) < N_FRAMES (3000)
[2022-10-26 12:11:26,940] transcriber.transcribe:288 DEBUG -> No padding
[2022-10-26 12:11:26,940] transcriber.transcribe:345 DEBUG -> ctx.buffer_mel.shape: torch.Size([80, 375])
[2022-10-26 12:11:26,941] cli.transcribe_from_mic:67 DEBUG -> Audio #: 1, The rest of queue: 0
[2022-10-26 12:11:30,582] cli.transcribe_from_mic:82 DEBUG -> Got. The rest of queue: 0
Analyzing[2022-10-26 12:11:30,584] transcriber.transcribe:235 DEBUG -> 60000
[2022-10-26 12:11:30,696] vad.call:56 DEBUG -> VAD: 0.9962943196296692 (threshold=0.5)
[2022-10-26 12:11:30,697] transcriber.transcribe:266 DEBUG -> Incoming new_mel.shape: torch.Size([80, 375])
[2022-10-26 12:11:30,698] transcriber.transcribe:270 DEBUG -> buffer_mel.shape: torch.Size([80, 375])
[2022-10-26 12:11:30,698] transcriber.transcribe:273 DEBUG -> mel.shape: torch.Size([80, 750])
[2022-10-26 12:11:30,699] transcriber.transcribe:277 DEBUG -> seek: 0
[2022-10-26 12:11:30,700] transcriber.transcribe:282 DEBUG -> mel.shape (750) - seek (0) < N_FRAMES (3000)
[2022-10-26 12:11:30,700] transcriber.transcribe:288 DEBUG -> No padding
[2022-10-26 12:11:30,701] transcriber.transcribe:345 DEBUG -> ctx.buffer_mel.shape: torch.Size([80, 750])
[2022-10-26 12:11:30,702] cli.transcribe_from_mic:67 DEBUG -> Audio #: 2, The rest of queue: 0
[2022-10-26 12:11:34,200] cli.transcribe_from_mic:82 DEBUG -> Got. The rest of queue: 0

Environment

Additional context

If writing to a file, should it auto-flush? What is the correct way to exit the program?

shirayu commented 2 years ago

First, how long did you wait? By default, it needs to wait at least 30 seconds.

https://github.com/shirayu/whispering#parse-interval

By default, Whisper does not perform analysis until the total length of the segments determined by VAD to have speech exceeds 30 seconds. However, if silence segments appear 16 times (the default value of --max_nospeech_skip) after speech is detected, the analysis is performed.
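As an illustration only (not from the thread): based on the explanation above and the max_nospeech_skip=16 value visible in the Context line of the debug log, lowering --max_nospeech_skip should make the first analysis trigger after fewer silent segments once speech has been detected, so output appears sooner after you stop speaking. Assuming the flag takes an integer value, something like:

whispering --language en --model tiny --debug --max_nospeech_skip 4

Otherwise, simply keep speaking until the VAD-detected speech segments total more than 30 seconds.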

joer33304 commented 2 years ago

Now I feel super stupid :) It helps to read ALL instructions :)

Thanks, it works.
