shirayu / whispering

Streaming transcriber with whisper
MIT License

No ASR results macOS #25

Closed fantinuoli closed 2 years ago

fantinuoli commented 2 years ago

Describe the bug

No ASR results are produced, and an error occurs after a while

To Reproduce

whispering --language en --model tiny --debug

Logs

(whisper_streaming) fc@Claudios-MacBook-Pro whisper_streaming % whispering --language en --model tiny --debug
[2022-10-09 19:13:13,532] cli.get_wshiper:211 DEBUG -> WhisperConfig: model_name='tiny' device='cpu' language='en' fp16=True
[2022-10-09 19:13:14,103] transcriber._set_dtype:35 WARNING -> FP16 is not supported on CPU; using FP32 instead
Using cache found in /Users/fc/.cache/torch/hub/snakers4_silero-vad_master
[2022-10-09 19:13:16,014] cli.get_context:223 DEBUG -> Context: timestamp=0.0 buffer_tokens=[] buffer_mel=None vad=True temperatures=[0.0, 0.2, 0.4, 0.6, 0.8, 1.0] allow_padding=False patience=None compression_ratio_threshold=2.4 logprob_threshold=-1.0 no_captions_threshold=0.6 best_of=5 beam_size=5 no_speech_threshold=0.6 buffer_threshold=0.5 vad_threshold=0.5
[2022-10-09 19:13:16,014] cli.transcribe_from_mic:51 INFO -> Ready to transcribe
[2022-10-09 19:13:16,058] cli.transcribe_from_mic:62 DEBUG -> Audio #: 0, The rest of queue: 0
[2022-10-09 19:13:19,915] cli.transcribe_from_mic:77 DEBUG -> Got. The rest of queue: 0
Analyzing
[2022-10-09 19:13:19,916] transcriber.transcribe:235 DEBUG -> 60000
[2022-10-09 19:13:20,148] transcriber.transcribe:252 DEBUG -> Incoming new_mel.shape: torch.Size([80, 375])
[2022-10-09 19:13:20,148] transcriber.transcribe:259 DEBUG -> mel.shape: torch.Size([80, 375])
[2022-10-09 19:13:20,148] transcriber.transcribe:263 DEBUG -> seek: 0
[2022-10-09 19:13:20,148] transcriber.transcribe:265 DEBUG -> mel.shape (375) - seek (0) < N_FRAMES (3000)
[2022-10-09 19:13:20,148] transcriber.transcribe:271 DEBUG -> No padding
[2022-10-09 19:13:20,148] transcriber.transcribe:326 DEBUG -> ctx.buffer_mel.shape: torch.Size([80, 375])
[2022-10-09 19:13:20,148] cli.transcribe_from_mic:62 DEBUG -> Audio #: 1, The rest of queue: 0
[2022-10-09 19:13:23,595] cli.transcribe_from_mic:77 DEBUG -> Got. The rest of queue: 0
Analyzing
[2022-10-09 19:13:23,595] transcriber.transcribe:235 DEBUG -> 60000
[2022-10-09 19:13:23,785] transcriber.transcribe:252 DEBUG -> Incoming new_mel.shape: torch.Size([80, 375])
[2022-10-09 19:13:23,785] transcriber.transcribe:256 DEBUG -> buffer_mel.shape: torch.Size([80, 375])
[2022-10-09 19:13:23,785] transcriber.transcribe:259 DEBUG -> mel.shape: torch.Size([80, 750])
[2022-10-09 19:13:23,785] transcriber.transcribe:263 DEBUG -> seek: 0
[2022-10-09 19:13:23,785] transcriber.transcribe:265 DEBUG -> mel.shape (750) - seek (0) < N_FRAMES (3000)
[2022-10-09 19:13:23,785] transcriber.transcribe:271 DEBUG -> No padding
[2022-10-09 19:13:23,785] transcriber.transcribe:326 DEBUG -> ctx.buffer_mel.shape: torch.Size([80, 750])
[2022-10-09 19:13:23,785] cli.transcribe_from_mic:62 DEBUG -> Audio #: 2, The rest of queue: 0
[2022-10-09 19:13:27,425] cli.transcribe_from_mic:77 DEBUG -> Got. The rest of queue: 0
Analyzing
[2022-10-09 19:13:27,425] transcriber.transcribe:235 DEBUG -> 60000
[2022-10-09 19:13:27,474] transcriber.transcribe:252 DEBUG -> Incoming new_mel.shape: torch.Size([80, 375])
[2022-10-09 19:13:27,475] transcriber.transcribe:256 DEBUG -> buffer_mel.shape: torch.Size([80, 750])
[2022-10-09 19:13:27,475] transcriber.transcribe:259 DEBUG -> mel.shape: torch.Size([80, 1125])
[2022-10-09 19:13:27,475] transcriber.transcribe:263 DEBUG -> seek: 0
[2022-10-09 19:13:27,475] transcriber.transcribe:265 DEBUG -> mel.shape (1125) - seek (0) < N_FRAMES (3000)
[2022-10-09 19:13:27,475] transcriber.transcribe:271 DEBUG -> No padding
[2022-10-09 19:13:27,475] transcriber.transcribe:326 DEBUG -> ctx.buffer_mel.shape: torch.Size([80, 1125])
[2022-10-09 19:13:27,475] cli.transcribe_from_mic:62 DEBUG -> Audio #: 3, The rest of queue: 0
[2022-10-09 19:13:31,115] cli.transcribe_from_mic:77 DEBUG -> Got. The rest of queue: 0
Analyzing
[2022-10-09 19:13:31,115] transcriber.transcribe:235 DEBUG -> 60000
[2022-10-09 19:13:31,160] transcriber.transcribe:252 DEBUG -> Incoming new_mel.shape: torch.Size([80, 375])
[2022-10-09 19:13:31,161] transcriber.transcribe:256 DEBUG -> buffer_mel.shape: torch.Size([80, 1125])
[2022-10-09 19:13:31,161] transcriber.transcribe:259 DEBUG -> mel.shape: torch.Size([80, 1500])
[2022-10-09 19:13:31,161] transcriber.transcribe:263 DEBUG -> seek: 0
[2022-10-09 19:13:31,161] transcriber.transcribe:265 DEBUG -> mel.shape (1500) - seek (0) < N_FRAMES (3000)
[2022-10-09 19:13:31,161] transcriber.transcribe:271 DEBUG -> No padding
[2022-10-09 19:13:31,161] transcriber.transcribe:326 DEBUG -> ctx.buffer_mel.shape: torch.Size([80, 1500])
[2022-10-09 19:13:31,161] cli.transcribe_from_mic:62 DEBUG -> Audio #: 4, The rest of queue: 0
[2022-10-09 19:13:34,998] cli.transcribe_from_mic:77 DEBUG -> Got. The rest of queue: 0
Analyzing
[2022-10-09 19:13:34,998] transcriber.transcribe:235 DEBUG -> 60000
[2022-10-09 19:13:35,046] transcriber.transcribe:252 DEBUG -> Incoming new_mel.shape: torch.Size([80, 375])
[2022-10-09 19:13:35,046] transcriber.transcribe:256 DEBUG -> buffer_mel.shape: torch.Size([80, 1500])
[2022-10-09 19:13:35,046] transcriber.transcribe:259 DEBUG -> mel.shape: torch.Size([80, 1875])
[2022-10-09 19:13:35,046] transcriber.transcribe:263 DEBUG -> seek: 0
[2022-10-09 19:13:35,046] transcriber.transcribe:265 DEBUG -> mel.shape (1875) - seek (0) < N_FRAMES (3000)
[2022-10-09 19:13:35,046] transcriber.transcribe:271 DEBUG -> No padding
[2022-10-09 19:13:35,047] transcriber.transcribe:326 DEBUG -> ctx.buffer_mel.shape: torch.Size([80, 1875])
[2022-10-09 19:13:35,047] cli.transcribe_from_mic:62 DEBUG -> Audio #: 5, The rest of queue: 0
[2022-10-09 19:13:38,689] cli.transcribe_from_mic:77 DEBUG -> Got. The rest of queue: 0
Analyzing
[2022-10-09 19:13:38,689] transcriber.transcribe:235 DEBUG -> 60000
[2022-10-09 19:13:38,737] transcriber.transcribe:252 DEBUG -> Incoming new_mel.shape: torch.Size([80, 375])
[2022-10-09 19:13:38,737] transcriber.transcribe:256 DEBUG -> buffer_mel.shape: torch.Size([80, 1875])
[2022-10-09 19:13:38,737] transcriber.transcribe:259 DEBUG -> mel.shape: torch.Size([80, 2250])
[2022-10-09 19:13:38,737] transcriber.transcribe:263 DEBUG -> seek: 0
[2022-10-09 19:13:38,737] transcriber.transcribe:265 DEBUG -> mel.shape (2250) - seek (0) < N_FRAMES (3000)
[2022-10-09 19:13:38,737] transcriber.transcribe:271 DEBUG -> No padding
[2022-10-09 19:13:38,737] transcriber.transcribe:326 DEBUG -> ctx.buffer_mel.shape: torch.Size([80, 2250])
[2022-10-09 19:13:38,737] cli.transcribe_from_mic:62 DEBUG -> Audio #: 6, The rest of queue: 0
[2022-10-09 19:13:42,368] cli.transcribe_from_mic:77 DEBUG -> Got. The rest of queue: 0
Analyzing
[2022-10-09 19:13:42,369] transcriber.transcribe:235 DEBUG -> 60000
[2022-10-09 19:13:42,415] transcriber.transcribe:252 DEBUG -> Incoming new_mel.shape: torch.Size([80, 375])
[2022-10-09 19:13:42,415] transcriber.transcribe:256 DEBUG -> buffer_mel.shape: torch.Size([80, 2250])
[2022-10-09 19:13:42,416] transcriber.transcribe:259 DEBUG -> mel.shape: torch.Size([80, 2625])
[2022-10-09 19:13:42,416] transcriber.transcribe:263 DEBUG -> seek: 0
[2022-10-09 19:13:42,416] transcriber.transcribe:265 DEBUG -> mel.shape (2625) - seek (0) < N_FRAMES (3000)
[2022-10-09 19:13:42,416] transcriber.transcribe:271 DEBUG -> No padding
[2022-10-09 19:13:42,416] transcriber.transcribe:326 DEBUG -> ctx.buffer_mel.shape: torch.Size([80, 2625])
[2022-10-09 19:13:42,416] cli.transcribe_from_mic:62 DEBUG -> Audio #: 7, The rest of queue: 0
[2022-10-09 19:13:46,251] cli.transcribe_from_mic:77 DEBUG -> Got. The rest of queue: 0
Analyzing
[2022-10-09 19:13:46,251] transcriber.transcribe:235 DEBUG -> 60000
[2022-10-09 19:13:46,298] transcriber.transcribe:252 DEBUG -> Incoming new_mel.shape: torch.Size([80, 375])
[2022-10-09 19:13:46,298] transcriber.transcribe:256 DEBUG -> buffer_mel.shape: torch.Size([80, 2625])
[2022-10-09 19:13:46,299] transcriber.transcribe:259 DEBUG -> mel.shape: torch.Size([80, 3000])
[2022-10-09 19:13:46,299] transcriber.transcribe:263 DEBUG -> seek: 0
[2022-10-09 19:13:46,299] transcriber.transcribe:280 DEBUG -> seek=0, timestamp=0.0, mel.shape: torch.Size([80, 3000]), segment.shape: torch.Size([80, 3000])
[2022-10-09 19:13:46,299] transcriber._decode_with_fallback:103 DEBUG -> DecodeOptions: DecodingOptions(task='transcribe', language='en', temperature=0.0, sample_len=None, best_of=None, beam_size=5, patience=None, length_penalty=None, prompt=[], prefix=None, suppress_blank=True, suppress_tokens='-1', without_timestamps=False, max_initial_timestamp=1.0, fp16=False)
Traceback (most recent call last):
  File "/Users/fc/.local/share/virtualenvs/whisper_streaming-htMqhiJ1/bin/whispering", line 8, in <module>
    sys.exit(main())
  File "/Users/fc/.local/share/virtualenvs/whisper_streaming-htMqhiJ1/lib/python3.10/site-packages/whispering/cli.py", line 301, in main
    for text in transcribe_from_mic(
  File "/Users/fc/.local/share/virtualenvs/whisper_streaming-htMqhiJ1/lib/python3.10/site-packages/whispering/cli.py", line 82, in transcribe_from_mic
    for chunk in wsp.transcribe(audio=audio, ctx=ctx):
  File "/Users/fc/.local/share/virtualenvs/whisper_streaming-htMqhiJ1/lib/python3.10/site-packages/whispering/transcriber.py", line 284, in transcribe
    result = self._decode_with_fallback(
  File "/Users/fc/.local/share/virtualenvs/whisper_streaming-htMqhiJ1/lib/python3.10/site-packages/whispering/transcriber.py", line 104, in _decode_with_fallback
    decode_result = self.model.decode(
  File "/Users/fc/.local/share/virtualenvs/whisper_streaming-htMqhiJ1/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/Users/fc/.local/share/virtualenvs/whisper_streaming-htMqhiJ1/lib/python3.10/site-packages/whisper/decoding.py", line 700, in decode
    result = DecodingTask(model, options).run(mel)
  File "/Users/fc/.local/share/virtualenvs/whisper_streaming-htMqhiJ1/lib/python3.10/site-packages/whisper/decoding.py", line 472, in __init__
    self.decoder = BeamSearchDecoder(
  File "/Users/fc/.local/share/virtualenvs/whisper_streaming-htMqhiJ1/lib/python3.10/site-packages/whisper/decoding.py", line 283, in __init__
    self.max_candidates: int = round(beam_size * (1.0 + patience))
TypeError: unsupported operand type(s) for +: 'float' and 'NoneType'
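The debug output above shows the mel buffer growing by 375 frames per captured audio chunk until it reaches N_FRAMES (3000), at which point the first decode is attempted (and crashes). A minimal sketch of that accumulation, using only the frame counts visible in the log (the function name is illustrative, not whispering's actual API):

```python
# Frame counts taken from the debug log above:
# each chunk yields "Incoming new_mel.shape: torch.Size([80, 375])",
# and decoding starts once the buffer reaches N_FRAMES (3000),
# i.e. Whisper's fixed 30-second window.
FRAMES_PER_CHUNK = 375
N_FRAMES = 3000

def chunks_until_first_decode() -> int:
    """Count how many chunks must be buffered before the first decode."""
    buffer_frames = 0
    chunks = 0
    while buffer_frames < N_FRAMES:
        buffer_frames += FRAMES_PER_CHUNK  # append the new mel to the buffer
        chunks += 1
    return chunks

print(chunks_until_first_decode())  # 8 chunks, matching "Audio #: 7" in the log
```

This also explains why nothing is printed at first: the transcriber is simply waiting until enough audio has been buffered before it runs the decoder at all.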

Environment

shirayu commented 2 years ago

Thank you for the report.

It seems your "whisper" is not the expected version, because the line in lib/python3.10/site-packages/whisper/decoding.py should be the following.

https://github.com/openai/whisper/blob/9e653bd0ea0f1e9493cb4939733e9de249493cfb/whisper/decoding.py#L282-L283

self.max_candidates: int = round(beam_size * self.patience)

However, yours is the following.

self.max_candidates: int = round(beam_size * (1.0 + patience))
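The crash can be reproduced in isolation: the older expression adds `patience` (which is None, per the DecodingOptions in the log) to a float before rounding. A minimal sketch, with beam_size=5 and patience=None taken from the log; the default of 1.0 for a missing patience is my assumption based on the linked fixed code:

```python
beam_size = 5
patience = None  # DecodingOptions(..., patience=None, ...) as in the log

# The buggy formula from the outdated whisper/decoding.py:
try:
    max_candidates = round(beam_size * (1.0 + patience))
except TypeError as e:
    print(e)  # unsupported operand type(s) for +: 'float' and 'NoneType'

# The linked fixed code multiplies by self.patience instead; assuming a
# missing patience defaults to 1.0, None never reaches the arithmetic:
patience = patience or 1.0
max_candidates = round(beam_size * patience)
print(max_candidates)  # 5
```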

How did you install whispering? The expected install step is this.

fantinuoli commented 2 years ago

I reinstalled following the suggested procedure (same results). This is how the installation completes:

Successfully built whispering whisper future
Installing collected packages: tokenizers, websockets, urllib3, typing-extensions, tqdm, regex, pyyaml, pyparsing, pycparser, numpy, more-itertools, idna, future, filelock, charset-normalizer, certifi, torch, requests, pydantic, packaging, ffmpeg-python, CFFI, torchaudio, sounddevice, huggingface-hub, transformers, whisper, whispering
Successfully installed CFFI-1.15.1 certifi-2022.9.24 charset-normalizer-2.1.1 ffmpeg-python-0.2.0 filelock-3.8.0 future-0.18.2 huggingface-hub-0.10.1 idna-3.4 more-itertools-8.14.0 numpy-1.23.3 packaging-21.3 pycparser-2.21 pydantic-1.10.2 pyparsing-3.0.9 pyyaml-6.0 regex-2022.9.13 requests-2.28.1 sounddevice-0.4.5 tokenizers-0.13.1 torch-1.12.1 torchaudio-0.12.1 tqdm-4.64.1 transformers-4.23.1 typing-extensions-4.4.0 urllib3-1.26.12 websockets-10.3 whisper-1.0 whispering-0.5.1
shirayu commented 2 years ago

A similar report is #24.

It seems that you are using virtualenvs. Please retry after re-creating the virtualenv (i.e. not using the current one under .local/share/virtualenvs/), or without a virtualenv.

fantinuoli commented 2 years ago

Unfortunately, neither re-creating the virtualenv nor a clean install without a virtualenv changes anything.

Installation apparently completes correctly. The model seems to load (it takes some seconds). The mic seems to work in Terminal.

shirayu commented 2 years ago

Is the path of whispering the expected one? If you can provide me with logs or other detailed information, I may be able to help.
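One quick way to check which installation Python actually picks up is to look up the modules' on-disk locations. This is a generic diagnostic sketch (stdlib only, not a whispering command), and it degrades gracefully if a package is absent:

```python
import importlib.util

def module_location(name: str) -> str:
    """Return the path Python would import `name` from, or a note if absent."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return f"{name}: not importable from this interpreter"
    return f"{name}: {spec.origin}"

# Compare these paths against the site-packages you expect
# (e.g. a stale ~/.local/share/virtualenvs/ copy vs. a fresh install).
for pkg in ("whisper", "whispering"):
    print(module_location(pkg))
```

If the printed paths point into an old virtualenv rather than the interpreter you just installed into, the stale copy of whisper is the one being loaded.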

fantinuoli commented 2 years ago

This is what I get; most requirements were already satisfied (this is without a venv):

fc@Claudios-MacBook-Pro whispering % pip install -U git+https://github.com/shirayu/whispering.git@v0.6.2
DEPRECATION: Configuring installation scheme with distutils config files is deprecated and will no longer work in the near future. If you are using a Homebrew or Linuxbrew Python, please see discussion at https://github.com/Homebrew/homebrew-core/issues/76621
Collecting git+https://github.com/shirayu/whispering.git@v0.6.2
  Cloning https://github.com/shirayu/whispering.git (to revision v0.6.2) to /private/var/folders/_9/53wvt37d1jq7x6y7gwgvrstr0000gn/T/pip-req-build-wveowo4t
  Running command git clone --filter=blob:none --quiet https://github.com/shirayu/whispering.git /private/var/folders/_9/53wvt37d1jq7x6y7gwgvrstr0000gn/T/pip-req-build-wveowo4t
  Running command git checkout -q db83a89d7b78d85e373c6d7f41ed00db22390a48
  Resolved https://github.com/shirayu/whispering.git to commit db83a89d7b78d85e373c6d7f41ed00db22390a48
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting whisper@ git+https://github.com/openai/whisper.git@d18e9ea5dd2ca57c697e8e55f9e654f06ede25d0
  Using cached whisper-1.0-py3-none-any.whl
Requirement already satisfied: sounddevice<0.5.0,>=0.4.5 in /usr/local/lib/python3.9/site-packages (from whispering==0.6.2) (0.4.5)
Requirement already satisfied: torchaudio<0.13.0,>=0.12.1 in /usr/local/lib/python3.9/site-packages (from whispering==0.6.2) (0.12.1)
Requirement already satisfied: pydantic<2.0.0,>=1.10.2 in /usr/local/lib/python3.9/site-packages (from whispering==0.6.2) (1.10.2)
Requirement already satisfied: websockets<11.0,>=10.3 in /usr/local/lib/python3.9/site-packages (from whispering==0.6.2) (10.3)
Requirement already satisfied: tqdm in /usr/local/lib/python3.9/site-packages (from whispering==0.6.2) (4.64.1)
Requirement already satisfied: typing-extensions>=4.1.0 in /usr/local/lib/python3.9/site-packages (from pydantic<2.0.0,>=1.10.2->whispering==0.6.2) (4.4.0)
Requirement already satisfied: CFFI>=1.0 in /usr/local/lib/python3.9/site-packages (from sounddevice<0.5.0,>=0.4.5->whispering==0.6.2) (1.15.1)
Requirement already satisfied: torch==1.12.1 in /usr/local/lib/python3.9/site-packages (from torchaudio<0.13.0,>=0.12.1->whispering==0.6.2) (1.12.1)
Requirement already satisfied: more-itertools in /usr/local/lib/python3.9/site-packages (from whisper@ git+https://github.com/openai/whisper.git@d18e9ea5dd2ca57c697e8e55f9e654f06ede25d0->whispering==0.6.2) (8.14.0)
Requirement already satisfied: transformers>=4.19.0 in /usr/local/lib/python3.9/site-packages (from whisper@ git+https://github.com/openai/whisper.git@d18e9ea5dd2ca57c697e8e55f9e654f06ede25d0->whispering==0.6.2) (4.23.1)
Requirement already satisfied: ffmpeg-python==0.2.0 in /usr/local/lib/python3.9/site-packages (from whisper@ git+https://github.com/openai/whisper.git@d18e9ea5dd2ca57c697e8e55f9e654f06ede25d0->whispering==0.6.2) (0.2.0)
Requirement already satisfied: numpy in /usr/local/lib/python3.9/site-packages (from whisper@ git+https://github.com/openai/whisper.git@d18e9ea5dd2ca57c697e8e55f9e654f06ede25d0->whispering==0.6.2) (1.23.4)
Requirement already satisfied: future in /usr/local/lib/python3.9/site-packages (from ffmpeg-python==0.2.0->whisper@ git+https://github.com/openai/whisper.git@d18e9ea5dd2ca57c697e8e55f9e654f06ede25d0->whispering==0.6.2) (0.18.2)
Requirement already satisfied: pycparser in /usr/local/lib/python3.9/site-packages (from CFFI>=1.0->sounddevice<0.5.0,>=0.4.5->whispering==0.6.2) (2.21)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.9/site-packages (from transformers>=4.19.0->whisper@ git+https://github.com/openai/whisper.git@d18e9ea5dd2ca57c697e8e55f9e654f06ede25d0->whispering==0.6.2) (21.3)
Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.9/site-packages (from transformers>=4.19.0->whisper@ git+https://github.com/openai/whisper.git@d18e9ea5dd2ca57c697e8e55f9e654f06ede25d0->whispering==0.6.2) (2022.9.13)
Requirement already satisfied: requests in /usr/local/lib/python3.9/site-packages (from transformers>=4.19.0->whisper@ git+https://github.com/openai/whisper.git@d18e9ea5dd2ca57c697e8e55f9e654f06ede25d0->whispering==0.6.2) (2.28.1)
Requirement already satisfied: huggingface-hub<1.0,>=0.10.0 in /usr/local/lib/python3.9/site-packages (from transformers>=4.19.0->whisper@ git+https://github.com/openai/whisper.git@d18e9ea5dd2ca57c697e8e55f9e654f06ede25d0->whispering==0.6.2) (0.10.1)
Requirement already satisfied: filelock in /usr/local/lib/python3.9/site-packages (from transformers>=4.19.0->whisper@ git+https://github.com/openai/whisper.git@d18e9ea5dd2ca57c697e8e55f9e654f06ede25d0->whispering==0.6.2) (3.6.0)
Requirement already satisfied: tokenizers!=0.11.3,<0.14,>=0.11.1 in /usr/local/lib/python3.9/site-packages (from transformers>=4.19.0->whisper@ git+https://github.com/openai/whisper.git@d18e9ea5dd2ca57c697e8e55f9e654f06ede25d0->whispering==0.6.2) (0.13.1)
Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.9/site-packages (from transformers>=4.19.0->whisper@ git+https://github.com/openai/whisper.git@d18e9ea5dd2ca57c697e8e55f9e654f06ede25d0->whispering==0.6.2) (6.0)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.9/site-packages (from packaging>=20.0->transformers>=4.19.0->whisper@ git+https://github.com/openai/whisper.git@d18e9ea5dd2ca57c697e8e55f9e654f06ede25d0->whispering==0.6.2) (3.0.9)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.9/site-packages (from requests->transformers>=4.19.0->whisper@ git+https://github.com/openai/whisper.git@d18e9ea5dd2ca57c697e8e55f9e654f06ede25d0->whispering==0.6.2) (3.4)
Requirement already satisfied: charset-normalizer<3,>=2 in /usr/local/lib/python3.9/site-packages (from requests->transformers>=4.19.0->whisper@ git+https://github.com/openai/whisper.git@d18e9ea5dd2ca57c697e8e55f9e654f06ede25d0->whispering==0.6.2) (2.1.1)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.9/site-packages (from requests->transformers>=4.19.0->whisper@ git+https://github.com/openai/whisper.git@d18e9ea5dd2ca57c697e8e55f9e654f06ede25d0->whispering==0.6.2) (1.26.12)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.9/site-packages (from requests->transformers>=4.19.0->whisper@ git+https://github.com/openai/whisper.git@d18e9ea5dd2ca57c697e8e55f9e654f06ede25d0->whispering==0.6.2) (2021.10.8)
DEPRECATION: Configuring installation scheme with distutils config files is deprecated and will no longer work in the near future. If you are using a Homebrew or Linuxbrew Python, please see discussion at https://github.com/Homebrew/homebrew-core/issues/76621

Running it:

fc@Claudios-MacBook-Pro whispering % whispering --language en --model tiny
[2022-10-18 12:00:18,049] transcriber._set_dtype:35 WARNING -> FP16 is not supported on CPU; using FP32 instead
Using cache found in /Users/fc/.cache/torch/hub/snakers4_silero-vad_master
[2022-10-18 12:00:22,510] cli.transcribe_from_mic:56 INFO -> Ready to transcribe
shirayu commented 2 years ago

Do you have the Traceback? Is it the same as the one in your first post?

fantinuoli commented 2 years ago

Sure. This is a brand new installation:

$ brew install python
$ python3.10 -m pip install --upgrade pip
$ pip3 install poetry

Then

fc@Claudios-MacBook-Pro ~ % pip3 install -U git+https://github.com/shirayu/whispering.git@v0.6.2
Collecting git+https://github.com/shirayu/whispering.git@v0.6.2
  Cloning https://github.com/shirayu/whispering.git (to revision v0.6.2) to /private/var/folders/_9/53wvt37d1jq7x6y7gwgvrstr0000gn/T/pip-req-build-6dao73tv
  Running command git clone --filter=blob:none --quiet https://github.com/shirayu/whispering.git /private/var/folders/_9/53wvt37d1jq7x6y7gwgvrstr0000gn/T/pip-req-build-6dao73tv
  Running command git checkout -q db83a89d7b78d85e373c6d7f41ed00db22390a48
  Resolved https://github.com/shirayu/whispering.git to commit db83a89d7b78d85e373c6d7f41ed00db22390a48
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting whisper@ git+https://github.com/openai/whisper.git@d18e9ea5dd2ca57c697e8e55f9e654f06ede25d0
  Using cached whisper-1.0-py3-none-any.whl
Collecting tqdm
  Using cached tqdm-4.64.1-py2.py3-none-any.whl (78 kB)
Collecting sounddevice<0.5.0,>=0.4.5
  Using cached sounddevice-0.4.5-py3-none-macosx_10_6_x86_64.macosx_10_6_universal2.whl (108 kB)
Collecting websockets<11.0,>=10.3
  Using cached websockets-10.3-cp310-cp310-macosx_10_9_x86_64.whl (97 kB)
Collecting pydantic<2.0.0,>=1.10.2
  Using cached pydantic-1.10.2-cp310-cp310-macosx_10_9_x86_64.whl (3.1 MB)
Collecting torchaudio<0.13.0,>=0.12.1
  Using cached torchaudio-0.12.1-cp310-cp310-macosx_10_9_x86_64.whl (3.1 MB)
Collecting typing-extensions>=4.1.0
  Using cached typing_extensions-4.4.0-py3-none-any.whl (26 kB)
Requirement already satisfied: CFFI>=1.0 in /usr/local/lib/python3.10/site-packages (from sounddevice<0.5.0,>=0.4.5->whispering==0.6.2) (1.15.1)
Collecting torch==1.12.1
  Using cached torch-1.12.1-cp310-none-macosx_10_9_x86_64.whl (133.8 MB)
Requirement already satisfied: more-itertools in /usr/local/lib/python3.10/site-packages (from whisper@ git+https://github.com/openai/whisper.git@d18e9ea5dd2ca57c697e8e55f9e654f06ede25d0->whispering==0.6.2) (8.14.0)
Collecting transformers>=4.19.0
  Using cached transformers-4.23.1-py3-none-any.whl (5.3 MB)
Collecting numpy
  Downloading numpy-1.23.4-cp310-cp310-macosx_10_9_x86_64.whl (18.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.1/18.1 MB 13.7 MB/s eta 0:00:00
Collecting ffmpeg-python==0.2.0
  Using cached ffmpeg_python-0.2.0-py3-none-any.whl (25 kB)
Collecting future
  Using cached future-0.18.2.tar.gz (829 kB)
  Preparing metadata (setup.py) ... done
Requirement already satisfied: pycparser in /usr/local/lib/python3.10/site-packages (from CFFI>=1.0->sounddevice<0.5.0,>=0.4.5->whispering==0.6.2) (2.21)
Collecting huggingface-hub<1.0,>=0.10.0
  Using cached huggingface_hub-0.10.1-py3-none-any.whl (163 kB)
Collecting pyyaml>=5.1
  Using cached PyYAML-6.0-cp310-cp310-macosx_10_9_x86_64.whl (197 kB)
Requirement already satisfied: filelock in /usr/local/lib/python3.10/site-packages (from transformers>=4.19.0->whisper@ git+https://github.com/openai/whisper.git@d18e9ea5dd2ca57c697e8e55f9e654f06ede25d0->whispering==0.6.2) (3.8.0)
Collecting regex!=2019.12.17
  Using cached regex-2022.9.13-cp310-cp310-macosx_10_9_x86_64.whl (293 kB)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/site-packages (from transformers>=4.19.0->whisper@ git+https://github.com/openai/whisper.git@d18e9ea5dd2ca57c697e8e55f9e654f06ede25d0->whispering==0.6.2) (21.3)
Requirement already satisfied: requests in /usr/local/lib/python3.10/site-packages (from transformers>=4.19.0->whisper@ git+https://github.com/openai/whisper.git@d18e9ea5dd2ca57c697e8e55f9e654f06ede25d0->whispering==0.6.2) (2.28.1)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1
  Using cached tokenizers-0.13.1-cp310-cp310-macosx_10_11_x86_64.whl (3.8 MB)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.10/site-packages (from packaging>=20.0->transformers>=4.19.0->whisper@ git+https://github.com/openai/whisper.git@d18e9ea5dd2ca57c697e8e55f9e654f06ede25d0->whispering==0.6.2) (3.0.9)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/site-packages (from requests->transformers>=4.19.0->whisper@ git+https://github.com/openai/whisper.git@d18e9ea5dd2ca57c697e8e55f9e654f06ede25d0->whispering==0.6.2) (3.4)
Requirement already satisfied: charset-normalizer<3,>=2 in /usr/local/lib/python3.10/site-packages (from requests->transformers>=4.19.0->whisper@ git+https://github.com/openai/whisper.git@d18e9ea5dd2ca57c697e8e55f9e654f06ede25d0->whispering==0.6.2) (2.1.1)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/site-packages (from requests->transformers>=4.19.0->whisper@ git+https://github.com/openai/whisper.git@d18e9ea5dd2ca57c697e8e55f9e654f06ede25d0->whispering==0.6.2) (2022.9.24)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.10/site-packages (from requests->transformers>=4.19.0->whisper@ git+https://github.com/openai/whisper.git@d18e9ea5dd2ca57c697e8e55f9e654f06ede25d0->whispering==0.6.2) (1.26.12)
Building wheels for collected packages: whispering, future
  Building wheel for whispering (pyproject.toml) ... done
  Created wheel for whispering: filename=whispering-0.6.2-py3-none-any.whl size=15365 sha256=9a951c453a143f2c4e3dc076f97e25bef80a4120d6c047670b051a2abd6e8e7d
  Stored in directory: /private/var/folders/_9/53wvt37d1jq7x6y7gwgvrstr0000gn/T/pip-ephem-wheel-cache-uk73_iyy/wheels/9c/2f/59/aebc0923ee68b8514881120d69857bf05ec7bcd24774360fe5
  Building wheel for future (setup.py) ... done
  Created wheel for future: filename=future-0.18.2-py3-none-any.whl size=491058 sha256=1133a6853c21db8a256dd827c2c179329e4f1f8a943a3685b754a11b5039cb5f
  Stored in directory: /Users/fc/Library/Caches/pip/wheels/dc/16/09/eb08b4e34e6b638f113d2018cf0b22de1d8dca22a3a71873f7
Successfully built whispering future
Installing collected packages: tokenizers, websockets, typing-extensions, tqdm, regex, pyyaml, numpy, future, torch, sounddevice, pydantic, huggingface-hub, ffmpeg-python, transformers, torchaudio, whisper, whispering
Successfully installed ffmpeg-python-0.2.0 future-0.18.2 huggingface-hub-0.10.1 numpy-1.23.4 pydantic-1.10.2 pyyaml-6.0 regex-2022.9.13 sounddevice-0.4.5 tokenizers-0.13.1 torch-1.12.1 torchaudio-0.12.1 tqdm-4.64.1 transformers-4.23.1 typing-extensions-4.4.0 websockets-10.3 whisper-1.0 whispering-0.6.2
fc@Claudios-MacBook-Pro ~ % whispering --language en --model tiny
[2022-10-18 12:25:30,075] transcriber._set_dtype:35 WARNING -> FP16 is not supported on CPU; using FP32 instead
Using cache found in /Users/fc/.cache/torch/hub/snakers4_silero-vad_master
[2022-10-18 12:25:31,509] cli.transcribe_from_mic:56 INFO -> Ready to transcribe

Same behavior. Something flashes a couple of times; after 10 seconds or so, the prompt gets to "Ready to transcribe".

shirayu commented 2 years ago

@fantinuoli Sorry, I meant the traceback of whispering. Do you still get TypeError: unsupported operand type(s) for +: 'float' and 'NoneType' ?

fantinuoli commented 2 years ago

Okay, no error anymore, and the model is responding. I had to wait 60+ seconds to see a response (only a partial one); I did not wait that long before. Interestingly enough, offline Whisper does not take that long.

I am on an average Mac (Intel Core i5, 16 GB). Is it possible to run inference in real time on CPU as well, or is a GPU required for (near) real-time processing?

fc@Claudios-MacBook-Pro ~ % whispering --language en --model tiny        
[2022-10-18 12:48:11,887] transcriber._set_dtype:35 WARNING -> FP16 is not supported on CPU; using FP32 instead
Using cache found in /Users/fc/.cache/torch/hub/snakers4_silero-vad_master
[2022-10-18 12:48:13,095] cli.transcribe_from_mic:56 INFO -> Ready to transcribe
67.50->74.50     intrapet to be here, this is quite e.
shirayu commented 2 years ago

I'm not sure whether the machine spec is sufficient, but whispering seems to work as expected. Please check your CPU usage.

Currently Whispering performs VAD and takes some time to start transcription. https://github.com/shirayu/whispering#parse-interval

If a speech segment is determined to be "silence", it will not be transcribed. Try running with --vad 0 to disable VAD.
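The gating described above works roughly like this: each chunk gets a speech probability from the VAD model, and chunks scoring below the threshold (vad_threshold=0.5 in the Context dump of the first post) are skipped rather than transcribed. A hypothetical sketch of that decision, not whispering's actual code, assuming `--vad 0` effectively zeroes the threshold:

```python
# Hypothetical VAD gate; in the real pipeline `speech_prob` would come
# from the silero-vad model loaded from the torch hub cache.
def should_transcribe(speech_prob: float, vad_threshold: float = 0.5) -> bool:
    """Keep a chunk only if the VAD scores it at or above the threshold."""
    if vad_threshold <= 0.0:       # e.g. running with --vad 0: gate disabled
        return True
    return speech_prob >= vad_threshold

print(should_transcribe(0.8))       # True: counted as speech
print(should_transcribe(0.1))       # False: treated as silence, skipped
print(should_transcribe(0.1, 0.0))  # True: VAD disabled, everything passes
```

Disabling the gate trades CPU time for coverage: every chunk is decoded, including silence, which on a CPU-only machine will further increase latency.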