huggingface / speech-to-speech

Speech To Speech: an effort for an open-sourced and modular GPT4-o
Apache License 2.0

Error in STT handler when passing `device` arguments including `cpu` or `mps` #18

Closed RonanKMcGovern closed 3 months ago

RonanKMcGovern commented 3 months ago

Replication

Device = Mac M1 8 GB

git clone https://github.com/eustlb/speech-to-speech.git
cd speech-to-speech
pip install -r requirements.txt
pip install git+https://github.com/nltk/nltk.git@3.8.2
python s2s_pipeline.py --recv_host localhost --send_host localhost --lm_model_name HuggingFaceTB/SmolLM-360M-Instruct --lm_device mps --stt_device mps --tts_device mps

Error:

Using cache found in /Users/ronanmcgovern/.cache/torch/hub/snakers4_silero-vad_master
2024-08-19 12:57:27,092 - __main__ - INFO - Warming up WhisperSTTHandler
Traceback (most recent call last):
  File "/Users/ronanmcgovern/TR/ADVANCED-transcription/trelis-voice/hf_sts/speech-to-speech/speech-to-speech/s2s_pipeline.py", line 1017, in <module>
    main()
  File "/Users/ronanmcgovern/TR/ADVANCED-transcription/trelis-voice/hf_sts/speech-to-speech/speech-to-speech/s2s_pipeline.py", line 972, in main
    stt = WhisperSTTHandler(
  File "/Users/ronanmcgovern/TR/ADVANCED-transcription/trelis-voice/hf_sts/speech-to-speech/speech-to-speech/s2s_pipeline.py", line 103, in __init__
    self.setup(*setup_args, **setup_kwargs)
  File "/Users/ronanmcgovern/TR/ADVANCED-transcription/trelis-voice/hf_sts/speech-to-speech/speech-to-speech/s2s_pipeline.py", line 439, in setup
    self.warmup()
  File "/Users/ronanmcgovern/TR/ADVANCED-transcription/trelis-voice/hf_sts/speech-to-speech/speech-to-speech/s2s_pipeline.py", line 470, in warmup
    start_event = torch.cuda.Event(enable_timing=True)
  File "/Users/ronanmcgovern/TR/ADVANCED-transcription/trelis-voice/hf_sts/venv/lib/python3.10/site-packages/torch/cuda/streams.py", line 165, in __new__
    return super().__new__(
  File "/Users/ronanmcgovern/TR/ADVANCED-transcription/trelis-voice/hf_sts/venv/lib/python3.10/site-packages/torch/_utils.py", line 912, in err_fn
    raise RuntimeError(f"Tried to instantiate dummy base class {class_name}")
RuntimeError: Tried to instantiate dummy base class Event

Then trying with:

python s2s_pipeline.py --recv_host localhost --send_host localhost --lm_model_name HuggingFaceTB/SmolLM-360M-Instruct --lm_device mps --tts_device mps

Error:

2024-08-19 12:59:20,682 - __main__ - INFO - Warming up WhisperSTTHandler
Traceback (most recent call last):
  File "/Users/ronanmcgovern/TR/ADVANCED-transcription/trelis-voice/hf_sts/speech-to-speech/speech-to-speech/s2s_pipeline.py", line 1017, in <module>
    main()
  File "/Users/ronanmcgovern/TR/ADVANCED-transcription/trelis-voice/hf_sts/speech-to-speech/speech-to-speech/s2s_pipeline.py", line 972, in main
    stt = WhisperSTTHandler(
  File "/Users/ronanmcgovern/TR/ADVANCED-transcription/trelis-voice/hf_sts/speech-to-speech/speech-to-speech/s2s_pipeline.py", line 103, in __init__
    self.setup(*setup_args, **setup_kwargs)
  File "/Users/ronanmcgovern/TR/ADVANCED-transcription/trelis-voice/hf_sts/speech-to-speech/speech-to-speech/s2s_pipeline.py", line 439, in setup
    self.warmup()
  File "/Users/ronanmcgovern/TR/ADVANCED-transcription/trelis-voice/hf_sts/speech-to-speech/speech-to-speech/s2s_pipeline.py", line 470, in warmup
    start_event = torch.cuda.Event(enable_timing=True)
  File "/Users/ronanmcgovern/TR/ADVANCED-transcription/trelis-voice/hf_sts/venv/lib/python3.10/site-packages/torch/cuda/streams.py", line 165, in __new__
    return super().__new__(
  File "/Users/ronanmcgovern/TR/ADVANCED-transcription/trelis-voice/hf_sts/venv/lib/python3.10/site-packages/torch/_utils.py", line 912, in err_fn
    raise RuntimeError(f"Tried to instantiate dummy base class {class_name}")
RuntimeError: Tried to instantiate dummy base class Event
(venv) (base) ronanmcgovern@Ronans-MacBook-Pro speech-to-speech % python s2s_pipeline.py --recv_host localhost --send_host localhost --lm_model_name HuggingFaceTB/SmolLM-360M-Instruct --lm_device mps --tts_device mps
Flash attention 2 is not installed
Using cache found in /Users/ronanmcgovern/.cache/torch/hub/snakers4_silero-vad_master
Traceback (most recent call last):
  File "/Users/ronanmcgovern/TR/ADVANCED-transcription/trelis-voice/hf_sts/speech-to-speech/speech-to-speech/s2s_pipeline.py", line 1017, in <module>
    main()
  File "/Users/ronanmcgovern/TR/ADVANCED-transcription/trelis-voice/hf_sts/speech-to-speech/speech-to-speech/s2s_pipeline.py", line 972, in main
    stt = WhisperSTTHandler(
  File "/Users/ronanmcgovern/TR/ADVANCED-transcription/trelis-voice/hf_sts/speech-to-speech/speech-to-speech/s2s_pipeline.py", line 103, in __init__
    self.setup(*setup_args, **setup_kwargs)
  File "/Users/ronanmcgovern/TR/ADVANCED-transcription/trelis-voice/hf_sts/speech-to-speech/speech-to-speech/s2s_pipeline.py", line 433, in setup
    ).to(device)
  File "/Users/ronanmcgovern/TR/ADVANCED-transcription/trelis-voice/hf_sts/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2861, in to
    return super().to(*args, **kwargs)
  File "/Users/ronanmcgovern/TR/ADVANCED-transcription/trelis-voice/hf_sts/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1174, in to
    return self._apply(convert)
  File "/Users/ronanmcgovern/TR/ADVANCED-transcription/trelis-voice/hf_sts/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 780, in _apply
    module._apply(fn)
  File "/Users/ronanmcgovern/TR/ADVANCED-transcription/trelis-voice/hf_sts/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 780, in _apply
    module._apply(fn)
  File "/Users/ronanmcgovern/TR/ADVANCED-transcription/trelis-voice/hf_sts/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 780, in _apply
    module._apply(fn)
  File "/Users/ronanmcgovern/TR/ADVANCED-transcription/trelis-voice/hf_sts/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 805, in _apply
    param_applied = fn(param)
  File "/Users/ronanmcgovern/TR/ADVANCED-transcription/trelis-voice/hf_sts/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1160, in convert
    return t.to(
  File "/Users/ronanmcgovern/TR/ADVANCED-transcription/trelis-voice/hf_sts/venv/lib/python3.10/site-packages/torch/cuda/__init__.py", line 305, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
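The traceback above fails at `.to(device)` with a CUDA assertion even though no `--stt_device` was passed, which suggests the STT device falls back to a CUDA default. A minimal sketch of one way to resolve a device safely before moving a model is shown below. `resolve_device` and its parameters are hypothetical names for illustration and are not part of `s2s_pipeline.py`; the availability flags are passed in explicitly so the logic can be read and tested without a GPU.

```python
from typing import Optional


def resolve_device(requested: Optional[str], cuda_ok: bool, mps_ok: bool) -> str:
    """Return a usable device string, or raise a clear error.

    Hypothetical helper (an assumption for illustration, not the project's code).
    `cuda_ok` and `mps_ok` are the backend availability flags.
    """
    if requested is None:
        # No explicit choice: prefer CUDA, then MPS, then fall back to CPU.
        if cuda_ok:
            return "cuda"
        if mps_ok:
            return "mps"
        return "cpu"
    if requested.startswith("cuda") and not cuda_ok:
        raise ValueError("CUDA was requested but this PyTorch build has no CUDA support")
    if requested.startswith("mps") and not mps_ok:
        raise ValueError("MPS was requested but the MPS backend is not available")
    return requested
```

In a real script the flags would typically come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`. On an M1 Mac this would turn a missing `--stt_device` into `mps` instead of an assertion deep inside `.to(device)`.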

Then trying with:

python s2s_pipeline.py --recv_host localhost --send_host localhost --lm_model_name HuggingFaceTB/SmolLM-360M-Instruct --lm_device mps --stt_device cpu --tts_device mps

Error:

2024-08-19 13:01:55,724 - __main__ - INFO - Warming up WhisperSTTHandler
Traceback (most recent call last):
  File "/Users/ronanmcgovern/TR/ADVANCED-transcription/trelis-voice/hf_sts/speech-to-speech/speech-to-speech/s2s_pipeline.py", line 1017, in <module>
    main()
  File "/Users/ronanmcgovern/TR/ADVANCED-transcription/trelis-voice/hf_sts/speech-to-speech/speech-to-speech/s2s_pipeline.py", line 972, in main
    stt = WhisperSTTHandler(
  File "/Users/ronanmcgovern/TR/ADVANCED-transcription/trelis-voice/hf_sts/speech-to-speech/speech-to-speech/s2s_pipeline.py", line 103, in __init__
    self.setup(*setup_args, **setup_kwargs)
  File "/Users/ronanmcgovern/TR/ADVANCED-transcription/trelis-voice/hf_sts/speech-to-speech/speech-to-speech/s2s_pipeline.py", line 439, in setup
    self.warmup()
  File "/Users/ronanmcgovern/TR/ADVANCED-transcription/trelis-voice/hf_sts/speech-to-speech/speech-to-speech/s2s_pipeline.py", line 470, in warmup
    start_event = torch.cuda.Event(enable_timing=True)
  File "/Users/ronanmcgovern/TR/ADVANCED-transcription/trelis-voice/hf_sts/venv/lib/python3.10/site-packages/torch/cuda/streams.py", line 165, in __new__
    return super().__new__(
  File "/Users/ronanmcgovern/TR/ADVANCED-transcription/trelis-voice/hf_sts/venv/lib/python3.10/site-packages/torch/_utils.py", line 912, in err_fn
    raise RuntimeError(f"Tried to instantiate dummy base class {class_name}")
RuntimeError: Tried to instantiate dummy base class Event
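The same `torch.cuda.Event` failure appears with `--stt_device cpu`, so the warmup appears to create CUDA timing events regardless of the selected device. A minimal sketch of device-aware timing follows. `time_call` is a hypothetical helper, not the fix merged in the project; it assumes a PyTorch version that provides `torch.mps.synchronize()` and imports `torch` only in the branches that need it.

```python
import time
from typing import Any, Callable, Tuple


def time_call(fn: Callable[[], Any], device: str) -> Tuple[Any, float]:
    """Run fn() and return (result, elapsed_seconds) for the given device.

    Hypothetical helper for illustration; not part of s2s_pipeline.py.
    """
    if device.startswith("cuda"):
        import torch  # only CUDA runs touch torch.cuda

        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        out = fn()
        end.record()
        torch.cuda.synchronize()  # wait until both events have completed
        return out, start.elapsed_time(end) / 1000.0  # elapsed_time is in ms

    if device.startswith("mps"):
        import torch

        torch.mps.synchronize()  # drain pending MPS work before starting the clock
        t0 = time.perf_counter()
        out = fn()
        torch.mps.synchronize()  # make sure the timed work has finished
        return out, time.perf_counter() - t0

    # CPU: execution is synchronous, so a wall-clock timer is enough.
    t0 = time.perf_counter()
    out = fn()
    return out, time.perf_counter() - t0
```

A warmup step could then call something like `time_call(run_one_generation, device)`, where `run_one_generation` stands for whatever callable performs the warmup pass, without ever constructing a CUDA event on `cpu` or `mps`.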

Then trying with:

python s2s_pipeline.py --recv_host localhost --send_host localhost --lm_model_name HuggingFaceTB/SmolLM-360M-Instruct

Traceback (most recent call last):
  File "/Users/ronanmcgovern/TR/ADVANCED-transcription/trelis-voice/hf_sts/speech-to-speech/speech-to-speech/s2s_pipeline.py", line 1017, in <module>
    main()
  File "/Users/ronanmcgovern/TR/ADVANCED-transcription/trelis-voice/hf_sts/speech-to-speech/speech-to-speech/s2s_pipeline.py", line 972, in main
    stt = WhisperSTTHandler(
  File "/Users/ronanmcgovern/TR/ADVANCED-transcription/trelis-voice/hf_sts/speech-to-speech/speech-to-speech/s2s_pipeline.py", line 103, in __init__
    self.setup(*setup_args, **setup_kwargs)
  File "/Users/ronanmcgovern/TR/ADVANCED-transcription/trelis-voice/hf_sts/speech-to-speech/speech-to-speech/s2s_pipeline.py", line 433, in setup
    ).to(device)
  File "/Users/ronanmcgovern/TR/ADVANCED-transcription/trelis-voice/hf_sts/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2861, in to
    return super().to(*args, **kwargs)
  File "/Users/ronanmcgovern/TR/ADVANCED-transcription/trelis-voice/hf_sts/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1174, in to
    return self._apply(convert)
  File "/Users/ronanmcgovern/TR/ADVANCED-transcription/trelis-voice/hf_sts/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 780, in _apply
    module._apply(fn)
  File "/Users/ronanmcgovern/TR/ADVANCED-transcription/trelis-voice/hf_sts/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 780, in _apply
    module._apply(fn)
  File "/Users/ronanmcgovern/TR/ADVANCED-transcription/trelis-voice/hf_sts/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 780, in _apply
    module._apply(fn)
  File "/Users/ronanmcgovern/TR/ADVANCED-transcription/trelis-voice/hf_sts/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 805, in _apply
    param_applied = fn(param)
  File "/Users/ronanmcgovern/TR/ADVANCED-transcription/trelis-voice/hf_sts/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1160, in convert
    return t.to(
  File "/Users/ronanmcgovern/TR/ADVANCED-transcription/trelis-voice/hf_sts/venv/lib/python3.10/site-packages/torch/cuda/__init__.py", line 305, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

andimarafioti commented 3 months ago

Hi! I'm aware of this and I'm working on adding mps support right now! I'm almost done, will tag you when I'm done!

andimarafioti commented 3 months ago

This one should solve your use case, @RonanKMcGovern: https://github.com/huggingface/speech-to-speech/pull/20

RonanKMcGovern commented 3 months ago

Will try shortly, thanks.

RonanKMcGovern commented 3 months ago

Just ran again and I'm having the same issues with the same commands.

Closing this and will comment on the PR. Thanks for the help.