huggingface / speech-to-speech

Speech To Speech: an effort for an open-sourced and modular GPT4-o
Apache License 2.0

Assertion error when speaking #56

Closed RodriMora closed 2 weeks ago

RodriMora commented 2 weeks ago

Latest commit as of now on both client and server: https://github.com/huggingface/speech-to-speech/commit/fc9f960285b4c7591c36a39644f3e307c93030b0

Server specs: Nvidia 3090

Installed with:

pip install -r requirements.txt
pip install flash-attn --no-build-isolation

Client specs: MacBook Pro M3 Pro, 16GB

Installed with: pip install -r requirements_mac.txt

Running server with:

python s2s_pipeline.py \
    --recv_host 0.0.0.0 \
    --send_host 0.0.0.0 \
    --lm_model_name microsoft/Phi-3-mini-4k-instruct \
    --init_chat_role system \
    --stt_compile_mode reduce-overhead \
    --tts_compile_mode default

After the server finishes loading, I connect on the client with: python listen_and_play.py --host 192.168.1.x

When I say "hello" into the mic on the Mac, I get this error on the server:

2024-08-28 09:43:26,365 - connections.socket_sender - INFO - sender connected
Exception in thread Thread-8 (run):
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntuai/speech-to-speech/baseHandler.py", line 37, in run
    for output in self.process(input):
  File "/home/ubuntuai/speech-to-speech/STT/whisper_stt_handler.py", line 101, in process
    pred_ids = self.model.generate(input_features, **self.gen_kwargs)
  File "/home/ubuntuai/speech-to-speech/.venv/lib/python3.10/site-packages/transformers/models/whisper/generation_whisper.py", line 658, in generate
    ) = self.generate_with_fallback(
  File "/home/ubuntuai/speech-to-speech/.venv/lib/python3.10/site-packages/transformers/models/whisper/generation_whisper.py", line 801, in generate_with_fallback
    seek_outputs = super().generate(
  File "/home/ubuntuai/speech-to-speech/.venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntuai/speech-to-speech/.venv/lib/python3.10/site-packages/transformers/generation/utils.py", line 1989, in generate
    result = self._sample(
  File "/home/ubuntuai/speech-to-speech/.venv/lib/python3.10/site-packages/transformers/generation/utils.py", line 2932, in _sample
    outputs = self(**model_inputs, return_dict=True)
  File "/home/ubuntuai/speech-to-speech/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ubuntuai/speech-to-speech/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntuai/speech-to-speech/.venv/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 433, in _fn
    return fn(*args, **kwargs)
  File "/home/ubuntuai/speech-to-speech/.venv/lib/python3.10/site-packages/transformers/models/whisper/modeling_whisper.py", line 1660, in forward
    @add_start_docstrings_to_model_forward(WHISPER_INPUTS_DOCSTRING)
  File "/home/ubuntuai/speech-to-speech/.venv/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 600, in _fn
    return fn(*args, **kwargs)
  File "/home/ubuntuai/speech-to-speech/.venv/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 987, in forward
    return compiled_fn(full_args)
  File "/home/ubuntuai/speech-to-speech/.venv/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 217, in runtime_wrapper
    all_outs = call_func_at_runtime_with_args(
  File "/home/ubuntuai/speech-to-speech/.venv/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 120, in call_func_at_runtime_with_args
    out = normalize_as_list(f(args))
  File "/home/ubuntuai/speech-to-speech/.venv/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 451, in wrapper
    return compiled_fn(runtime_args)
  File "/home/ubuntuai/speech-to-speech/.venv/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 1131, in __call__
    return self.current_callable(inputs)
  File "/home/ubuntuai/speech-to-speech/.venv/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 993, in run
    return compiled_fn(new_inputs)
  File "/home/ubuntuai/speech-to-speech/.venv/lib/python3.10/site-packages/torch/_inductor/cudagraph_trees.py", line 373, in deferred_cudagraphify
    fn, out = cudagraphify(model, inputs, new_static_input_idxs, *args, **kwargs)
  File "/home/ubuntuai/speech-to-speech/.venv/lib/python3.10/site-packages/torch/_inductor/cudagraph_trees.py", line 395, in cudagraphify
    manager = get_container(device_index).get_tree_manager()
  File "/home/ubuntuai/speech-to-speech/.venv/lib/python3.10/site-packages/torch/_inductor/cudagraph_trees.py", line 329, in get_container
    container_dict = get_obj(local, "tree_manager_containers")
  File "/home/ubuntuai/speech-to-speech/.venv/lib/python3.10/site-packages/torch/_inductor/cudagraph_trees.py", line 324, in get_obj
    assert torch._C._is_key_in_tls(attr_name)
AssertionError

Full error log: errorLog.txt

I'm connecting via VPN and have a stable ~10ms latency between client and server, if that matters. But previous versions ran fine for me this way.

andimarafioti commented 2 weeks ago

Hi Rodri, let me check this quickly on my server to see whether it still works for me, at least.

andimarafioti commented 2 weeks ago

OK, so I tested this in two modes and it worked every time. Both were without the --stt_compile_mode flag:

  1. Normal install without flash-attn on server and normal install on mac ✅
  2. Install with flash-attn on server and normal install on mac ✅

I couldn't try it with the --stt_compile_mode flag because I can't install the Python dev tools on my server (no root access). It does seem like your issue is with the compilation, though. Could you try removing the flag and seeing if that fixes it? I'll try to find a way to test the compilation.

andimarafioti commented 2 weeks ago

OK, I got the compilation to work, and I get the same error as you. I'll look into what the issue could be. Thank you for raising it!

andimarafioti commented 2 weeks ago

I went back to a previous commit (before the MPS merge) and it was working, which is odd, because googling the error suggests it happens when we compile the model and then run it on a separate thread.
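For anyone curious about the mechanics: the failing `assert torch._C._is_key_in_tls(attr_name)` checks thread-local storage, and torch's CUDA-graph trees (used by `reduce-overhead` mode) keep their tree-manager containers in TLS. State registered on one thread is simply absent on another, which is presumably why calling the compiled model from the handler thread trips the assertion. A minimal stdlib-only sketch of that effect (no torch involved, names are illustrative):

```python
import threading

tls = threading.local()
tls.tree_manager_containers = {}  # key registered on the main thread

def worker(results):
    # a worker thread gets a fresh TLS namespace, so the key is missing here
    results.append(hasattr(tls, "tree_manager_containers"))

results = []
t = threading.Thread(target=worker, args=(results,))
t.start()
t.join()

print(hasattr(tls, "tree_manager_containers"))  # True on the main thread
print(results[0])                               # False on the worker thread
```

So a compiled model that initializes its CUDA-graph bookkeeping on one thread can't assume that bookkeeping exists when invoked from another.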

andimarafioti commented 2 weeks ago

Found the issue and opened a PR to fix it. Thank you for raising this @RodriMora !

RodriMora commented 2 weeks ago

I can confirm it worked without --stt_compile_mode reduce-overhead, and now with the merged PR it works perfectly on my system. Thanks!