huggingface / speech-to-speech

Speech To Speech: an effort for an open-sourced and modular GPT4-o
Apache License 2.0

Assertion error when speaking #56

Closed RodriMora closed 2 weeks ago

RodriMora commented 2 weeks ago

Latest commit as of now on both client and server: https://github.com/huggingface/speech-to-speech/commit/fc9f960285b4c7591c36a39644f3e307c93030b0

Server specs: Nvidia 3090

Installed with:

pip install -r requirements.txt
pip install flash-attn --no-build-isolation

Client specs: MacBook Pro M3 Pro, 16GB

Installed with: pip install -r requirements_mac.txt

Running server with:

python s2s_pipeline.py \
    --recv_host 0.0.0.0 \
    --send_host 0.0.0.0 \
    --lm_model_name microsoft/Phi-3-mini-4k-instruct \
    --init_chat_role system \
    --stt_compile_mode reduce-overhead \
    --tts_compile_mode default

After the server finishes loading, I connect on the client with: python listen_and_play.py --host 192.168.1.x

When I say "hello" into the mic on the Mac, I get this error on the server:

2024-08-28 09:43:26,365 - connections.socket_sender - INFO - sender connected
Exception in thread Thread-8 (run):
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntuai/speech-to-speech/baseHandler.py", line 37, in run
    for output in self.process(input):
  File "/home/ubuntuai/speech-to-speech/STT/whisper_stt_handler.py", line 101, in process
    pred_ids = self.model.generate(input_features, **self.gen_kwargs)
  File "/home/ubuntuai/speech-to-speech/.venv/lib/python3.10/site-packages/transformers/models/whisper/generation_whisper.py", line 658, in generate
    ) = self.generate_with_fallback(
  File "/home/ubuntuai/speech-to-speech/.venv/lib/python3.10/site-packages/transformers/models/whisper/generation_whisper.py", line 801, in generate_with_fallback
    seek_outputs = super().generate(
  File "/home/ubuntuai/speech-to-speech/.venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntuai/speech-to-speech/.venv/lib/python3.10/site-packages/transformers/generation/utils.py", line 1989, in generate
    result = self._sample(
  File "/home/ubuntuai/speech-to-speech/.venv/lib/python3.10/site-packages/transformers/generation/utils.py", line 2932, in _sample
    outputs = self(**model_inputs, return_dict=True)
  File "/home/ubuntuai/speech-to-speech/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ubuntuai/speech-to-speech/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntuai/speech-to-speech/.venv/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 433, in _fn
    return fn(*args, **kwargs)
  File "/home/ubuntuai/speech-to-speech/.venv/lib/python3.10/site-packages/transformers/models/whisper/modeling_whisper.py", line 1660, in forward
    @add_start_docstrings_to_model_forward(WHISPER_INPUTS_DOCSTRING)
  File "/home/ubuntuai/speech-to-speech/.venv/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 600, in _fn
    return fn(*args, **kwargs)
  File "/home/ubuntuai/speech-to-speech/.venv/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 987, in forward
    return compiled_fn(full_args)
  File "/home/ubuntuai/speech-to-speech/.venv/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 217, in runtime_wrapper
    all_outs = call_func_at_runtime_with_args(
  File "/home/ubuntuai/speech-to-speech/.venv/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 120, in call_func_at_runtime_with_args
    out = normalize_as_list(f(args))
  File "/home/ubuntuai/speech-to-speech/.venv/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 451, in wrapper
    return compiled_fn(runtime_args)
  File "/home/ubuntuai/speech-to-speech/.venv/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 1131, in __call__
    return self.current_callable(inputs)
  File "/home/ubuntuai/speech-to-speech/.venv/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 993, in run
    return compiled_fn(new_inputs)
  File "/home/ubuntuai/speech-to-speech/.venv/lib/python3.10/site-packages/torch/_inductor/cudagraph_trees.py", line 373, in deferred_cudagraphify
    fn, out = cudagraphify(model, inputs, new_static_input_idxs, *args, **kwargs)
  File "/home/ubuntuai/speech-to-speech/.venv/lib/python3.10/site-packages/torch/_inductor/cudagraph_trees.py", line 395, in cudagraphify
    manager = get_container(device_index).get_tree_manager()
  File "/home/ubuntuai/speech-to-speech/.venv/lib/python3.10/site-packages/torch/_inductor/cudagraph_trees.py", line 329, in get_container
    container_dict = get_obj(local, "tree_manager_containers")
  File "/home/ubuntuai/speech-to-speech/.venv/lib/python3.10/site-packages/torch/_inductor/cudagraph_trees.py", line 324, in get_obj
    assert torch._C._is_key_in_tls(attr_name)
AssertionError

Full error log: errorLog.txt

I'm connecting via VPN and have a stable ~10ms latency between client and server, if that matters. But previous versions ran fine for me this way.

andimarafioti commented 2 weeks ago

Hi Rodri, let me check this quickly on my server to see whether it still works for me, at least.

andimarafioti commented 2 weeks ago

OK, so I tested this in two modes and it worked every time. Both were without the --stt_compile_mode flag:

  1. Normal install without flash-attn on server and normal install on mac ✅
  2. Install with flash-attn on server and normal install on mac ✅

I couldn't try it with the --stt_compile_mode flag because I can't install the Python dev tools on my server (no root access). It does seem like your issue is with the compilation, though. Could you try removing the flag and seeing if that fixes it? I'll try to find a way to test the compilation.

andimarafioti commented 2 weeks ago

OK, I got the compilation to work, and I get the same error as you. I'll look into what the issue could be. Thank you for raising it!

andimarafioti commented 2 weeks ago

I went back to a previous commit (before the MPS merge) and it was working, which is odd, because googling the error suggests it happens when we compile the model and then run it on a separate thread.
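For anyone curious about the mechanics: the failing `assert torch._C._is_key_in_tls(attr_name)` checks thread-local storage, and torch's CUDA-graph trees (used by `reduce-overhead` mode) keep their tree-manager containers in TLS. State registered on one thread is simply absent on another, which is presumably why calling the compiled model from the handler thread trips the assertion. A minimal stdlib-only sketch of that effect (no torch involved, names are illustrative):

```python
import threading

tls = threading.local()
tls.tree_manager_containers = {}  # key registered on the main thread

def worker(results):
    # a worker thread gets a fresh TLS namespace, so the key is missing here
    results.append(hasattr(tls, "tree_manager_containers"))

results = []
t = threading.Thread(target=worker, args=(results,))
t.start()
t.join()

print(hasattr(tls, "tree_manager_containers"))  # True on the main thread
print(results[0])                               # False on the worker thread
```

So a compiled model that initializes its CUDA-graph bookkeeping on one thread can't assume that bookkeeping exists when invoked from another.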

andimarafioti commented 2 weeks ago

Found the issue and opened a PR to fix it. Thank you for raising this @RodriMora !

RodriMora commented 2 weeks ago

I can confirm it worked without --stt_compile_mode reduce-overhead, and now with the merged PR it works perfectly on my system. Thanks!