huggingface / speech-to-speech

Speech To Speech: an effort for an open-sourced and modular GPT4-o
Apache License 2.0

Pipeline Connects But Doesn't Respond with Server/Client Approach #95

Open ngquangtrung57 opened 2 months ago

ngquangtrung57 commented 2 months ago

Hello, thanks for your work. I'd appreciate your help with an issue in my current setup. The speech-to-speech pipeline successfully establishes a connection between the client (local machine) and the server (SLURM node), but it never processes or responds to any input. Although the logs on both the client and server sides report successful connections, the system stays unresponsive.

Environment Setup:

    Local machine: Windows 11
    Server: remote SLURM server

Steps to Reproduce:

  1. On the SLURM node, start the server:
    python s2s_pipeline.py \
       --recv_host 0.0.0.0 \
       --send_host 0.0.0.0 \
       --recv_port 9002 \
       --send_port 9003 \
       --lm_model_name microsoft/Phi-3-mini-4k-instruct \
       --init_chat_role system \
       --stt_compile_mode reduce-overhead \
       --tts_compile_mode default \
       --log_level DEBUG \
       --mode socket

  2. On the local machine, set up an SSH tunnel:

    ssh -L 9002:[SLURM_NODE_IP]:9002 -L 9003:[SLURM_NODE_IP]:9003 [USERNAME]@[GATEWAY_IP]

  3. On the local machine, run the client:

    python listen_and_play.py --host localhost --recv_port 9002 --send_port 9003

Current Behavior:

    The SSH tunnel successfully establishes connections for both ports:

    debug1: Connection to port 9003 forwarding to [SLURM_NODE_IP] port 9003 requested.
    debug1: channel 5: new [direct-tcpip]
    debug1: Connection to port 9002 forwarding to [SLURM_NODE_IP] port 9002 requested.
    debug1: channel 6: new [direct-tcpip]

    Server logs indicate successful connections:

    connections.socket_receiver - INFO - receiver connected
    connections.socket_sender - INFO - sender connected
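The two `connected` log lines only show that a TCP connection was accepted on each port; they say nothing about whether audio bytes travel in the right direction. The self-contained sketch below uses a hypothetical stand-in for the pipeline (plain `socket` and `threading`, not the project's code; `round_trip`, `run_echo_server` and the echo behaviour are made up for illustration). It assumes one socket per direction, as the `--recv_port`/`--send_port` flags and the `socket_receiver`/`socket_sender` log lines suggest, and shows that if the client's sending socket is attached to the server's *send* port, both sides still report a connection, yet nothing ever comes back.

```python
import socket
import threading

# Hypothetical two-socket layout (NOT the project's code): the "server" listens
# on one port for incoming audio and on another port for outgoing audio.


def make_listener() -> socket.socket:
    """Listen on an OS-assigned free port on localhost."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", 0))
    sock.listen(1)
    return sock


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read up to n bytes, stopping early if the peer closes the connection."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            break
        buf += chunk
    return buf


def run_echo_server(recv_listener, send_listener, n_bytes: int) -> None:
    """Accept one connection per port, read n_bytes on the receive side and
    write them back on the send side (a stand-in for the real processing)."""
    recv_conn, _ = recv_listener.accept()
    send_conn, _ = send_listener.accept()
    with recv_conn, send_conn:
        data = recv_exact(recv_conn, n_bytes)
        if data:
            send_conn.sendall(data)


def round_trip(payload: bytes, swap_ports: bool = False, timeout: float = 1.0):
    """Return the bytes echoed back to the client, or None on timeout."""
    recv_listener, send_listener = make_listener(), make_listener()
    server_recv_port = recv_listener.getsockname()[1]
    server_send_port = send_listener.getsockname()[1]
    server = threading.Thread(
        target=run_echo_server,
        args=(recv_listener, send_listener, len(payload)),
        daemon=True,
    )
    server.start()

    # Matching pairing: the client SENDS to the server's receive port and
    # RECEIVES from the server's send port. swap_ports=True reverses this.
    if swap_ports:
        client_send_port, client_recv_port = server_send_port, server_recv_port
    else:
        client_send_port, client_recv_port = server_recv_port, server_send_port

    try:
        with socket.create_connection(("127.0.0.1", client_send_port)) as out_sock, \
             socket.create_connection(("127.0.0.1", client_recv_port)) as in_sock:
            out_sock.sendall(payload)
            in_sock.settimeout(timeout)
            try:
                return recv_exact(in_sock, len(payload))
            except socket.timeout:
                return None  # both connections are up, but nothing comes back
    finally:
        server.join(timeout)
        recv_listener.close()
        send_listener.close()
```

In this layout the client's send port has to equal the server's receive port (and vice versa). It may be worth comparing the port flags passed to `s2s_pipeline.py` and `listen_and_play.py` against each script's `--help` text to confirm which side each port name refers to.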
andimarafioti commented 2 months ago

My guess is that the problem is microphone input to Python. Honestly, we haven't tried this on Windows, and I don't have access to a Windows machine. I'm going to the Hugging Face office today, and I'll ask around whether someone has a Windows machine lying around that I can borrow.
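One way to test the microphone hypothesis independently of the pipeline is to record a couple of seconds with `sounddevice` and look at the signal level. The sketch below is a hypothetical helper, not part of the project: `rms_dbfs` and `record_and_report` are made-up names, and mono float32 at 16 kHz is an assumed setting. A level that stays at (or very near) `-inf` dBFS while you speak suggests Python is not receiving audio from the microphone.

```python
import numpy as np


def rms_dbfs(samples) -> float:
    """RMS level of float samples in [-1, 1], in dBFS (-inf for pure silence)."""
    x = np.asarray(samples, dtype=np.float64)
    rms = float(np.sqrt(np.mean(np.square(x))))
    if rms == 0.0:
        return float("-inf")
    return 20.0 * float(np.log10(rms))


def record_and_report(seconds: float = 2.0, samplerate: int = 16000, device=None) -> float:
    """Record from an input device (the default one if device is None) and
    print its level. Assumed settings: mono, float32, 16 kHz."""
    # Imported here so rms_dbfs can be used without audio hardware.
    import sounddevice as sd

    frames = int(seconds * samplerate)
    audio = sd.rec(frames, samplerate=samplerate, channels=1,
                   dtype="float32", device=device)
    sd.wait()  # block until the recording has finished
    level = rms_dbfs(audio)
    print(f"recorded {frames} frames from device {device!r}: {level:.1f} dBFS")
    return level
```

Calling `record_and_report()` while speaking uses the default input device, which in the `sd.query_devices()` output further down is device 1 (marked `>`).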

ngquangtrung57 commented 2 months ago

Sure, let me know if you need any more information for debugging. Also, this is the output when I run:

import sounddevice as sd
print(sd.query_devices())
   0 Microsoft Sound Mapper - Input, MME (2 in, 0 out)
>  1 Microphone Array (Realtek(R) Au, MME (2 in, 0 out)
   2 Microphone (Voice Changer Virtu, MME (2 in, 0 out)
   3 Microsoft Sound Mapper - Output, MME (0 in, 2 out)
<  4 Speakers (Realtek(R) Audio), MME (0 in, 2 out)
   5 Speakers (Voice Changer Virtual, MME (0 in, 2 out)
   6 Primary Sound Capture Driver, Windows DirectSound (2 in, 0 out)
   7 Microphone Array (Realtek(R) Audio), Windows DirectSound (2 in, 0 out)
   8 Microphone (Voice Changer Virtual Audio Device (WDM)), Windows DirectSound (2 in, 0 out)
   9 Primary Sound Driver, Windows DirectSound (0 in, 2 out)
  10 Speakers (Realtek(R) Audio), Windows DirectSound (0 in, 2 out)
  11 Speakers (Voice Changer Virtual Audio Device (WDM)), Windows DirectSound (0 in, 2 out)
  12 Speakers (Realtek(R) Audio), Windows WASAPI (0 in, 2 out)
  13 Speakers (Voice Changer Virtual Audio Device (WDM)), Windows WASAPI (0 in, 2 out)
  14 Microphone Array (Realtek(R) Audio), Windows WASAPI (2 in, 0 out)
  15 Microphone (Voice Changer Virtual Audio Device (WDM)), Windows WASAPI (2 in, 0 out)
  16 Headphones (Realtek HD Audio 2nd output), Windows WDM-KS (0 in, 2 out)
  17 Speakers 1 (Realtek HD Audio output with SST), Windows WDM-KS (0 in, 2 out)
  18 Speakers 2 (Realtek HD Audio output with SST), Windows WDM-KS (0 in, 2 out)
  19 PC Speaker (Realtek HD Audio output with SST), Windows WDM-KS (2 in, 0 out)
  20 Microphone Array (Realtek HD Audio Mic Array input), Windows WDM-KS (2 in, 0 out)
  21 Microphone (Realtek HD Audio Mic input), Windows WDM-KS (2 in, 0 out)
  22 Stereo Mix (Realtek HD Audio Stereo input), Windows WDM-KS (2 in, 0 out)
  23 Headset (@System32\drivers\bthhfenum.sys,#2;%1 Hands-Free%0
;(WI-XB400)), Windows WDM-KS (0 in, 1 out)
  24 Headset (@System32\drivers\bthhfenum.sys,#2;%1 Hands-Free%0
;(WI-XB400)), Windows WDM-KS (1 in, 0 out)
  25 Headphones (), Windows WDM-KS (0 in, 2 out)
  26 Microphone (MFDriver Virtual Audio), Windows WDM-KS (2 in, 0 out)
  27 Speakers (MFDriver Virtual Audio), Windows WDM-KS (0 in, 2 out)
  28 Headset (@System32\drivers\bthhfenum.sys,#2;%1 Hands-Free%0
;(Dime 3)), Windows WDM-KS (0 in, 1 out)
  29 Headset (@System32\drivers\bthhfenum.sys,#2;%1 Hands-Free%0
;(Dime 3)), Windows WDM-KS (1 in, 0 out)
  30 Headphones (), Windows WDM-KS (0 in, 2 out)
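In `sd.query_devices()` output, `>` marks the default input device and `<` the default output device, so streams opened without an explicit `device` here capture from device 1 (the Realtek microphone array via MME) and play to device 4. If a different microphone should be tried, a small helper can pick an input device by name and host API from the same data. This is a hypothetical helper (`find_input_device` is not part of `sounddevice`), assuming the dictionaries returned by `sd.query_devices()` and `sd.query_hostapis()` with their `name`, `hostapi` and `max_input_channels` keys. MME is used as the default host API only because the current default input is an MME device.

```python
def find_input_device(devices, hostapis, name_part, hostapi_name="MME"):
    """Return the index of the first input-capable device whose name contains
    name_part (case-insensitive) and whose host API is hostapi_name, else None.

    devices:  sequence of dicts as returned by sounddevice.query_devices()
    hostapis: sequence of dicts as returned by sounddevice.query_hostapis()
    """
    wanted = name_part.lower()
    # Device indices are positions in the query_devices() list.
    for index, dev in enumerate(devices):
        api_name = hostapis[dev["hostapi"]]["name"]
        if (dev["max_input_channels"] > 0
                and api_name == hostapi_name
                and wanted in dev["name"].lower()):
            return index
    return None
```

The returned index can then be combined with the current output, e.g. `sd.default.device = (index, 4)`, before any stream is opened, so that later streams created without a `device` argument use it.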