sshh12 / llm_convo

Use ChatGPT over Twilio to create an AI phone agent (works for incoming or outgoing calls).
MIT License
89 stars 19 forks

Is websocket working correctly? #4

Open Abe-Telo opened 10 months ago

Abe-Telo commented 10 months ago

I tried adding some debug logging, but none of it is being hit. Is there a requirement we are missing in the setup?

     from websocket import create_connection

        @self.sock.route("/audiostream", websocket=True)
        def on_media_stream(ws):
            try:
                logging.debug("WebSocket connection initiated.")

                # Log the initial details about the WebSocket.
                logging.debug("WebSocket object: %s", ws)

                # Instantiate a new TwilioCallSession with the WebSocket and other details.
                session = TwilioCallSession(
                    ws,
                    self.client,
                    remote_host=self.remote_host,
                    static_dir=self.static_dir,
                )

                logging.debug("WebSocket: %s, Client: %s, Remote Host: %s, Static Dir: %s",
                              ws, self.client, self.remote_host, self.static_dir)

                # If an on_session callback is defined, run it in a new thread.
                if self.on_session is not None:
                    logging.debug("Starting on_session callback in a new thread.")
                    thread = threading.Thread(target=self.on_session, args=(session,))
                    thread.start()

                # Start the session.
                session.start_session()

            except Exception as e:
                # Catch and log any exceptions that might occur during WebSocket handling.
                logging.error("Error during WebSocket handling: %s", str(e))

            finally:
                # Log when the function exits or the WebSocket connection is closed.
                logging.debug("WebSocket connection closed or function exited.")

I have also added


        def incoming_voice(): 
            print(XML_MEDIA_STREAM.format(host=self.remote_host))
            print(self.client,self.remote_host,self.static_dir)
            return XML_MEDIA_STREAM.format(host=self.remote_host)

And I see the XML is returned correctly.

<Response>
    <Start>
        <Stream name="Audio Stream" url="wss://https://SOME_LINK-93ff-fe7c-5cb7.ngrok-free.app/audiostream" />
    </Start>
    <Pause length="60"/>
</Response>

What can be the issue here?
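
One thing worth ruling out for the debug lines that never appear: Python's root logger defaults to the WARNING level, so `logging.debug(...)` calls are silently dropped unless logging is configured somewhere. A minimal sketch, assuming nothing else in the app has configured logging (the format string is only an illustration):

```python
import logging

# force=True (Python 3.8+) replaces any handlers that were installed earlier,
# so this call takes effect even if basicConfig was already called elsewhere.
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    force=True,
)

logging.debug("Debug logging is enabled.")
```

If the messages still do not show up after this, the handler is probably never being called, which points back at the stream URL rather than at logging.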

sshh12 commented 10 months ago

Hey! Actually I'm not sure if that XML is correct, looks like it's wss https but I believe it should just be wss

sshh12 commented 10 months ago

Make sure the remote host doesn't start with a protocol like the readme example
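
A minimal sketch of guarding against this, assuming the template looks like the XML posted above. `normalize_remote_host` is a hypothetical helper, not part of llm_convo, and the `XML_MEDIA_STREAM` string here is reconstructed from the output in this thread rather than copied from the repo:

```python
from urllib.parse import urlparse

# Assumed template, modelled on the XML output shown earlier in this thread.
XML_MEDIA_STREAM = """<Response>
    <Start>
        <Stream name="Audio Stream" url="wss://{host}/audiostream" />
    </Start>
    <Pause length="60"/>
</Response>"""


def normalize_remote_host(remote_host: str) -> str:
    """Return only the host[:port] part, dropping any scheme and path."""
    candidate = remote_host.strip()
    if "://" in candidate:
        # urlparse only fills in netloc when a scheme is present.
        candidate = urlparse(candidate).netloc
    # Drop a trailing slash or path if one was pasted along with the host.
    return candidate.split("/")[0]


if __name__ == "__main__":
    host = normalize_remote_host("https://example.ngrok-free.app")
    print(XML_MEDIA_STREAM.format(host=host))
```

With a host cleaned this way, the generated URL starts with a single `wss://` prefix regardless of whether the value was pasted with or without `https://`.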

Abe-Telo commented 10 months ago

Hey! Actually I'm not sure if that XML is correct, looks like it's wss https but I believe it should just be wss

Yeah, I realized that during the test and removed the https, but it still did not work. Also, as for the default, I replaced that link with my link.

I will show more logs soon.

saravanaraja25 commented 10 months ago

I'm also facing a similar issue like this

Have you got any fixes?

sshh12 commented 10 months ago

Hm, interesting. At some point I was able to run it as is, but it wouldn't be crazy if something broke. I'm currently travelling, but I'll try to debug and clean up the repo in the next week.

sshh12 commented 10 months ago

Added some example scripts to the README that seem to work fine for me.

saravanaraja25 commented 10 months ago

127.0.0.1 - - [2023-09-18 17:14:16] "POST /incoming-voice HTTP/1.1" 200 289 0.000662
INFO:pyngrok.process.ngrok:t=2023-09-18T17:14:16+0530 lvl=info msg="join connections" obj=join id=e38116dff61e l=127.0.0.1:8080 r=54.242.188.146:50054
INFO:pyngrok.process.ngrok:t=2023-09-18T17:14:18+0530 lvl=info msg="join connections" obj=join id=88bb1700f2cd l=127.0.0.1:8080 r=34.203.254.230:49818
INFO:root:Call connected, {'accountSid': '', 'streamSid': '', 'called': '', 'tracks': ['inbound'], 'mediaFormat': {'encoding': 'audio/x-mulaw', 'sampleRate': 8000, 'channels': 1}}
-> Hello! Welcome to the Machine Learning hotline, how can I help?
['Hello! Welcome to the Machine Learning hotline, how can I help?']
Exception in thread Thread-8 (run_chat):
Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.11/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/threading.py", line 1038, in _bootstrap_inner
    self.run()
  File "/usr/local/Cellar/python@3.11/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/threading.py", line 975, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/saravanarajapp/development/ai/llm_convo/twilio_ngrok_ml_rhyme_hotline.py", line 39, in run_chat
    run_conversation(agent_a, agent_b)
  File "/Users/saravanarajapp/development/ai/llm_convo/llm_convo/conversation.py", line 10, in run_conversation
    text_b = agent_b.get_response(transcript)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/saravanarajapp/development/ai/llm_convo/llm_convo/agents.py", line 64, in get_response
    self._say(transcript[-1])
  File "/Users/saravanarajapp/development/ai/llm_convo/llm_convo/agents.py", line 59, in _say
    duration = self.speaker.get_duration(tts_fn)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/saravanarajapp/development/ai/llm_convo/llm_convo/audio_output.py", line 46, in get_duration
    duration = float(output.split("=")[1].split("\r")[0])
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: could not convert string to float: '4.944000\n[/FORMAT]\n'
INFO:root:Call media stream ended.

Thanks for the example scripts. I tried the keyboard chat with GPT, which works fine, but the other examples with Twilio integration throw the errors above when the call connects and the API is triggered.

sshh12 commented 10 months ago

Hm, what's your ffmpeg -version? It seems like it's very close to working, but it's unable to do some audio conversions.

saravanaraja25 commented 10 months ago

my ffmpeg version is 6.0

sshh12 commented 9 months ago

I just reinstalled ffmpeg+ffprobe (realized it's ffprobe that's behind the issue):

ffprobe version 6.0-essentials_build-www.gyan.dev Copyright (c) 2007-2023 the FFmpeg developers
built with gcc 12.2.0 (Rev10, Built by MSYS2 project)
configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-bzlib --enable-lzma --enable-zlib --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-sdl2 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-libaom --enable-libopenjpeg --enable-libvpx --enable-mediafoundation --enable-libass --enable-libfreetype --enable-libfribidi --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-nvdec --enable-nvenc --enable-d3d11va --enable-dxva2 --enable-libvpl --enable-libgme --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libtheora --enable-libvo-amrwbenc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-librubberband
libavutil      58.  2.100 / 58.  2.100
libavcodec     60.  3.100 / 60.  3.100
libavformat    60.  3.100 / 60.  3.100
libavdevice    60.  1.100 / 60.  1.100
libavfilter     9.  3.100 /  9.  3.100
libswscale      7.  1.100 /  7.  1.100
libswresample   4. 10.100 /  4. 10.100
libpostproc    57.  1.100 / 57.  1.100

python examples\twilio_ngrok_ml_rhyme_hotline.py --preload_whisper --start_ngrok

And it seems to work 🤔

sshh12 commented 9 months ago

Ah, I think I got it: it was a Windows vs. Linux issue with \r: https://github.com/sshh12/llm_convo/commit/ba19dad208583fae181a41f9aedb76781222db78

Try updating the repo and trying it.
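
For anyone curious why the line ending mattered: the traceback above shows `output.split("=")[1].split("\r")[0]`, which only trims the value when ffprobe emits `\r\n`. On Linux/macOS the output ends in plain `\n`, so `float()` is handed `'4.944000\n[/FORMAT]\n'`. Below is a sketch of line-ending-agnostic parsing. It is not necessarily what the linked commit does, and `parse_duration` is a hypothetical name; the sample strings assume ffprobe output of the `[FORMAT] ... [/FORMAT]` shape implied by the error message.

```python
def parse_duration(ffprobe_output: str) -> float:
    """Extract the duration in seconds from ffprobe's key=value output."""
    # splitlines() treats \n, \r\n and \r alike, so the same code path
    # works for Windows and Unix line endings.
    for line in ffprobe_output.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key == "duration":
            return float(value)
    raise ValueError(f"no duration field in ffprobe output: {ffprobe_output!r}")


if __name__ == "__main__":
    unix_output = "[FORMAT]\nduration=4.944000\n[/FORMAT]\n"
    windows_output = "[FORMAT]\r\nduration=4.944000\r\n[/FORMAT]\r\n"
    print(parse_duration(unix_output), parse_duration(windows_output))
```

Raising a `ValueError` with the raw output included also makes it easier to spot the case where ffprobe printed nothing at all.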

robertvy commented 5 months ago

I encountered the same error as above with the get_duration function, despite the latest code update. It seems that ffprobe is to blame. On my AWS EC2 instance, it wasn't enough to install ffmpeg and ffprobe the traditional way; I also had to symlink ffprobe into my virtual environment. Then it worked flawlessly.

ln -sfn /usr/local/bin/ffmpeg/ffmpeg-*-amd64-static/ffprobe /path/to/your/venv/bin/ffprobe
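
A quick way to check whether the Python process can actually see both binaries before placing a call is sketched below. It assumes the tools are looked up by name on `PATH`, which the symlink fix above suggests but this thread does not confirm; `missing_tools` is a hypothetical helper.

```python
import shutil
from typing import List, Sequence


def missing_tools(names: Sequence[str] = ("ffmpeg", "ffprobe")) -> List[str]:
    """Return the tools from `names` that cannot be found on PATH."""
    # shutil.which returns the resolved path, or None if the lookup fails.
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("ffmpeg and ffprobe are both visible to this interpreter.")
```

Run it with the same interpreter and virtual environment that runs the Twilio example, since `PATH` can differ between a login shell and the environment the server is started from.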