pipecat-ai / pipecat

Open Source framework for voice and multimodal conversational AI
BSD 2-Clause "Simplified" License

FastAPIWebsocketTransport not working when used in non-Twilio settings #296

Open charlesyu108 opened 1 month ago

charlesyu108 commented 1 month ago

I am trying to implement a FastAPI server that can accept both Twilio calls and plain WebSocket calls, but found that the FastAPIWebsocketTransport did not drop in cleanly.

I was using the examples/websocket-server client to check my work and was confused about why the WebsocketServerTransport worked whereas the FastAPI one did not. Playback from the voice assistant worked, but no input was getting through.

I dug into the transport input code and found that the following monkey patch fixed it for this particular use case:

        from functools import partial  # needed for the binding at the bottom

        transport = FastAPIWebsocketTransport(
            websocket=websocket_client,
            params=FastAPIWebsocketParams(
                audio_out_enabled=True,
                add_wav_header=True,
                vad_enabled=True,
                vad_analyzer=SileroVADAnalyzer(),
                vad_audio_passthrough=True,
                serializer=ProtobufFrameSerializer(),
            ),
        )

        async def _patched_receive_messages(self):
            # Read binary messages (protobuf) rather than text messages.
            async for message in self._websocket.iter_bytes():
                frame = self._params.serializer.deserialize(message)
                if not frame:
                    continue
                if isinstance(frame, AudioRawFrame):
                    await self.push_audio_frame(frame)
                else:
                    # Forward non-audio frames as well.
                    await self._internal_push_frame(frame)
            await self._callbacks.on_client_disconnected(self._websocket)

        # Bind the patched coroutine to the transport's input instance.
        transport._input._receive_messages = partial(_patched_receive_messages, transport._input)

The two changes were:

  1. I had to use self._websocket.iter_bytes() instead of self._websocket.iter_text() to read the binary protobuf data.
  2. I had to use the _internal_push_frame function, which the current implementation doesn't use, to push the non-audio frames.
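
For anyone who wants to see the shape of these two changes without installing pipecat or FastAPI, here is a minimal, dependency-free sketch. Everything in it is a hypothetical stand-in (AudioFrame, TextFrame, FakeSerializer, FakeWebSocket, FakeInput are not pipecat classes, and the one-byte type tag is an invented toy format, not protobuf). It only mirrors the control flow of the patch: iterate over binary messages, skip anything that does not deserialize, route audio frames one way and every other frame another way, then signal the disconnect.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class AudioFrame:
    """Hypothetical stand-in for an audio frame type."""
    audio: bytes


@dataclass
class TextFrame:
    """Hypothetical stand-in for any non-audio frame type."""
    text: str


class FakeSerializer:
    """Toy binary format (an assumption): first byte is a type tag, rest is payload."""

    def deserialize(self, message: bytes):
        if not message:
            return None
        tag, payload = message[:1], message[1:]
        if tag == b"A":
            return AudioFrame(audio=payload)
        if tag == b"T":
            return TextFrame(text=payload.decode("utf-8"))
        return None  # unknown tag: the receive loop skips it


class FakeWebSocket:
    """Mimics only an iter_bytes() async iterator over incoming messages."""

    def __init__(self, messages):
        self._messages = list(messages)

    async def iter_bytes(self):
        for message in self._messages:
            yield message


@dataclass
class FakeInput:
    websocket: FakeWebSocket
    serializer: FakeSerializer
    audio: list = field(default_factory=list)
    other: list = field(default_factory=list)
    disconnected: bool = False

    async def receive_messages(self):
        # Change 1: consume binary messages instead of text messages.
        async for message in self.websocket.iter_bytes():
            frame = self.serializer.deserialize(message)
            if not frame:
                continue
            if isinstance(frame, AudioFrame):
                self.audio.append(frame)
            else:
                # Change 2: non-audio frames are forwarded too, not dropped.
                self.other.append(frame)
        # The iterator ended, so the client is gone.
        self.disconnected = True


def run_demo():
    ws = FakeWebSocket([b"A\x00\x01", b"", b"Thello", b"?junk"])
    transport_input = FakeInput(ws, FakeSerializer())
    asyncio.run(transport_input.receive_messages())
    return transport_input
```

Calling run_demo() leaves one audio frame in .audio, one text frame in .other, and .disconnected set, while the empty message and the unknown tag are skipped.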

The current behavior makes sense when working with the Twilio WebSocket API, but it would be nice if it could be generalized to accommodate this workflow.
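
One possible way to generalize this, sketched under stated assumptions rather than taken from the pipecat codebase: let each serializer advertise whether it speaks text or binary, and have the receive loop pick the matching iterator. WireFormat, the wire_format attribute, the two serializer classes, FakeWebSocket and iter_messages are all hypothetical names invented for this illustration.

```python
import asyncio
from enum import Enum


class WireFormat(Enum):
    """Hypothetical marker for how a serializer's messages travel on the socket."""
    TEXT = "text"
    BINARY = "binary"


class TextSerializer:
    """Stand-in for a serializer that consumes text messages (e.g. JSON)."""
    wire_format = WireFormat.TEXT

    def deserialize(self, message: str):
        return ("text", message)


class BinarySerializer:
    """Stand-in for a serializer that consumes binary messages (e.g. protobuf)."""
    wire_format = WireFormat.BINARY

    def deserialize(self, message: bytes):
        return ("binary", message)


class FakeWebSocket:
    """Fake socket exposing both iter_text() and iter_bytes() async iterators."""

    def __init__(self, text_messages=(), binary_messages=()):
        self._text = list(text_messages)
        self._binary = list(binary_messages)

    async def iter_text(self):
        for message in self._text:
            yield message

    async def iter_bytes(self):
        for message in self._binary:
            yield message


def iter_messages(websocket, serializer):
    """Pick the iterator that matches what the serializer expects."""
    if serializer.wire_format is WireFormat.BINARY:
        return websocket.iter_bytes()
    return websocket.iter_text()


async def receive_all(websocket, serializer):
    frames = []
    async for message in iter_messages(websocket, serializer):
        frames.append(serializer.deserialize(message))
    return frames


def run_demo():
    ws = FakeWebSocket(
        text_messages=['{"event": "media"}'],
        binary_messages=[b"\x08\x96\x01"],
    )
    text_frames = asyncio.run(receive_all(ws, TextSerializer()))
    binary_frames = asyncio.run(receive_all(ws, BinarySerializer()))
    return text_frames, binary_frames
```

With this shape the transport itself stays agnostic: a Twilio-style serializer would keep reading text, and a protobuf-style one would read bytes, without any monkey patching.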

aconchillo commented 1 month ago

Good find! Yes, ideally the FastAPIWebsocketTransport should work in all cases.

charlesyu108 commented 1 month ago

Thanks @aconchillo, I appreciate the quick response.