pipecat-ai / pipecat

Open Source framework for voice and multimodal conversational AI
BSD 2-Clause "Simplified" License

Interruptions not working with websocket-server #456

Status: Open · Cuanmaomao opened this issue 1 month ago

Cuanmaomao commented 1 month ago

I ran the example code in examples/websocket-server and noticed that the user cannot interrupt the bot.

dk-crazydiv commented 1 month ago

Same here. Interruptions aren't working properly with this implementation. At first, the "User started speaking" and "User stopped speaking" messages didn't appear in the logs at all. The fix is to add:

from pipecat.pipeline.task import PipelineParams, PipelineTask

# Enable interruptions so user speech detected by VAD can interrupt the bot
task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))

After this, the backend logs look more or less the same as with the Daily transport, but on the frontend the audio overlaps. If I ask 3 questions back to back, the 3 responses play over each other (and sentences within those responses sometimes play simultaneously too). At some point the frontend started playing static noise (like an untuned radio), while the backend logs still looked fine.
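One likely (but unconfirmed) cause of the overlap is a client that starts playing each audio chunk the moment it arrives instead of queuing it after the previous one; since the server streams audio faster than real time, chunks end up playing on top of each other. The Python sketch below models both strategies. `Chunk`, `naive_schedule`, `sequential_schedule`, and `has_overlap` are hypothetical names, and a real browser client would do the equivalent in JS.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    arrival: float   # seconds since the stream started, when the chunk reached the client
    duration: float  # playback length of the chunk in seconds


def naive_schedule(chunks):
    """Start each chunk the moment it arrives (chunks can overlap)."""
    return [(c.arrival, c.arrival + c.duration) for c in chunks]


def sequential_schedule(chunks):
    """Start each chunk no earlier than the end of the previous one."""
    spans = []
    next_free = 0.0
    for c in chunks:
        start = max(c.arrival, next_free)
        end = start + c.duration
        spans.append((start, end))
        next_free = end
    return spans


def has_overlap(spans):
    """True if any span starts before the previous one has finished (spans sorted by start)."""
    return any(b[0] < a[1] for a, b in zip(spans, spans[1:]))


# Hypothetical example: audio arrives faster than it can be played.
chunks = [Chunk(0.0, 2.0), Chunk(0.5, 2.0), Chunk(1.0, 2.0)]
print(has_overlap(naive_schedule(chunks)))       # True
print(has_overlap(sequential_schedule(chunks)))  # False
print(sequential_schedule(chunks))               # [(0.0, 2.0), (2.0, 4.0), (4.0, 6.0)]
```

Sequential scheduling removes the overlap but, on its own, makes the stale-audio problem described next more visible, because everything already received still plays in order.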

After some minor frontend fixes, interruptions seem to work, but any audio already streamed to the client plays to completion before the next answer starts. Current behavior:

1. User: "Tell me 10 points about tigers."
2. The backend generates 5 points.
3. Point 2 is playing on the frontend client when the user interrupts: "Tell me 10 points about lions."
4. The remaining 3 tiger points still play. (With the Daily transport, this step is instead a pause followed by the lion audio frames.)
5. The lion audio frames start.
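The timeline above can be modelled as a client-side FIFO that already holds all five tiger chunks when the user interrupts: stopping generation on the backend does nothing to audio the client has already received. This is a minimal Python sketch, not pipecat code; `playback_order` is a hypothetical name, and it works at whole-chunk granularity (the chunk playing at the moment of interruption is allowed to finish).

```python
from collections import deque


def playback_order(old_chunks, new_chunks, interrupt_after, flush_on_interrupt):
    """Return the order in which a client would play audio chunks.

    old_chunks: chunks of the interrupted answer, already streamed to the client.
    new_chunks: chunks of the answer to the interrupting question.
    interrupt_after: how many old chunks have played when the user interrupts.
    flush_on_interrupt: whether the client drops its queued audio on interruption.
    """
    queue = deque(old_chunks)
    played = []
    # Play until the user interrupts.
    while queue and len(played) < interrupt_after:
        played.append(queue.popleft())
    # The backend handles the interruption and starts streaming the new answer.
    if flush_on_interrupt:
        queue.clear()  # discard stale audio from the interrupted answer
    queue.extend(new_chunks)
    while queue:
        played.append(queue.popleft())
    return played


tiger = [f"tiger {i}" for i in range(1, 6)]
lion = ["lion 1", "lion 2"]
print(playback_order(tiger, lion, 2, flush_on_interrupt=False))
print(playback_order(tiger, lion, 2, flush_on_interrupt=True))
```

Without a flush, all five tiger chunks play before the lion audio; with a flush, the lion audio follows right after point 2, which matches what the Daily transport does.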

The websocket transport only sends AudioRawFrame to the client. My guess is that on a user interruption it would also need to send a TextFrame carrying some marker value, which the frontend code could use to clear its queued audio. Can anyone point me to guidance or to an existing implementation of this?

jonnyjohnson1 commented 1 week ago

Having this same issue.

Is it possible to interrupt over WebSockets the way you can over WebRTC? Does the interruption mechanism have to be different?

dk-crazydiv commented 1 week ago

Yes. As far as I can tell, VAD-based interruption works normally in all the transports pipecat supports. The only problem with the websocket transport is that the interruption is handled by the backend services, but if some audio frames have already been delivered to a frontend HTML/JS client, the default websocket implementation has no mechanism to forward the interruption signal to the frontend.

The makeshift workaround I used was to subclass WebsocketServerOutputTransport and add my own hook in process_frame that sends an interruption message to the client:

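from pipecat.frames.frames import Frame, StartInterruptionFrame, TextFrame
from pipecat.processors.frame_processor import FrameDirection
from pipecat.transports.network.websocket_server import WebsocketServerOutputTransport


class InterruptibleWebsocketOutputTransport(WebsocketServerOutputTransport):  # illustrative name
    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)
        # Only notify the client if one is connected
        if isinstance(frame, StartInterruptionFrame) and self._websocket:
            # Tell the client to drop any audio it has already buffered
            text_frame = TextFrame("TYPE:INTERRUPT---MESSAGE:USER_STARTED_SPEAKING")
            proto = self._params.serializer.serialize(text_frame)
            await self._websocket.send(proto)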

Then parse this TextFrame on the frontend and clear the audio buffer when it arrives.
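On the client, this amounts to recognising the marker string and emptying whatever audio is still queued. A real frontend would be JS and would first deserialize the protobuf message; the Python sketch below skips that step and uses hypothetical names (`parse_control_message`, `PlaybackBuffer`). It only drops queued chunks; a chunk that is already playing would need to be stopped separately.

```python
from collections import deque


def parse_control_message(text):
    """Parse 'TYPE:...---MESSAGE:...' into a dict; return None for ordinary text."""
    fields = {}
    for part in text.split("---"):
        key, sep, value = part.partition(":")
        if not sep:
            return None  # not in KEY:VALUE form, so treat it as normal text
        fields[key] = value
    return fields if "TYPE" in fields else None


class PlaybackBuffer:
    """Hypothetical client-side queue of audio chunks waiting to be played."""

    def __init__(self):
        self.pending = deque()

    def on_audio(self, pcm: bytes):
        self.pending.append(pcm)

    def on_text(self, text: str):
        msg = parse_control_message(text)
        if msg is not None and msg.get("TYPE") == "INTERRUPT":
            self.pending.clear()  # drop audio from the interrupted response


buf = PlaybackBuffer()
buf.on_audio(b"\x00\x01")
buf.on_audio(b"\x02\x03")
buf.on_text("TYPE:INTERRUPT---MESSAGE:USER_STARTED_SPEAKING")
print(len(buf.pending))  # 0
```

Requiring a `TYPE` field keeps ordinary text that happens to contain a colon (for example "Note: tigers are big") from being treated as a control message.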