pipecat-ai / pipecat

Open Source framework for voice and multimodal conversational AI
BSD 2-Clause "Simplified" License

How to enable the bot to screen share or send a chat message while we are talking to the user #299

Open geekofycoder opened 1 month ago

geekofycoder commented 1 month ago

Hello. A question has been revolving in my mind: I want to display the content being collected by the bot (the information it is gathering while talking to the user). Is there any way to show the file through screen sharing or chat messages?

Screen share is preferable, as we want to display images or PDFs. But we can also adjust to the chat option, where we can share a simple JSON.

geekofycoder commented 1 month ago

Can we use or tweak something like start_screen_share or send_messages?

wtlow003 commented 1 month ago

Interested to figure out if this is possible as well!

aconchillo commented 1 month ago

There are a couple of ways to do this:

  • The easy one is to send a TransportMessageFrame. For example, in the case of the DailyTransport you can send a DailyTransportMessageFrame. On the client side you would receive it as an app message: https://docs.daily.co/reference/daily-js/events/participant-events#app-message
  • A more complicated way would be to generate an image and send it to the client. The DailyTransport has camera support: the only thing you need to do is create the DailyTransport with camera_out_enabled=True and then push an ImageRawFrame. When the transport output gets an image frame it will send it as if it were a webcam.

Does this help?
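
For the first option, a minimal sketch of what pushing such a frame could look like (the helper name send_collected_info, the processor argument, and the payload shape are made up for illustration; exact constructor arguments may differ by pipecat version):

```python
from pipecat.processors.frame_processor import FrameDirection
from pipecat.transports.services.daily import DailyTransportMessageFrame


async def send_collected_info(processor, collected_info: dict):
    """Push the data the bot has gathered downstream as an app message."""
    # The Daily client receives this as an "app-message" event.
    frame = DailyTransportMessageFrame(message=collected_info)
    await processor.push_frame(frame, FrameDirection.DOWNSTREAM)
```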

LoJunKai commented 1 month ago

Hi, may I just ask if there is a service that converts TextFrame to TransportMessageFrame, or do we have to build our own service for it? Otherwise, what could be done here? Thanks!
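
In case it helps, a rough sketch of what a hand-rolled processor could look like (TextToAppMessage is a made-up name, and the exact FrameProcessor hooks may differ by pipecat version):

```python
from pipecat.frames.frames import Frame, TextFrame
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.transports.services.daily import DailyTransportMessageFrame


class TextToAppMessage(FrameProcessor):
    """Mirrors every TextFrame to the client as a Daily app message."""

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)
        if isinstance(frame, TextFrame):
            message = DailyTransportMessageFrame(message={"text": frame.text})
            await self.push_frame(message, FrameDirection.DOWNSTREAM)
        # Pass the original frame through unchanged.
        await self.push_frame(frame, direction)
```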

geekofycoder commented 4 weeks ago

> There are a couple of ways to do this:
>
>   • The easy one is to send a TransportMessageFrame. For example, in the case of the DailyTransport you can send a DailyTransportMessageFrame. On the client side you would receive it as an app message: https://docs.daily.co/reference/daily-js/events/participant-events#app-message
>   • A more complicated way would be to generate an image and send it to the client. The DailyTransport has camera support: the only thing you need to do is create the DailyTransport with camera_out_enabled=True and then push an ImageRawFrame. When the transport output gets an image frame it will send it as if it were a webcam.
>
> Does this help?

Like, can we do the same thing as done with AudioRawFrame and push it:

await llm.push_frame(sounds["ding2.wav"], FrameDirection.DOWNSTREAM)

For images, do we have to create a list and then pass it the same way as the audio, or is there another method to follow?

```python
import os
import wave

from pipecat.frames.frames import AudioRawFrame

sounds = {}
sound_files = [
    "clack-short.wav",
    "clack.wav",
    "clack-short-quiet.wav",
    "ding.wav",
    "ding2.wav",
]

script_dir = os.path.dirname(__file__)

for file in sound_files:
    # Build the full path to the sound file
    full_path = os.path.join(script_dir, "assets", file)
    # Open the sound and store it as an AudioRawFrame, keyed by filename
    with wave.open(full_path) as audio_file:
        sounds[file] = AudioRawFrame(audio_file.readframes(-1),
                                     audio_file.getframerate(),
                                     audio_file.getnchannels())
```
geekofycoder commented 4 weeks ago

Any example for passing the image to the frame, @aconchillo?

aconchillo commented 4 weeks ago

Here's one example that shows how to read an image file into an ImageRawFrame:

https://github.com/pipecat-ai/pipecat/blob/main/examples/simple-chatbot/bot.py#L45-L46
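
Roughly, the relevant part of that example boils down to something like this (a sketch with a placeholder file path; check the linked code for the exact constructor arguments):

```python
from PIL import Image

from pipecat.frames.frames import ImageRawFrame

# Placeholder path; use your own asset.
with Image.open("assets/info.png") as img:
    frame = ImageRawFrame(image=img.tobytes(), size=img.size, format=img.format)
```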

geekofycoder commented 4 weeks ago

This sure gives a head start. Let me integrate it, and if it works I will close the issue. Thanks for the guidance @aconchillo.

geekofycoder commented 4 weeks ago

@aconchillo It tends to run continuously without ending; I have to close my VS Code to stop it. As in https://github.com/pipecat-ai/pipecat/blob/ffc157de65e998a00dffc8a21c9abba53deaf954/examples/patient-intake/bot.py#L137, where the notification sound is pushed into the LLM, I similarly tried to push my image to the LLM. I only have to push one image, so I just appended a single image to the list and called await llm.push_frame(sprites[0], FrameDirection.DOWNSTREAM) (sprites is a list of image raw frames, like the dictionary of sounds in your code example). But a continuous "unsupported image mode" error is being generated, and it causes it to run forever.

aconchillo commented 3 weeks ago

Yes. Since underneath we have WebRTC, Pipecat is sending frames at a specific framerate, camera_out_framerate (from DailyTransportParams). So, even if you set a single image, that image will be sent continuously. The reason is that WebRTC deals with realtime video, so we are emulating that.

The image that you load needs to be a raw RGB image.
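
In other words, converting the image before building the frame should make the "unsupported image mode" error go away. A sketch, with a placeholder path:

```python
from PIL import Image

from pipecat.frames.frames import ImageRawFrame

with Image.open("assets/info.png") as img:
    # Convert palette/RGBA/etc. images to plain RGB before sending.
    rgb = img.convert("RGB")
    frame = ImageRawFrame(image=rgb.tobytes(), size=rgb.size, format="RGB")
```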