pipecat-ai / pipecat

Open Source framework for voice and multimodal conversational AI
BSD 2-Clause "Simplified" License

Capture participants screenshare in DailyTransportClient #544

Open mhar-andal opened 1 month ago

mhar-andal commented 1 month ago

It would be nice if you could capture a participant's screen share and feed it to the LLM as context.

Shoshin23 commented 1 month ago

I would love this! It could be a great way to build new copilot-style assistants.

@aconchillo / @kwindla, any plans to make this work with pipecat?

aconchillo commented 1 day ago

This is now possible in 0.0.48. All you need to do is:

@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
    # Subscribe to the participant's screen share video track.
    await transport.update_subscriptions(
        participant_settings={participant["id"]: {"media": {"screenVideo": "subscribed"}}}
    )
    # Capture video from the screen share source for this participant.
    await transport.capture_participant_video(
        participant["id"], framerate=0, video_source="screenVideo"
    )

And it's actually pretty cool. You can try it in this example https://github.com/pipecat-ai/pipecat/blob/main/examples/foundational/12b-describe-video-gpt-4o.py and see how it describes your screen.

You just need to open the Daily Room URL in your browser and share your screen. Feel free to close the issue if it works for you.

mhar-andal commented 7 hours ago

@aconchillo This is awesome! Although it seems to rate limit my chat completions requests.

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 429 Too Many Requests"
INFO:openai._base_client:Retrying request to /chat/completions in 0.494679

With framerate=0, no images are sent to OpenAI. Increasing it to 1 makes it work for a minute or so, until OpenAI starts rate limiting the requests. Any ideas on how to make this work with OpenAI in a long session?

Also, does this work with the OpenAI Realtime API?
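
On the rate-limit question above, one general approach (not confirmed anywhere in this thread) is to throttle how often a captured frame is actually forwarded to the LLM, rather than sending every frame at 1 fps. The sketch below is plain Python and does not use any pipecat API: `FrameThrottle`, `min_interval_s`, and `should_send` are hypothetical names, and the 10-second interval is an arbitrary assumption, not a known OpenAI limit. It simulates 60 seconds of 1 fps capture with a fake clock and counts how many frames would be forwarded.

```python
import time
from typing import Callable, Optional


class FrameThrottle:
    """Lets at most one frame through every `min_interval_s` seconds.

    Hypothetical helper for illustration; not part of pipecat.
    """

    def __init__(
        self,
        min_interval_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._min_interval_s = min_interval_s
        self._clock = clock
        self._last_sent: Optional[float] = None

    def should_send(self) -> bool:
        """Return True if enough time has passed since the last forwarded frame."""
        now = self._clock()
        if self._last_sent is None or now - self._last_sent >= self._min_interval_s:
            self._last_sent = now
            return True
        return False


# Simulate 60 seconds of a 1 fps screen capture using a fake clock,
# so the example runs instantly and deterministically.
clock_state = {"now": 0.0}
throttle = FrameThrottle(min_interval_s=10.0, clock=lambda: clock_state["now"])

sent = 0
for second in range(60):
    clock_state["now"] = float(second)
    if throttle.should_send():
        sent += 1  # In a real pipeline, this is where a frame would go to the LLM.

print(sent)
```

In a real pipeline the same check would sit wherever frames are handed to the LLM service, so the capture framerate and the request rate can be tuned independently.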