ossrs / oryx

Oryx(SRS Stack) is an all-in-one, out-of-the-box, and open-source video solution for creating online video services, including live streaming and WebRTC, on the cloud or through self-hosting.
https://ossrs.io/oryx
MIT License
448 stars 97 forks source link

Support object detections by GPT-4o #182

Closed winlinvip closed 1 month ago

winlinvip commented 1 month ago

adolfus — 05/08/2024 11:42 PM Can Tensorflow object detections be sent to SRS from a Python script so that the object detections can be viewed in a live and VOD stream?

tagore — 05/09/2024 10:06 PM like ads detection, black screen detection, freeze , or objects?

Winlin — 05/10/2024 9:11 AM Interesting.

tagore — 05/10/2024 11:30 PM it would be very usefull in many uses cases, tv cable operators for example ..

Winlin — 05/11/2024 11:02 AM What's your product? How does. your user use it? Sorry, I don't understand how tv cable operator with python tensorflow object and live stream.

tagore — 05/13/2024 9:14 PM because we using iptv internally to distribute signals, also, the tv satellite encoders/decoders do network output and also ASI output ... so, we can use system to detect black screens o freeze, o comercial ads detection, using the network part. We receive the channel via satellite with a receiving device that has IP and ASI output. That network output can be used to monitor the channel, whether it freezes, loses frames, starts a commercial or ends.

Winlin — 05/14/2024 3:45 PM How do you convert the IPTV signal into IP and ASI, and then send it to the SRS server using the RTMP protocol?

tagore — 05/14/2024 9:08 PM satellite -- satellite receptor ( they have udp multicast output and ASI output ) -- ffmpeg (RMTP or SRT ) -- SRS and also have channels comming from another city, using ffmpeg to SRS and the ffmpeg again to inject in the analog coaxial part of the network. here they have hardware equipment that can recieve udp multicast and the inject in the coaxial network. i need to check hardware names if needed.

Winlin — 05/15/2024 at 9:51 AM If you can use FFmpeg to convert to RTMP, then you can also use FFmpeg to capture screenshots as PNG images. Generally, OpenCV supports image analysis.

Of course, FFmpeg + OpenCV is a classic technical solution. I think the recent GPT-4o has great potential for application in detecting and summarizing audio and video content. We will consider implementing this scenario in Oryx. I have created a new task for this, but we have a lot of work and no clear deadline, no ETA.

GPT-4o is a new flagship model that can reason across audio, vision, and text in real time. See https://openai.com/index/spring-update/

Image

I tried GPT-4o's image text recognition, and the accuracy is 100%. The accuracy has significantly improved compared to GPT-4, and the response time is only half as long.

Image

Image

winlinvip commented 1 month ago

Fixed in v5.15.3