[FEATURE] Optimizing Video Processing Workflow to Implement Pass-Through Functionality

boolw commented 6 months ago

Is your feature request related to a problem? Please describe.
In the current video processing workflow, a mismatch between the input and output video encoding formats forces unnecessary decoding and re-encoding, which wastes resources. To optimize the workflow, we propose the following improvements:

1. Adjust the video processing workflow so that the input and output video encoding formats are consistent, for example both H.264.
2. Implement pass-through functionality to avoid unnecessary decoding and encoding, improving processing efficiency and saving resources.
3. Modify the existing code by adjusting the element connection order and adding the elements needed to support pass-through.
4. Test and validate that the optimized workflow functions correctly and meets the expected outcomes.

Describe the solution you'd like
We would like to configure the client's video encoding format as H.264 and set the Egress output configuration to H.264 as well. When the stream is processed through GStreamer, the goal is pass-through: the encoded video data is simply copied, avoiding unnecessary decoding and encoding.

Additional context
We are willing to contribute to the development of this feature and look forward to your feedback on this proposal. We hope to collaborate with the official Egress team to optimize the video processing workflow and improve system performance and efficiency.

davidzhao commented 6 months ago

@boolw It would be great to have a pass-through mode for ParticipantEgress and TrackCompositeEgress.

I think you've outlined the right steps. When we considered this in the past, these points were tricky to get right:

- WebRTC video is not fixed in resolution. The publisher could publish a lower bitrate/fps whenever it runs into upstream congestion, so the pipeline would need to be able to handle this.
- When muxing audio and video together, it's not guaranteed that both sources are producing content consistently. For example, a user could turn off their camera or mic at any time.
- While it's fine to start with H.264, VP8, VP9, and AV1 are also codecs that we'd like to support in the future, so during design we should avoid hardcoding codecs.
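One way to follow the last point is to keep codec-specific GStreamer elements in a table and build the video branch from it, so that adding VP9 or AV1 later means adding an entry rather than editing the pipeline logic. The Go sketch below assumes lower-case MIME-type keys and a hypothetical codecElements struct; the element names are real GStreamer elements, but this is not how Egress is actually structured, and muxer selection is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// codecElements lists the GStreamer elements used for one codec.
// Parse may be empty when no parser element is used for that codec.
type codecElements struct {
	Depay, Parse, Dec, Enc string
}

// registry is keyed by lower-case MIME type. Supporting another codec means
// adding an entry here rather than changing buildVideoBranch.
var registry = map[string]codecElements{
	"video/h264": {Depay: "rtph264depay", Parse: "h264parse", Dec: "avdec_h264", Enc: "x264enc"},
	"video/vp8":  {Depay: "rtpvp8depay", Dec: "vp8dec", Enc: "vp8enc"},
}

// buildVideoBranch joins the elements needed to go from inMime to outMime.
// When the codecs match, the decode and encode stages are skipped (pass-through).
func buildVideoBranch(inMime, outMime string) (string, error) {
	in, ok := registry[strings.ToLower(inMime)]
	if !ok {
		return "", fmt.Errorf("unsupported input codec %q", inMime)
	}
	out, ok := registry[strings.ToLower(outMime)]
	if !ok {
		return "", fmt.Errorf("unsupported output codec %q", outMime)
	}
	stages := []string{in.Depay}
	if in.Parse != "" {
		stages = append(stages, in.Parse)
	}
	if !strings.EqualFold(inMime, outMime) {
		// Codec mismatch: decode, convert, and re-encode.
		stages = append(stages, in.Dec, "videoconvert", out.Enc)
		if out.Parse != "" {
			stages = append(stages, out.Parse)
		}
	}
	return strings.Join(stages, " ! "), nil
}

func main() {
	pairs := [][2]string{{"video/H264", "video/H264"}, {"video/VP8", "video/H264"}}
	for _, p := range pairs {
		branch, err := buildVideoBranch(p[0], p[1])
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s: %s\n", p[0], p[1], branch)
	}
}
```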

boolw commented 6 months ago

@davidzhao First of all, thank you very much to the Egress team for the reply. When we use the SDK's ParticipantEgress mode with one video track and multiple audio tracks, we want a single video track whose configured input and output codecs are the same (H.264 is just an example), so that the video can be passed through to improve efficiency. As you pointed out, there are various complicating cases, so we would appreciate it if you could assess whether this approach is feasible. With the current Egress code, can pass-through be enabled through simple configuration, or how would the code need to be modified? If you can offer further guidance, we will be happy to try and verify the approach you suggest. We hope this optimization can be implemented successfully and improve system performance and efficiency.

boolw commented 6 months ago

@davidzhao You're right that WebRTC video resolution is not fixed: whenever the publisher runs into upstream congestion, it can publish at a lower bitrate/fps. I had indeed overlooked this case. Does GStreamer offer any solutions or recommendations for handling it? We can discuss this further.

davidzhao commented 6 months ago

Variable resolution should not pose a challenge. Both MP4 and MKV support dynamic-resolution streams; I think it just limits the kinds of containers that can be used.
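Following this, a pass-through check could require both matching codecs and a container that tolerates resolution changes mid-stream. The Go sketch below allows only MP4 and MKV, the two containers named in this comment; the function name and container labels are hypothetical, and rejecting every other container is a conservative assumption for the sketch, not a claim about those formats.

```go
package main

import (
	"fmt"
	"strings"
)

// dynamicResolutionContainers lists the containers treated as safe for
// pass-through when resolution may change mid-stream. Any other container
// is rejected (a conservative default for this sketch).
var dynamicResolutionContainers = map[string]bool{
	"mp4": true,
	"mkv": true,
}

// allowVideoPassThrough reports whether published video can be copied as-is:
// the codecs must match, and the container must accept resolution changes,
// since WebRTC video resolution is not fixed during a session.
func allowVideoPassThrough(trackMime, outputMime, container string) bool {
	return strings.EqualFold(trackMime, outputMime) &&
		dynamicResolutionContainers[strings.ToLower(container)]
}

func main() {
	fmt.Println(allowVideoPassThrough("video/H264", "video/H264", "MP4")) // true
	fmt.Println(allowVideoPassThrough("video/VP8", "video/H264", "mkv"))  // false: codec mismatch
}
```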

momenthana commented 6 months ago

I have also been looking for a way to fix this. In AWS Lambda, I merged the outputs of two egress tracks, audio.ogg and video.mp4, which have different starting timestamps, using the command below. The video stream is only copied; only the Ogg audio is re-encoded to AAC to fit the MP4 container.

ffmpeg -y -ss 00:00:00.000 -t 60 -i audio.ogg -ss 00:00:00.000 -t 60 -i video.mp4 -map 1:v -map 0:a -c:v copy -c:a aac audio+video.mp4

Although encoding time dropped drastically because only the audio had to be encoded, I still ran into Lambda's limits: the maximum runtime, having to download and export every file, and the size of temporary storage.

Solving this properly requires encoding only the audio and merging it inside Egress, which is, in principle, responsible for the output...

momenthana commented 6 months ago

I need a solution that can reduce the cost of streaming and exporting data.

@boolw To deal with frames that do not overlap at the ends of each egress track, adjust the start with the -ss option, or trim the frames left over at the end by equalizing the output length with the -t option. (It is hard to trust Lambda with encoding such long or large streams. I read the timestamps from the JSON metadata file that Egress produces when it exports a track.)

@davidzhao For a single track that is still output as non-fragmented MP4, could you consider adding to LiveKit Egress the ability to encode and merge only the audio stream, or otherwise to produce the final combined output while encoding only the audio?

momenthana commented 6 months ago

(Quoting @davidzhao's comment above on a pass-through mode for ParticipantEgress and TrackCompositeEgress.)

That's great! I had only considered WHIP and OBS output, which is why I came up with the idea above.


[FEATURE] Optimizing Video Processing Workflow to Implement Pass-Through Functionality #655