Open jcelerier opened 1 year ago
I took a glance at the code, and I have the buffer on the output set to "unlimited". This is probably the issue, and you'll probably want to cap the buffer size to match your needs.
ie: queue max-size-buffers=0 max-size-time=0
For example, you can try limiting the buffer to 1 or 2 frames with the following instead, which might tighten things up, though it might also mean some dropped frames.
queue max-size-buffers=2 leaky=downstream
The relevant lines are here, depending on the codec used: https://github.com/steveseguin/raspberry_ninja/blob/19416a41e726d4f1a20039c0c0edaa87d24e01bf/publish.py#L540C29-L540C29
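To illustrate the trade-off described above, here is a minimal pure-Python sketch of a bounded queue that discards its oldest entry when full, which is how I understand leaky=downstream to behave. It is a toy model and not GStreamer code; run_leaky_queue and its parameters are hypothetical names made up for this example.

```python
from collections import deque


def run_leaky_queue(frames, max_buffers):
    """Toy model of a bounded queue that drops its oldest entry when full.

    Hypothetical helper for illustration only; it does not call GStreamer.
    """
    queue = deque()
    dropped = []
    for frame in frames:
        if len(queue) == max_buffers:
            # Queue is full: discard the oldest frame to make room.
            dropped.append(queue.popleft())
        queue.append(frame)
    return list(queue), dropped


# Ten frames arrive while the consumer is stalled (an assumed scenario).
kept, dropped = run_leaky_queue(range(10), max_buffers=2)
print(kept)     # frames still waiting to be displayed
print(dropped)  # frames that were discarded and never shown
```

With a cap of 2, only the two newest frames survive a stall, so the picture stays close to live at the cost of skipped frames. An unlimited queue would keep all ten and show them late.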
I have the output as BGR format as well, which you don't have to use if you don't want to. It's set up for OpenCV I think as is, but if you don't intend to use that you might be able to remove the videoconvert parts as well, and perhaps reduce the CPU overhead a bit.
For example, this might be optional for your use case: videoconvert ! video/x-raw,format=BGR
The output might be YUV by default otherwise? Not sure, though.
Another thing you might be able to do is to use the hardware h264 video decoder, instead of openh264, which might reduce the CPU overhead some more, speeding things up -- however I've not tried using the hardware decoder yet, so not 100% sure. Openh264 is pretty fast and reliable.
There could be some other things, like jitter buffer tweaks, or using a lower resolution, but that probably won't make a big difference.
When it comes to the FFMpeg side, it might need optimizations as well, but that's outside the scope of where I can help I think.
Thanks for your very quick answer!
I tried changing the queue parameters or even removing the queue altogether but I am still seeing a huge latency, even when going all the way to removing all the queues and conversion (tried step by step of course :)):
rtph264depay ! h264parse ! openh264dec ! fdsink
Also tried to add a rtpjitterbuffer latency=60 element before.. no change.
Note that my CPU usage on the receiving side is hovering around 3-5 % overall so I don't think it's the limiting factor.
I also had other latency issues with gstreamer: https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/1261 maybe there are some hints in this issue?
A comment mentions somewhere: You should call gst_bin_recalculate_latency() on the pipeline when receiving a GST_MESSAGE_LATENCY message on the bus.
and I saw it's not done in publish.py - could be worth investigating? (and it would get rid of the message spam too)
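For what it's worth, here is a rough sketch of what that could look like in Python. In the PyGObject bindings, gst_bin_recalculate_latency() is exposed as the recalculate_latency() method on a bin or pipeline. The helper name make_latency_handler is hypothetical, and the message type is passed in as a parameter only so the logic can be tried without GStreamer installed; I have not checked how publish.py currently handles bus messages.

```python
def make_latency_handler(pipeline, latency_type):
    """Return a bus callback that recalculates latency on LATENCY messages.

    Assumptions: `pipeline` behaves like a Gst.Pipeline and `latency_type`
    is Gst.MessageType.LATENCY. Both are injected so the function can be
    exercised with stand-in objects.
    """
    def on_message(bus, message):
        if message.type == latency_type:
            # Ask the pipeline to redistribute latency across its elements.
            pipeline.recalculate_latency()
            return True
        return False

    return on_message


# Hypothetical wiring inside a GStreamer application (not run here):
#   bus = pipeline.get_bus()
#   bus.add_signal_watch()
#   bus.connect("message",
#               make_latency_handler(pipeline, Gst.MessageType.LATENCY))
```

Whether this changes the observed delay is untested; it would at least handle the repeated latency messages instead of just logging them.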
On the ffmpeg side I added some low-latency options I could find (https://superuser.com/questions/1776901/streaming-video-over-udp-with-ffmpeg-h264-low-latency) - most likely some are more relevant to mpeg and not to RAW streaming but still, didn't change anything :
ffplay -fflags nobuffer -flags low_delay -probesize 32 -analyzeduration 1 -strict experimental -framedrop -f rawvideo -pixel_format yuv420p -video_size 640x480 -i -
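One thing worth double-checking with raw piping: -f rawvideo carries no header, so ffplay splits stdin into frames purely from -pixel_format and -video_size. A small sketch of the arithmetic, assuming the 640x480 size from the command above. raw_frame_bytes is a hypothetical helper, and bgr24 is included only because BGR output was mentioned earlier in the thread.

```python
# Bits per pixel for two common raw formats (ffmpeg pixel format names).
BITS_PER_PIXEL = {
    "yuv420p": 12,  # full-resolution Y plane plus quarter-resolution U and V
    "bgr24": 24,    # three 8-bit channels per pixel
}


def raw_frame_bytes(width, height, pixel_format):
    """Size in bytes of one packed raw frame with no row padding."""
    return width * height * BITS_PER_PIXEL[pixel_format] // 8


print(raw_frame_bytes(640, 480, "yuv420p"))  # 460800
print(raw_frame_bytes(640, 480, "bgr24"))    # 921600
```

If the pipeline emitted a different format or size than the one given to ffplay, the frame boundaries would not line up and the picture would look scrambled, so this is more of a correctness check than a latency fix.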
when using the --framebuffer option, which is near identical to fdsink, the latency is quite low: https://youtu.be/LGaruUjb8dg?t=2395
There is a higher latency when I have the queue's buffer large, but with a frame buffer size of 1 or 2 frames, it's near instant.
I am using an Orange Pi 5 in this demo though, which is quite a bit more powerful than a Zero 2.
If a lower resolution helps, there could still be a CPU issue. Perhaps try 360p or lower instead, as 3 to 5% CPU seems really rather low, especially if decoding video with openh264 in software. &quality=2 on the sender side will limit to 360p30.
If issues persist, I might need to duplicate your setup on an RPi here.
To be clear: when I stream from the pi to a web browser, the latency is perfect, certainly around 100ms - ish, so I don't think the Pi is the issue. The problem is only with publish.py (on the same machine that manages low-latency from the browser, which is a powerful laptop).
I'll try with framebuffer, thanks!
Hmm I've been trying different things and managed to get less lag - for instance when adjusting the viewer pipeline to queue ! rtph264depay ! h264parse ! openh264dec ! xvimagesink
it gets closer to expected, but it is still slower than what I'm seeing in chrome - I will try to stream a clock to see what's the delta
I've added --latency 200 as an option; the GStreamer default is 200 (milliseconds), so you can try lowering it to 70 to see if that helps you out.
There might be 30ms here or there I can further squeak out, but not entirely sure yet.
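As a rough back-of-the-envelope sketch of where the delay goes, the jitter buffer setting and any queued frames can be added up as below. The 30 fps rate and the 2-frame queue are assumptions for illustration, added_latency_ms is a hypothetical helper, and decode, network and display time are ignored.

```python
def added_latency_ms(jitter_buffer_ms, queued_frames, fps):
    """Estimate buffering delay: jitter buffer plus frames waiting in a queue."""
    frame_interval_ms = 1000.0 / fps
    return jitter_buffer_ms + queued_frames * frame_interval_ms


# Assumed values: a 2-frame queue at 30 fps, with the 200 ms and 70 ms settings.
default_ms = added_latency_ms(200, queued_frames=2, fps=30)
lowered_ms = added_latency_ms(70, queued_frames=2, fps=30)
print(round(default_ms, 1), round(lowered_ms, 1))
```

Under these assumptions, going from 200 to 70 removes 130 ms of buffering, while each queued frame at 30 fps costs about 33 ms.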
it's incredibly better! I managed to take it down to much lower here with some frame drops (but I'm mainly interested in getting the lowest possible latency), it really feels real-time even from the Pi Zero
however, after 30 seconds I see :
NO HEARTBEAT
STOP PIPE
DATA CHANNEL: CLOSE
DATA CHANNEL: CLOSE
at the same time both on my sender and receiver and it cuts suddenly.
see around the 40 second mark - there is no particular lag before the crash so the "NO HEARTBEAT" is surprising:
https://github.com/steveseguin/raspberry_ninja/assets/2772730/f848d579-6346-465c-ae3b-576ac22b47fc
I'm hoping when I get my new Raspberry Pi image out, this issue will go away.
I was having some issues with the system freezing after a little while, sometimes after just a few seconds, but it went away when I used some older and more stable builds of gstreamer, as well as the 64bit bullseye. Both were causing me a lot of issues.
anyways, I should be able to get the new pi image out shortly; just trying to polish things
okay, for reference the Pi is running gst 1.22 on debian bookworm (with the official RPi OS manually updated)
Hi, I'm streaming some video from a Pi (with HD camera), pi zero 2 W with aarch64 debian bookworm base so gstreamer 1.22 and kernel 6.1.21. I'm using the following command from the sending side:
it works fairly well with very low latency from browser from the receiver side - I tried chrome & firefox. Now I'm trying the same thing but without any web browser involved (all this is on a decently powerful laptop, archlinux, kernel 6.5, gstreamer 1.22 too and ffmpeg 6):
and that gives me pretty high latency, around 4/5 seconds. What could be the cause of this? I also tried to pipe to NDI but to no avail - I don't even see the ndi element being created on the network (I have gst-plugin-ndi installed)