steveseguin / raspberry_ninja

Publish or capture VDO.Ninja streams with Python (Raspberry Pi, Linux, Mac, Windows WSL)
https://raspberry.ninja

Much larger latency when recording to fdsink than when using the browser #34

Open jcelerier opened 1 year ago

jcelerier commented 1 year ago

Hi, I'm streaming video from a Pi Zero 2 W (with the HD camera) running an aarch64 Debian bookworm base, so GStreamer 1.22 and kernel 6.1.21. I'm using the following command on the sending side:

publish.py --streamid $STREAMID --libcamera --noaudio --nored --noqos --width 640 --height 480 --framerate 30 --bitrate 1000 --rpi --h264 

It works fairly well, with very low latency, when the receiver is a browser (I tried Chrome and Firefox). Now I'm trying the same thing without any web browser involved (all of this is on a decently powerful laptop: Arch Linux, kernel 6.5, GStreamer 1.22 too, and ffmpeg 6):

python3 ./publish.py  --h264 --noqos --nored --noaudio --zerolatency --fdsink $STREAMID | ffplay -f rawvideo -pixel_format bgr24 -video_size 640x480 -i - 

That gives me pretty high latency, around 4 to 5 seconds. What could be the cause of this? I also tried to pipe to NDI, but to no avail: I don't even see the NDI element being created on the network (I have gst-plugin-ndi installed).

success video?
0:00:06.884514136 161643 0x7f2d8c0010b0 WARN                 basesrc gstbasesrc.c:3132:gst_base_src_loop:<nicesrc0> error: Internal data stream error.
0:00:06.884522275 161643 0x7f2d8c0010b0 WARN                 basesrc gstbasesrc.c:3132:gst_base_src_loop:<nicesrc0> error: streaming stopped, reason not-linked (-1)
0:00:06.884551062 161643 0x7f2d8c0010b0 WARN                   queue gstqueue.c:992:gst_queue_handle_sink_event:<queue0> error: Internal data stream error.
0:00:06.884554859 161643 0x7f2d8c0010b0 WARN                   queue gstqueue.c:992:gst_queue_handle_sink_event:<queue0> error: streaming stopped, reason not-linked (-1)
steveseguin commented 1 year ago

I took a glance at the code, and I have the buffer on the output set to "unlimited". This is probably the issue, and you'll probably want to cap the buffer size to match your needs.

ie: queue max-size-buffers=0 max-size-time=0

For example, you can try limiting the buffer to 1 or 2 frames with the following instead. That might tighten things up, though it could also mean some dropped frames.

queue max-size-buffers=2 leaky=downstream

The relevant lines are here, depending on the codec used: https://github.com/steveseguin/raspberry_ninja/blob/19416a41e726d4f1a20039c0c0edaa87d24e01bf/publish.py#L540C29-L540C29
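For illustration, here is a minimal sketch of the two queue settings above, written as gst-launch-style descriptions built in Python. The helper names (queue_element, receive_chain) and the element order in the chain are assumptions made for this example, not the actual publish.py code; only the queue properties come from the comment above.

```python
def queue_element(max_buffers, max_time_ns=None, leaky=None):
    """Build a gst-launch-style description of a GStreamer `queue` element.

    For `queue`, a value of 0 disables that particular limit.
    """
    parts = ["queue", f"max-size-buffers={max_buffers}"]
    if max_time_ns is not None:
        parts.append(f"max-size-time={max_time_ns}")
    if leaky is not None:
        # leaky=downstream drops the oldest queued buffers when the queue is full
        parts.append(f"leaky={leaky}")
    return " ".join(parts)


def receive_chain(queue_desc):
    """Hypothetical decode chain ending in fdsink (element order is an assumption)."""
    elements = [
        "rtph264depay",
        "h264parse",
        "openh264dec",
        "videoconvert",
        "video/x-raw,format=BGR",
        queue_desc,
        "fdsink",
    ]
    return " ! ".join(elements)


# The "unlimited" setting described above.
unlimited = queue_element(0, max_time_ns=0)

# The capped, leaky variant: hold at most 2 frames and drop old ones.
capped = queue_element(2, leaky="downstream")

print(receive_chain(capped))
```

The capped variant trades completeness for freshness: when the consumer falls behind, old frames are discarded rather than accumulating as latency.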

I have the output as BGR format as well, which you don't have to use if you don't want to. It's set up for OpenCV as is, I think, but if you don't intend to use that you might be able to remove the videoconvert parts as well, and perhaps reduce the CPU overhead a bit.

For example, this might be optional for your use case: videoconvert ! video/x-raw,format=BGR. The output might be YUV by default otherwise; I'm not sure, though.

Another thing you might be able to do is to use the hardware h264 video decoder, instead of openh264, which might reduce the CPU overhead some more, speeding things up -- however I've not tried using the hardware decoder yet, so not 100% sure. Openh264 is pretty fast and reliable.

There could be some other things, like jitter buffer tweaks or using a lower resolution, but those probably won't make a big difference.

When it comes to the FFmpeg side, it might need optimizations as well, but that's outside the scope of where I can help, I think.

jcelerier commented 1 year ago

Thanks for your very quick answer!

I tried changing the queue parameters or even removing the queue altogether but I am still seeing a huge latency, even when going all the way to removing all the queues and conversion (tried step by step of course :)):

rtph264depay ! h264parse ! openh264dec !  fdsink

I also tried adding an rtpjitterbuffer latency=60 element in front of it; no change.

Note that my CPU usage on the receiving side is hovering around 3-5 % overall so I don't think it's the limiting factor.

I also had other latency issues with gstreamer: https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/1261 maybe there are some hints in this issue?

A comment somewhere mentions: "You should call gst_bin_recalculate_latency() on the pipeline when receiving a GST_MESSAGE_LATENCY message on the bus." I saw that this is not done in publish.py; could it be worth investigating? (It would get rid of the message spam too.)
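A minimal sketch of what that could look like with the GStreamer Python bindings, assuming `pipeline` is a Gst.Pipeline and that a GLib main loop is running so bus signals get delivered (otherwise the bus would need to be polled instead). The function names on_latency_message and watch_latency are hypothetical and this is not how publish.py is currently written:

```python
def on_latency_message(bus, message, pipeline):
    """Bus handler for GST_MESSAGE_LATENCY.

    Gst.Bin.recalculate_latency() is the Python binding of
    gst_bin_recalculate_latency(): it re-queries the pipeline latency
    and redistributes it to the elements.
    """
    pipeline.recalculate_latency()


def watch_latency(pipeline):
    """Connect the handler above to the pipeline's bus.

    Assumes `pipeline` behaves like a Gst.Pipeline (get_bus) and the bus
    like a Gst.Bus (add_signal_watch, connect).
    """
    bus = pipeline.get_bus()
    # Emit bus messages as GObject signals on the main loop.
    bus.add_signal_watch()
    # "message::latency" fires only for latency messages.
    return bus.connect("message::latency", on_latency_message, pipeline)
```

Calling watch_latency(pipeline) once after building the pipeline would be enough under these assumptions.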

jcelerier commented 1 year ago

On the ffmpeg side, I added some low-latency options I could find (https://superuser.com/questions/1776901/streaming-video-over-udp-with-ffmpeg-h264-low-latency). Most likely some are more relevant to MPEG than to raw streaming, but still, it didn't change anything:

ffplay  -fflags nobuffer -flags low_delay -probesize 32 -analyzeduration 1 -strict experimental -framedrop -f rawvideo -pixel_format yuv420p -video_size 640x480 -i -
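Since -f rawvideo carries no framing, ffplay cuts the pipe into frames purely from -pixel_format and -video_size, so both have to match what the pipeline actually writes (BGR with the videoconvert caps, I420/yuv420p straight out of the decoder). A small sketch of the arithmetic, assuming the standard packed bgr24 (24 bits per pixel) and planar yuv420p (12 bits per pixel) layouts; frame_bytes is a hypothetical helper name:

```python
# Bits per pixel for the two raw formats used in this thread.
BITS_PER_PIXEL = {
    "bgr24": 24,    # packed B, G, R: 3 bytes per pixel
    "yuv420p": 12,  # full-res Y plane plus quarter-res U and V planes
}


def frame_bytes(width, height, pixel_format):
    """Size in bytes of one raw video frame (assumes even width and height)."""
    return width * height * BITS_PER_PIXEL[pixel_format] // 8


bgr = frame_bytes(640, 480, "bgr24")    # 921600 bytes per frame
yuv = frame_bytes(640, 480, "yuv420p")  # 460800 bytes per frame

# Raw data rate through the pipe at 30 frames per second.
print(bgr * 30, yuv * 30)
```

If the declared format does not match the bytes on the pipe, the frame boundaries drift and the picture looks scrambled rather than merely delayed.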
steveseguin commented 1 year ago

When using the --framebuffer option, which is near identical to fdsink, the latency is quite low: https://youtu.be/LGaruUjb8dg?t=2395

There is a higher latency when I have the queue's buffer large, but with a frame buffer size of 1 or 2 frames, it's near instant.

I am using an Orange Pi 5 in this demo though, which is quite a bit more powerful than a Zero 2.

If a lower resolution helps, there could still be a CPU issue. Perhaps try 360p or lower instead, as 3 to 5% CPU seems really rather low, especially if decoding video with openh264 in software. &quality=2 on the sender side will limit to 360p30.

If issues persist, I might need to duplicate your setup on an RPi here.

jcelerier commented 1 year ago

To be clear: when I stream from the Pi to a web browser, the latency is perfect, certainly around 100 ms or so, so I don't think the Pi is the issue. The problem is only with publish.py (on the same machine that manages low latency from the browser, which is a powerful laptop).

I'll try with framebuffer, thanks!

jcelerier commented 1 year ago

Hmm, I've been trying different things and managed to get less lag. For instance, when adjusting the viewer pipeline to queue ! rtph264depay ! h264parse ! openh264dec ! xvimagesink, it gets closer to expected, but it is still slower than what I'm seeing in Chrome. I will try to stream a clock to see what the delta is.

steveseguin commented 1 year ago

I've added --latency 200 as an option; the gstreamer default is 200 (milliseconds), so you can try lowering to 70 to see if that helps you out.

There might be 30ms here or there I can further squeak out, but not entirely sure yet.

jcelerier commented 1 year ago

It's incredibly better! I managed to take it down much lower here, with some frame drops (but I'm mainly interested in getting the lowest possible latency). It really feels real-time, even from the Pi Zero.

However, after 30 seconds I see:

NO HEARTBEAT
STOP PIPE
DATA CHANNEL: CLOSE
DATA CHANNEL: CLOSE

on both my sender and receiver at the same time, and the stream cuts out suddenly.

See around the 40-second mark: there is no particular lag before the crash, so the "NO HEARTBEAT" is surprising:

https://github.com/steveseguin/raspberry_ninja/assets/2772730/f848d579-6346-465c-ae3b-576ac22b47fc

steveseguin commented 1 year ago

I'm hoping when I get my new Raspberry Pi image out, this issue will go away.

I was having some issues with the system freezing after a little while, sometimes after just a few seconds, but it went away when I used some older and more stable builds of gstreamer, as well as the 64bit bullseye. Both were causing me a lot of issues.

Anyways, I should be able to get the new Pi image out shortly; just trying to polish things.

jcelerier commented 1 year ago

okay, for reference the Pi is running gst 1.22 on debian bookworm (with the official RPi OS manually updated)