careyer opened this issue 2 years ago
Benchmarks of H.265 decoding latency on the Raspberry Pi 4 would be really interesting.
HEVC decoding latency is less than 1 frame at 1080p 60 FPS on the Raspberry Pi 4. Rendering will add about 1 additional frame of latency.
I recommend using the FFmpeg fork maintained by one of the Raspberry Pi developers here: https://github.com/jc-kynesim/rpi-ffmpeg (branch: dev/4.4/rpi_import_1)
The build script that I use to build FFmpeg with the appropriate options is here: https://github.com/cgutman/moonlight-packaging/blob/master/scripts/build-deps.sh
What resolution and frame rate will you be streaming? That will determine whether you can use something like `ffplay` or if you'll need to write something more specialized to render directly to a DRM plane. If you want to build ffplay, you'll need to add `--enable-ffplay` to the `configure` command line.
Thank you Cameron! That indeed points us in the right direction. I'll give it a go.
We will most likely go with 720p or 1080p at 30/60 FPS, or maybe 120 FPS (to cut down decoding latency as much as possible). We will likely also try streaming 4K, since it would be the first solution to do so (and kicks DJI's [closed-source] butt :-)
BTW: Is there a way to display the video playback with ffmpeg/ffplay without a running X11 server? (We are using Raspberry Pi OS Lite.)
If you are interested you are invited to take a glimpse at: https://github.com/OpenHD/QOpenHD https://openhd.gitbook.io/open-hd/
Hello, @careyer
Thanks for asking this; RPi HEVC decoder latency was an unknown to me. I am currently trying to stream a 1680x2100 HEVC SRT video feed to a Raspberry Pi 4, and there are so many uncontrolled variables that I was starting to suspect decoder latency was an insurmountable obstacle.
Here is how I encode my stream from desktop capture on the PC and receive it on the RPi4 using ffmpeg/ffplay.

Receiver (on the RPi4):

```
ffplay -f mpegts -x 1680 -y 2100 -left 0 -top 0 -an -noborder -alwaysontop -autoexit -flags low_delay -tune zerolatency -fflags discardcorrupt -preset ultrafast -framedrop -probesize 32 -analyzeduration 0 -sync ext srt://:6666?mode=listener
```

Sender (on the PC):

```
ffmpeg -f gdigrab -framerate 30 -video_size 1680x2100 -offset_x 1920 -offset_y -525 -i desktop -vcodec hevc_nvenc -pix_fmt yuv420p -f mpegts -flags low_delay -fflags discardcorrupt -probesize 32 -analyzeduration 0 srt://raspberrypi.lan:6666
```
Here is what this looks like
https://www.youtube.com/watch?v=paaGYrZqbyY
https://www.youtube.com/watch?v=0VYCdCxi1Xg
ffmpeg consoles https://i.imgur.com/RDdSatR.png
Desktop https://i.imgur.com/TB1WmTB.png
And more discussion and questions on this reddit thread https://old.reddit.com/r/ffmpeg/comments/s14puo/raspberry_pi_4_dual_monitor_hevc_streaming_demo/
> Thank you Cameron! That indeed points us in the right direction. I'll give it a go.
>
> We will most likely go with 720p or 1080p at 30/60 FPS, or maybe 120 FPS (to cut down decoding latency as much as possible). We will likely also try streaming 4K, since it would be the first solution to do so (and kicks DJI's [closed-source] butt :-)
I'm not sure if the HEVC decoder and rendering performance is high enough to do 120 FPS or 4K, but 1080p 60 FPS is certainly achievable.
> BTW: Is there a way to display the video playback with ffmpeg/ffplay without a running X11 server? (We are using Raspberry Pi OS Lite.)
Avoiding X11 is actually ideal because rendering directly with DRM APIs has far better performance. You can find an example of how to do that here: https://github.com/moonlight-stream/moonlight-qt/blob/645040c9438361223a0fb9b7bc4d983f95471dce/app/streaming/video/ffmpeg-renderers/drm.cpp#L305
I think you will probably need to write your own tool to call the FFmpeg APIs directly to get the performance you need, similar to how Moonlight does it. You will also need to use patched FFmpeg source like what I linked in my previous comment to use the HEVC decoder on the Pi 4.
> Hello, @careyer
>
> Thanks for asking this; RPi HEVC decoder latency was an unknown to me. I am currently trying to stream a 1680x2100 HEVC SRT video feed to a Raspberry Pi 4, and there are so many uncontrolled variables that I was starting to suspect decoder latency was an insurmountable obstacle.
>
> Here is how I encode my stream from desktop capture on the PC and receive it on the RPi4 using ffmpeg/ffplay.
>
> Receiver (on the RPi4):
>
> ```
> ffplay -f mpegts -x 1680 -y 2100 -left 0 -top 0 -an -noborder -alwaysontop -autoexit -flags low_delay -tune zerolatency -fflags discardcorrupt -preset ultrafast -framedrop -probesize 32 -analyzeduration 0 -sync ext srt://:6666?mode=listener
> ```
>
> Sender (on the PC):
>
> ```
> ffmpeg -f gdigrab -framerate 30 -video_size 1680x2100 -offset_x 1920 -offset_y -525 -i desktop -vcodec hevc_nvenc -pix_fmt yuv420p -f mpegts -flags low_delay -fflags discardcorrupt -probesize 32 -analyzeduration 0 srt://raspberrypi.lan:6666
> ```
>
> Here is what this looks like:
>
> https://www.youtube.com/watch?v=paaGYrZqbyY
> https://www.youtube.com/watch?v=0VYCdCxi1Xg
>
> ffmpeg consoles: https://i.imgur.com/RDdSatR.png
> Desktop: https://i.imgur.com/TB1WmTB.png
>
> And more discussion and questions in this reddit thread: https://old.reddit.com/r/ffmpeg/comments/s14puo/raspberry_pi_4_dual_monitor_hevc_streaming_demo/
I suspect the majority of your latency is on the host-side or within ffmpeg/ffplay itself. Does latency stay similar if you change codecs to H.264 and/or use software encoding rather than NVENC? Is latency similar if you play the stream on a PC instead of a Raspberry Pi?
I highly doubt you will be able to get the latency you need out of ffmpeg/ffplay. If your ultimate goal is to do screen capture, I recommend taking a look at how Sunshine does it to get low latency. You will probably need a purpose-built tool using the FFmpeg C APIs to reach your latency goals.
@cgutman Thank you for the suggestion of Sunshine; I somehow missed that it existed despite searching for software like this for several weeks now!
And I just installed it on my PC and Raspberry Pi, and it absolutely blew everything else I have tried out of the water. Right now I have it on a single screen and it's got scaling problems (a black border all the way around the entire image), and despite this scaling, the latency is superb. Not sure exactly how much, but it feels sub-150 ms, maybe even sub-100 ms!
Now I just need to convince it to try and stream two monitors as a single stream!
Thanks!
@shodanx2 could you share how you got it working on your side? I compiled ffmpeg, but when trying to play a 2160p video I get a "cannot allocate memory" error.
@careyer
Try a fresh Raspberry Pi OS Bullseye installation; it worked out of the box for me.
However, I have not tried 4K video, only 1080p and my dual-monitor "1680x2100".
I see you are trying to output to the framebuffer; I'd like to try that too, but I have not yet.
In my setup I was running on Xorg with the vc4-kms-v3d driver (the default). I didn't try the V4L2 decoder, although I might have been using it without realizing it; ffplay is very cryptic about what hardware acceleration it is or isn't using.
From the performance I have seen so far, I would be surprised if the RPi4 can really decode 4K 60 FPS H.265.
(Note: I fixed the black border issue mentioned above by changing the output resolution from 720p to "native". Strange it didn't try native by default!)
Got H.265 playback working... it is blazing fast. :-)

- 720p: ~100 fps on fbdev | raw decoding: ~130 fps
- 1080p: ~60 fps on fbdev
- 4K: ~20 fps on fbdev
That's great! Could you post your ffmpeg command line and how you got it to work?
You are outputting to fbdev; is that fbdev using DRM/DRI, or is it just a software framebuffer?
Do you know of a way to output to the framebuffer on both of the Raspberry Pi's HDMI outputs at once?
Here you go:

```
sudo ffmpeg -hwaccel drm -i hevc-input-file.mp4 -pix_fmt bgra -an -f fbdev /dev/fb0
```

In /boot/config.txt you need to add the following (otherwise you will run into a memory allocation error):

```
dtoverlay=rpivid-v4l2
dtoverlay=cma,cma-size=402653184
# comment out this line: dtoverlay=vc4-kms-v3d
```
I think writing to fbdev is going to be pretty inefficient. I haven't looked at the FFmpeg code but I doubt it's a zero-copy path.
You should try to use the DRM APIs (`drmModeSetPlane()`) to render instead. I think you can just call `drmModeSetPlane()` for both CRTC IDs to render on both displays.
@cgutman can you give advice on how to use the DRM API for rendering together with FFmpeg? I am not a developer in the first place. Is there another output device that can be used, or a program that the output can be piped to (similar to omxplayer, which is deprecated for arm64 now)? All we need is a simple video player to display the decoded video.
Your help is much appreciated! Thanks!
Ohhh! And another quick question that comes to mind: will hwaccel only work in 64-bit Bullseye, or also in 32-bit? Buster?
> @cgutman can you give advice on how to use the DRM API for rendering together with FFmpeg? I am not a developer in the first place. Is there another output device that can be used, or a program that the output can be piped to (similar to omxplayer, which is deprecated for arm64 now)? All we need is a simple video player to display the decoded video.
It looks like the rpi-ffmpeg project I linked earlier has `vout_drm` and `vout_egl` available. If possible, I'd suggest `vout_drm` as that is most likely to have the best performance. See https://github.com/jc-kynesim/rpi-ffmpeg/issues/9 for examples.
> Your help is much appreciated! Thanks!
>
> Ohhh! And another quick question that comes to mind: will hwaccel only work in 64-bit Bullseye, or also in 32-bit? Buster?
HEVC works in all scenarios (once you enable it). MMAL is only officially supported in 32-bit OSes using the Fake KMS driver.
@cgutman thank you very much! It is a pleasure to work with you.
Indeed `vout_drm` and `vout_rpi` are available (`vout_egl` is not). However, I cannot get them working. Both obviously require specifying an output file, but I cannot find any documentation of what needs to be provided here.
Do you have any idea? If I provide `-f vout_drm /dev/null` it gets a step further but gives me a "Format not DRM_PRIME" error.
Sorry for stealing your precious time on this, but my Google kung-fu has not helped me out on this one. Any ideas?
@careyer
Have you tried the various DRI devices in /dev/dri/by-path/? Like `/dev/dri/card0`?
@shodanx2 no, not yet... I can test that (but I doubt it will work). All the examples I find around the internet specify `/dev/null`... I have no idea why that is. Maybe it is some sort of debug output? That stuff is documented just nowhere. :-/
That "Format not DRM_PRIME" error is important. It means whatever decoder you're using isn't outputting DRM_PRIME frames that can be rendered by `vout_drm`. Please provide the full command line you're using and attach the full error log.
I finally got it working!

```
export LD_LIBRARY_PATH=/usr/local/lib/
ffmpeg -no_cvt_hw -hwaccel drm -stream_loop 0 -i /home/pi/Videos/bbb_h264_1080p_30fps.mp4 -f vout_drm /dev/null
```

(I still have no idea why I have to export that library path first; if I don't, I run into a library-not-found issue.)
H.265 playback performance is mindblowing:

- 720p -> 730 fps (!!)
- 1080p -> 420 fps (!!)
- 2K -> 200 fps (!!)
- 4K -> 128 fps (!!)
This might be interesting for you guys, too: https://github.com/jc-kynesim/hello_drmprime/issues/3
> I finally got it working!
>
> ```
> export LD_LIBRARY_PATH=/usr/local/lib/
> ffmpeg -no_cvt_hw -hwaccel drm -stream_loop 0 -i /home/pi/Videos/bbb_h264_1080p_30fps.mp4 -f vout_drm /dev/null
> ```
>
> (I still have no idea why I have to export that library path first; if I don't, I run into a library-not-found issue.)
>
> H.265 playback performance is mindblowing:
>
> - 720p -> 730 fps (!!)
> - 1080p -> 420 fps (!!)
> - 2K -> 200 fps (!!)
> - 4K -> 128 fps (!!)
How did you manage to get those results? I am currently running Bullseye on an RPi4 with the same ffmpeg build. On 4K videos I barely get 80 fps with the GPU at 900 MHz.
It would be interesting to try with the same video file to compare results.
Thank you, I am really hoping to open a 1680x2100 window across both HDMI outputs on the RPi4, which will be an SRT streaming listener as a video source. Exciting stuff!
As extra monitors over Ethernet for my Windows PC.
Hi guys,
congratulations on this wonderful project. Your achievements are awesome! Kudos!
Please allow me an off-topic question:
I work on another project, which is all about live-streaming footage from airborne vehicles at high resolution and low latency (project name: OpenHD).
Would you be willing to share some information on how to use hardware-accelerated HEVC/H.265 decoding on the Pi 4? As of now we use a Jetson Nano for high-performance, low-latency H.265 encoding in the air, but for cost savings and easier sourcing we would really like to introduce hardware-based H.265 decoding on a ground-based Pi 4.
Any information and/or hints in the right direction on what decoder/player to use would be very welcome!
Thanks in advance and keep up the excellent work. Cheers!