Yes, there are many things suboptimal about this video player. It really comes down to two problems: the video is decoded in software on the CPU, and the decoded frames then have to be copied into memory that Iced can render from.

So, to improve the performance, you would need to do the following:

- Use hardware decoding (via the `vulkanh264dec` plugin, for instance).
- Keep the decoded frames on the GPU for rendering. `iced_wgpu` doesn't offer a way to do this. Nonetheless, I don't see why implementing this in Iced wouldn't be trivial by adding a `Texture` variant in the custom primitives here.

That should net you close to the best performance possible.
Now that I know about them, I will try those optimizations. Thanks for your reply! It solves my puzzle.
@jazzfool, according to the README.md, it looks like some of the performance issues have been fixed (commit e347a9b3249d33d9c9e40bfb8b149c60490e09d8):

> - Decent performance. Skips a lot of the overhead from Iced `Image` and copies frame data directly to a WGPU texture, and renders using a custom WGPU render pipeline. For a very subjective reference, I can play back 1080p HEVC video with hardware decoding without hitches, in debug mode.
Can you confirm this? Is the minimal example using hardware decoding by default? Are there any other performance issues still to be aware of?
Anyway, this repository is of great help. Thanks!
From my testing, yes, the performance is a lot better since I last wrote. My earlier points still stand for squeezing out more performance, but currently with hardware decoding (which seems to be working by default now) it's very usable.
Thanks @jazzfool, I am also able to run it with hardware decoding now, but I still face some performance issues. I'll leave the walkthrough and results of some tests here.

Whether hardware acceleration is used now depends entirely on the GStreamer system setup: its plugins and the underlying video driver.

By enabling GStreamer logs I could see that I was using the `avdec_h264` decoder, which to my understanding is a software decoder:
```
$ GST_DEBUG=2,videodecoder:INFO cargo run --example minimal
    Finished dev [unoptimized + debuginfo] target(s) in 0.12s
     Running `target/debug/examples/minimal`
0:00:00.102594958  9109 0x785ee0000f10 INFO  videodecoder gstvideodecoder.c:1631:gst_video_decoder_sink_event_default:<avdec_h264-0> upstream tags: taglist, video-codec=(string)"H.264\ /\ AVC", container-specific-track-id=(string)1, bitrate=(uint)5152237;
...
```
I was testing this on a PC with an Nvidia GTX 1060 GPU, Arch Linux, and Nvidia drivers 550.54.14. Hardware acceleration should be handled by NVDEC/NVENC, provided by the `nvcodec` GStreamer plugin.

However, when inspecting the `nvcodec` plugin with `gst-inspect-1.0 nvcodec`, I could not see any features listed. This was because I was missing the `cuda` package. After installing it, I also had to clean the GStreamer cache:
```
$ rm -r ~/.cache/gstreamer-1.0
```
After this, `gst-inspect-1.0` properly showed the encoder and decoder features:
```
$ gst-inspect-1.0 nvcodec
Plugin Details:
  Name                     nvcodec
  Description              GStreamer NVCODEC plugin
  Filename                 /usr/lib/gstreamer-1.0/libgstnvcodec.so
  Version                  1.24.0
  License                  LGPL
  Source module            gst-plugins-bad
  Documentation            https://gstreamer.freedesktop.org/documentation/nvcodec/
  Source release date      2024-03-04
  Binary package           Arch Linux GStreamer 1.24.0-1
  Origin URL               https://www.archlinux.org/

  cudaconvert: CUDA colorspace converter
  cudaconvertscale: CUDA colorspace converter and scaler
  cudadownload: CUDA downloader
  cudaipcsink: CUDA IPC Sink
  cudaipcsrc: CUDA IPC Src
  cudascale: CUDA video scaler
  cudaupload: CUDA uploader
  nvautogpuh264enc: NVENC H.264 Video Encoder Auto GPU select Mode
  nvautogpuh265enc: NVENC H.265 Video Encoder Auto GPU select Mode
  nvcudah264enc: NVENC H.264 Video Encoder CUDA Mode
  nvcudah265enc: NVENC H.265 Video Encoder CUDA Mode
  nvh264dec: NVDEC H.264 Decoder
  nvh264enc: NVENC H.264 Video Encoder
  nvh265dec: NVDEC H.265 Decoder
  nvh265enc: NVENC HEVC Video Encoder
  nvjpegdec: NVDEC jpeg Video Decoder
  nvjpegenc: NVIDIA JPEG Encoder
  nvmpeg2videodec: NVDEC mpeg2video Video Decoder
  nvmpeg4videodec: NVDEC mpeg4video Video Decoder
  nvmpegvideodec: NVDEC mpegvideo Video Decoder
  nvvp9dec: NVDEC VP9 Decoder

  21 features:
  +-- 21 elements
```
and the minimal example was then using the `nvh264dec` hardware decoder:
```
$ GST_DEBUG=2,videodecoder:INFO cargo run --example minimal
    Finished dev [unoptimized + debuginfo] target(s) in 0.12s
     Running `target/debug/examples/minimal`
0:00:01.314568227 10681 0x77b5fc000f10 INFO  videodecoder gstvideodecoder.c:1631:gst_video_decoder_sink_event_default:<nvh264dec0> upstream tags: taglist, video-codec=(string)"H.264\ /\ AVC", container-specific-track-id=(string)1, bitrate=(uint)5152237;
...
```
However, despite using the hardware decoder, there is still some performance issue: I get about 80% CPU usage in both tests, with and without hardware decoding. But if I play the video directly with GStreamer:

```
gst-launch-1.0 playbin uri=file:///$(pwd)/.media/test.mp4
```

it says it uses the `nvh264dec` decoder and takes about 20% CPU.
@jazzfool, would you expect such a difference? Is it due to what you were referring to in your point above?

> If hardware decoding is used (which on my system it is not, but this can probably be fixed by enabling the VA-API GStreamer plugin), then you face the issue of getting the image memory to somewhere that Iced can render it. It's quite a performance nightmare in terms of GPU-host synchronization, host-visible memory, image layout transitions, etc.
For comparison, the same video played by mpv takes about 10% CPU, and it also says it uses the nvdec hardware decoder.
On another PC with an Intel Celeron N3350 (dual-core CPU with Intel HD Graphics 500), I had to install `intel-media-driver` and `gst-plugin-va`, which allow GStreamer to use VA-API and thus the underlying Intel hardware codec.

After this, the minimal example started to use the `vah264dec` hardware codec. However, performance was even worse in this case: higher CPU usage (~130% compared to ~100% without the hardware decoder) and video lagging (there was no lagging without the hardware decoder).
I think that in this case there is also something wrong with GStreamer itself, because even if I play the video directly with GStreamer:

```
gst-launch-1.0 playbin uri=file:///$(pwd)/.media/test.mp4
```

it says it's using the `vah264dec` hardware decoder, but I still face high CPU usage (~80%) and video lagging.
However, if I play the same video with mpv:

```
mpv --hwdec=auto .media/test.mp4
```

it also says it's using hardware decoding (VA-API, so it should be the same underlying codec), but I only have 20% CPU usage and smooth video playback.
I also tried to force the Vulkan decoder by setting the env var `GST_PLUGIN_FEATURE_RANK=vulkanh264dec:MAX`, but on both machines it fails during initialization, even though `gst-inspect-1.0 vulkan` shows the decoder, so GStreamer falls back to the other available decoders.
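As a side note, the same rank override can also be applied programmatically before building the pipeline. Here is a minimal sketch using the gstreamer-rs registry API (whether the preferred decoder then initializes successfully still depends on the system):

```rust
use gstreamer as gst;
use gst::prelude::*;

// Sketch: prefer a specific decoder by raising its rank, the programmatic
// analogue of GST_PLUGIN_FEATURE_RANK (using Primary, the highest predefined
// rank, rather than MAX). decodebin/playbin pick the usable element with the
// highest rank.
fn prefer_decoder(name: &str) {
    let registry = gst::Registry::get();
    if let Some(feature) = registry.lookup_feature(name) {
        feature.set_rank(gst::Rank::Primary);
    }
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    gst::init()?;
    prefer_decoder("vulkanh264dec"); // decoder name chosen for this example
    // ... build the playbin/appsink pipeline as usual ...
    Ok(())
}
```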
Thanks for the detailed tests! Yes, I would expect greater CPU usage, based on my earlier points. MPV and gst-launch almost certainly skip the CPU overhead by keeping everything on the GPU when using hardware decoding. However, I would not expect a difference on the order of 20% vs 80%. I tested MPV and gst-launch against the minimal example in a release build, and saw closer to 2-3% vs 5-6%.

I expect there's something going on with GStreamer and the system configuration to cause such a big difference, perhaps something with how the GStreamer sink pipeline is set up. This is hard to investigate as I can't reproduce it myself; I would want to capture some profiles and look at what GStreamer is doing in more detail.
I have noticed a significant performance gap between the OpenGL renderer provided by GStreamer's `autovideosink` and this Iced video player: the latter has higher latency and consumes more resources. Is it possible to close the gap?
I found a small bug whose fix should improve performance slightly. More importantly, I did find the source of why overall performance ends up slower than e.g. gst-launch:
Video frames are usually encoded in a YUV colour space, to help with spatial compression. The problem is that converting YUV to RGBA is not a simple operation, and in this case it is being performed on the CPU (by the `videoconvert` element). That wouldn't really be a problem if we could just accept the frames in YUV and place them into a YUV WGPU texture (NV12) so that the conversion could be done on the GPU, but NV12 textures require the NV12 feature gate when creating the WGPU device, and Iced does not let us select the features we want.
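To make that concrete, this is roughly the per-pixel arithmetic that `videoconvert` performs on the CPU, and that a GPU shader could do instead (a sketch using BT.709 limited-range coefficients; the exact matrix depends on the video's colorimetry):

```rust
/// Per-pixel YUV -> RGB conversion with BT.709 coefficients, limited-range input.
fn yuv_to_rgb(y: u8, u: u8, v: u8) -> [u8; 3] {
    // Expand limited range (16..235 luma, 16..240 chroma) to full range.
    let y = (y as f32 - 16.0) * (255.0 / 219.0);
    let u = (u as f32 - 128.0) * (255.0 / 224.0);
    let v = (v as f32 - 128.0) * (255.0 / 224.0);
    // BT.709 conversion matrix.
    let r = y + 1.5748 * v;
    let g = y - 0.1873 * u - 0.4681 * v;
    let b = y + 1.8556 * u;
    [
        r.clamp(0.0, 255.0) as u8,
        g.clamp(0.0, 255.0) as u8,
        b.clamp(0.0, 255.0) as u8,
    ]
}
```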
With GPU colour space conversion I anticipate that CPU usage would drop by roughly 15-20% (from my local tests). The rest of the CPU usage comes from `write_texture` (i.e., copying CPU memory to a GPU texture), and I see no simple way to reduce that.
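For illustration, that upload is essentially one `Queue::write_texture` call per decoded frame (a sketch against the wgpu API; exact field names have shifted between wgpu releases):

```rust
// Sketch: per-frame CPU -> GPU copy via Queue::write_texture. This runs once
// per decoded frame, which is where the remaining CPU time goes.
fn upload_frame(
    queue: &wgpu::Queue,
    texture: &wgpu::Texture,
    frame: &[u8], // tightly packed RGBA8 data from the appsink callback
    width: u32,
    height: u32,
) {
    queue.write_texture(
        wgpu::ImageCopyTexture {
            texture,
            mip_level: 0,
            origin: wgpu::Origin3d::ZERO,
            aspect: wgpu::TextureAspect::All,
        },
        frame,
        wgpu::ImageDataLayout {
            offset: 0,
            bytes_per_row: Some(4 * width), // 4 bytes per RGBA pixel
            rows_per_image: Some(height),
        },
        wgpu::Extent3d { width, height, depth_or_array_layers: 1 },
    );
}
```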
Looking into the future, the biggest leap forward would be https://github.com/gfx-rs/wgpu/issues/2330, but there's no sign of that feature any time soon.
Thanks for your reply. I profiled my pipeline and colour conversion did account for a huge part of the running time (around 80% of the total on my M1 Pro MacBook). By the way, I am curious why you consider video decoding in WGPU the biggest leap: is your point that, if decoding in WGPU were available, we could discard the GStreamer pipeline and thus eliminate the unnecessary memory copies?
That's right. If WGPU implements the native video decoding extensions for each API then that would result in almost no overhead since the memory doesn't need to move anywhere. The next best thing would be if WGPU implemented external memory extensions (VK_KHR_external_memory or equivalent) so that decoding is done in e.g., OpenGL but the texture memory can be imported as e.g., a VkImage.
For now I may investigate compute shaders as an alternative for speeding up the YUV -> RGB conversion.
But to my knowledge the GStreamer appsink cannot return GPU memory (D3D, OpenGL, CUDA), so it seems we would need to rewrite almost the entire GStreamer pipeline in Rust to eliminate all the unnecessary memory copies. That sounds like a hell of a lot of work.
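For context, here is a minimal sketch of the appsink hand-off in question, using gstreamer-rs (pipeline setup omitted); `map_readable()` is the point where the decoded frame lands in CPU memory:

```rust
use gstreamer as gst;
use gstreamer_app as gst_app;

// Sketch: the standard appsink callback shape. Every decoded frame passes
// through here as CPU-mapped memory before it can be handed to wgpu.
fn attach_frame_callback(appsink: &gst_app::AppSink) {
    appsink.set_callbacks(
        gst_app::AppSinkCallbacks::builder()
            .new_sample(|sink| {
                let sample = sink.pull_sample().map_err(|_| gst::FlowError::Eos)?;
                let buffer = sample.buffer().ok_or(gst::FlowError::Error)?;
                // map_readable() is the CPU copy boundary: the frame is now
                // in host memory, not GPU memory.
                let map = buffer.map_readable().map_err(|_| gst::FlowError::Error)?;
                let _bytes: &[u8] = map.as_slice(); // e.g. store for upload
                Ok(gst::FlowSuccess::Ok)
            })
            .build(),
    );
}
```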
Referring to the external memory extensions: GStreamer actually does expose `glimagesink`, `vulkansink`, and `d3d11videosink`. Of course, in the interest of supporting interop with all WGPU backends, you'd want to pick the most portable one. Whether the GStreamer pipeline itself internally decodes on the GPU throughout is another question. Though to be honest, instead of using `glimagesink` at that point, I would consider switching entirely from GStreamer to libmpv.
I have implemented hardware-accelerated NV12-to-RGB conversion that does not rely on the WGPU feature gate, in 9d60f26.

With that, CPU usage has been reduced by around 30-40%. From my testing, CPU usage is now comparable with other video players. At this point the only further CPU-side optimization left is zero-copy frames (currently a frame is copied from GPU to CPU and back to GPU), but without changes in wgpu that copy cannot be avoided.
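The general technique (a sketch assuming a two-plane upload; the details in 9d60f26 may differ) is to upload the Y and interleaved UV planes as two ordinary textures, sidestepping the native NV12 format, and combine them in the render shader:

```rust
// Hypothetical sketch: represent an NV12 frame as two plain textures instead
// of a native NV12 texture (which would need a wgpu feature gate).
fn create_nv12_planes(device: &wgpu::Device, width: u32, height: u32) -> (wgpu::Texture, wgpu::Texture) {
    let y_plane = device.create_texture(&wgpu::TextureDescriptor {
        label: Some("y-plane"),
        size: wgpu::Extent3d { width, height, depth_or_array_layers: 1 },
        mip_level_count: 1,
        sample_count: 1,
        dimension: wgpu::TextureDimension::D2,
        format: wgpu::TextureFormat::R8Unorm, // one luma byte per pixel
        usage: wgpu::TextureUsages::TEXTURE_BINDING | wgpu::TextureUsages::COPY_DST,
        view_formats: &[],
    });
    let uv_plane = device.create_texture(&wgpu::TextureDescriptor {
        label: Some("uv-plane"),
        size: wgpu::Extent3d { width: width / 2, height: height / 2, depth_or_array_layers: 1 },
        mip_level_count: 1,
        sample_count: 1,
        dimension: wgpu::TextureDimension::D2,
        format: wgpu::TextureFormat::Rg8Unorm, // interleaved chroma at half resolution
        usage: wgpu::TextureUsages::TEXTURE_BINDING | wgpu::TextureUsages::COPY_DST,
        view_formats: &[],
    });
    (y_plane, uv_plane)
}
```

The fragment shader then samples both planes and applies the YUV-to-RGB matrix on the GPU, so no CPU-side `videoconvert` is needed.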
As such, I will be closing this issue.
Dear author, this repo is the only lead I have for studying video players in Iced with GStreamer. Thanks a lot for sharing!

It works, but the video can be laggy at higher resolutions like 1920x1080. So I wonder whether the problem is in the appsink callback (writing the video data to the frame property), or whether Iced has trouble refreshing its `Image` from the frame data.

These tools seem to lack debugging facilities (I can't find a way to debug GStreamer in Rust), so I'm asking whether there is any idea for improving the performance of video playback based on your code.

I've updated gstreamer to 0.21 and iced to 0.10 in the dependencies.