rockchip-linux / mpp

Media Process Platform (MPP) module

RK3588S Mali-G610 GPU (Orange Pi 5) video playback segfault #356

Open martivo opened 1 year ago

martivo commented 1 year ago

Hardware: Orange Pi 5 (RK3588S, Mali-G610 GPU)
OS: Armbian Linux, kernel 5.10.110-rockchip-rk3588 #trunk.0248 SMP Fri Feb 10 05:25:40 UTC 2023 aarch64 aarch64 aarch64

After updating to the latest version of librockchip_mpp.so.0, video playback no longer works. I have traced the cause to commit 1cc1af1b08423364e7fa50c92fedcb983e2c01a7.

When I revert the changes from commit 1cc1af1b08423364e7fa50c92fedcb983e2c01a7 on top of the latest commit a05b01d84f4d4f9aab84d183be286e40c92fa2d5 and rebuild librockchip_mpp.so.0, video playback works normally.

Syslog messages when the segfault happens (with commit 1cc1af1b08423364e7fa50c92fedcb983e2c01a7 included):

Feb 10 21:13:09 loovsys mpp[69636]: mpp_dma_heap: Assertion fd > 0 failed at heap_fd_open:136
Feb 10 21:13:09 loovsys mpp[69636]: mpp_dma_heap: os_allocator_dma_heap_open open dma heap type 0 failed!
Feb 10 21:13:09 loovsys mpp[69636]: mpp_allocator: mpp_allocator_get type 1 failed
Feb 10 21:13:09 loovsys mpp[69636]: mpp_dma_heap: Assertion fd > 0 failed at heap_fd_open:136
Feb 10 21:13:09 loovsys mpp[69636]: mpp_dma_heap: os_allocator_dma_heap_open open dma heap type 0 failed!
Feb 10 21:13:09 loovsys mpp[69636]: mpp_allocator: mpp_allocator_get type 3 failed
Feb 10 21:13:09 loovsys mpp[69636]: mpp_dma_heap: Assertion fd > 0 failed at heap_fd_open:136
Feb 10 21:13:09 loovsys mpp[69636]: mpp_dma_heap: os_allocator_dma_heap_open open dma heap type 0 failed!
Feb 10 21:13:09 loovsys mpp[69636]: mpp_allocator: mpp_allocator_get type 4 failed
Feb 10 21:13:09 loovsys mpp[69636]: mpp_buffer: Assertion p->allocator failed at get_group:902
Feb 10 21:13:09 loovsys mpp[69636]: mpp_buffer: Assertion p->alloc_api failed at get_group:903
Feb 10 21:13:09 loovsys mpp[69636]: mpp_buffer: Assertion p->allocator failed at get_group:902
Feb 10 21:13:09 loovsys mpp[69636]: mpp_buffer: Assertion p->alloc_api failed at get_group:903
amazingfate commented 1 year ago

You need to set the right permissions. Here are the udev rules I use: https://github.com/amazingfate/rockchip-multimedia-config/blob/main/99-rk-device-permissions.rules
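The `fd > 0` assertion at heap_fd_open in the syslog above means MPP could not open a node under /dev/dma_heap. As a rough diagnostic sketch (not MPP code), the helper below tries to open each heap node the way an allocator would, assuming read-only access is enough for the DMA-BUF heap allocation ioctl; the heap names are the ones listed by `ls /dev/dma_heap` later in this thread. EACCES points to a permissions (udev) problem, ENOENT to a heap the kernel does not expose.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Returns 1 if /dev/dma_heap/<name> can be opened by the current user,
 * 0 otherwise. On failure, *err receives the errno value
 * (EACCES: permissions/udev problem, ENOENT: heap not present). */
int heap_accessible(const char *name, int *err)
{
    char path[64];
    int n;
    int fd;

    assert(name != NULL && err != NULL);
    n = snprintf(path, sizeof(path), "/dev/dma_heap/%s", name);
    if (n < 0 || (size_t)n >= sizeof(path)) {
        *err = ENAMETOOLONG;
        return 0;
    }
    /* Assumption: read-only access is enough for the allocation ioctl. */
    fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0) {
        *err = errno;
        return 0;
    }
    close(fd);
    *err = 0;
    return 1;
}

/* Print the status of each heap seen on the Orange Pi 5 in this thread. */
void report_heaps(void)
{
    static const char *const names[] = {
        "system", "system-uncached", "system-dma32",
        "system-uncached-dma32", "cma", "cma-uncached",
    };
    size_t i;

    for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
        int err = 0;
        int ok = heap_accessible(names[i], &err);
        printf("%-24s %s\n", names[i], ok ? "ok" : strerror(err));
    }
}
```

Calling report_heaps() from a small main() as a member of the video group should print "ok" for every heap MPP is expected to use.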

martivo commented 1 year ago

My previous udev rules:

KERNEL=="mpp_service", MODE="0660", GROUP="video"
KERNEL=="rga", MODE="0660", GROUP="video"
KERNEL=="system-dma32", MODE="0666", GROUP="video"
KERNEL=="system-uncached-dma32", MODE="0666", GROUP="video" RUN+="/usr/bin/chmod a+rw /dev/dma_heap"

After replacing my existing udev rules with https://github.com/amazingfate/rockchip-multimedia-config/blob/main/99-rk-device-permissions.rules:

root@loovsys:~# ls -l /dev/dma_heap/
total 0
crw------- 1 root root  251, 4 Feb 13 09:06 cma
crw------- 1 root root  251, 5 Feb 13 09:06 cma-uncached
crw------- 1 root root  251, 0 Feb 13 09:06 system
crw-rw-rw- 1 root video 251, 1 Feb 13 09:06 system-dma32
crw-rw-rw- 1 root video 251, 2 Feb 13 09:06 system-uncached
crw-rw-rw- 1 root video 251, 3 Feb 13 09:06 system-uncached-dma32
root@loovsys:~# ls -l /sys/class/dma_heap/
total 0
lrwxrwxrwx 1 root root 0 Feb 13 08:56 cma -> ../../devices/virtual/dma_heap/cma
lrwxrwxrwx 1 root root 0 Feb 13 08:56 cma-uncached -> ../../devices/virtual/dma_heap/cma-uncached
lrwxrwxrwx 1 root root 0 Feb 13 08:56 system -> ../../devices/virtual/dma_heap/system
lrwxrwxrwx 1 root root 0 Feb 13 08:56 system-dma32 -> ../../devices/virtual/dma_heap/system-dma32
lrwxrwxrwx 1 root root 0 Feb 13 08:56 system-uncached -> ../../devices/virtual/dma_heap/system-uncached
lrwxrwxrwx 1 root root 0 Feb 13 08:56 system-uncached-dma32 -> ../../devices/virtual/dma_heap/system-uncached-dma32

The segfault goes away, BUT playback is horribly choppy and barely plays. Before, playback of the same video file was perfect.

During playback dmesg shows:

[  173.764467] rga: request[727] submit failed!
[  173.824646] rga_mm: RGA_MMU unsupported Memory larger than 4G!
[  173.824668] rga_mm: scheduler core[4] unsupported mm_flag[0x0]!
[  173.824674] rga_mm: rga_mm_map_buffer map dma_buf error!
[  173.824679] rga_mm: job buffer map failed!
[  173.824682] rga_mm: src channel map job buffer failed!
[  173.824687] rga_mm: failed to map buffer
[  173.824691] rga_job: rga_job_commit: failed to map job info
[  173.824703] rga_job: request[728] task[0] job_commit failed.
[  173.824707] rga_job: rga request commit failed!
[  173.824710] rga: request[728] submit failed!
[  173.883175] rga_mm: RGA_MMU unsupported Memory larger than 4G!
[  173.883192] rga_mm: scheduler core[4] unsupported mm_flag[0x0]!
[  173.883196] rga_mm: rga_mm_map_buffer map dma_buf error!
[  173.883199] rga_mm: job buffer map failed!
[  173.883203] rga_mm: src channel map job buffer failed!
[  173.883206] rga_mm: failed to map buffer
[  173.883212] rga_job: rga_job_commit: failed to map job info
[  173.883222] rga_job: request[729] task[0] job_commit failed.
[  173.883226] rga_job: rga request commit failed!
[  173.883229] rga: request[729] submit failed!
[  173.939808] rga_mm: RGA_MMU unsupported Memory larger than 4G!
[  173.939823] rga_mm: scheduler core[4] unsupported mm_flag[0x0]!
[  173.939827] rga_mm: rga_mm_map_buffer map dma_buf error!
[  173.939830] rga_mm: job buffer map failed!
[  173.939833] rga_mm: src channel map job buffer failed!
[  173.939837] rga_mm: failed to map buffer
[  173.939842] rga_job: rga_job_commit: failed to map job info
[  173.939852] rga_job: request[730] task[0] job_commit failed.
[  173.939857] rga_job: rga request commit failed!
[  173.939859] rga: request[730] submit failed!

The board has 16 GB of memory (only 2-3 GB of it is in use).

mpv log during the slow playback:

 (+) Video --vid=1 (*) (h264 800x600 8.000fps)
 (+) Audio --aid=1 (*) (aac 6ch 44100Hz)
[vo/gpu/wayland] GNOME's wayland compositor lacks support for the idle inhibit protocol. This means the screen can blank during playback.
[ffmpeg/video] h264_rkmpp: Doing slow software conversion
No video PTS! Making something up. Using 8.000000 FPS.
AO: [pulse] 44100Hz 5.1 6ch float
[ffmpeg/video] h264_rkmpp: Doing slow software conversion
VO: [gpu] 800x600 yuv420p
[ffmpeg/video] h264_rkmpp: Doing slow software conversion
[ffmpeg/video] h264_rkmpp: Doing slow software conversion
[ffmpeg/video] h264_rkmpp: Doing slow software conversion
[ffmpeg/video] h264_rkmpp: Doing slow software conversion

When I remove commit 1cc1af1b08423364e7fa50c92fedcb983e2c01a7 from the latest build, playback is normal again.

Attachment: dmesg.log

amazingfate commented 1 year ago

I did some research on it. With this MPP change, buffers are allocated from the non-dma32 heaps by default, which fails the RGA MMU memory check in the kernel: https://github.com/radxa/kernel/blob/linux-5.10-gen-rkr3.4/drivers/video/rockchip/rga3/rga_mm.c#L409. You can limit the board's memory to 4G to get rid of the error, or ask the developers for a fix.
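To make the constraint concrete: RGA2's MMU can only reach 32-bit addresses, so a buffer is usable only if every byte of it lies below 4 GiB. The helper below is a simplified model of that rule, written for illustration; it is not the kernel's rga_mm.c logic. It also shows why a 16 GB board can hit the error with only 2-3 GB in use: a buffer from a non-dma32 heap can be placed anywhere in physical memory.

```c
#include <stdbool.h>
#include <stdint.h>

/* First address a 32-bit DMA master can no longer reach (4 GiB). */
#define DMA32_LIMIT ((uint64_t)1 << 32)

/* Simplified model, not the kernel's rga_mm.c logic: true if the buffer
 * [addr, addr + size) lies entirely below 4 GiB. Compares against the
 * remaining room instead of computing addr + size, which could overflow. */
bool fits_dma32(uint64_t addr, uint64_t size)
{
    return addr < DMA32_LIMIT && size <= DMA32_LIMIT - addr;
}
```

For example, a 4 KiB buffer ending exactly at 0xFFFFFFFF passes, while the same buffer at physical address 0x100000000 (4 GiB) fails.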

martivo commented 1 year ago

Is it possible to make MPP detect that this will not work on boards with more than 4 GB of memory, and skip the feature that commit 1cc1af1b08423364e7fa50c92fedcb983e2c01a7 adds, perhaps until this is fixed in the kernel?
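One possible shape for that detection, as a sketch only: read the total RAM with sysconf() and prefer the dma32 heap above 4 GiB. The function names and the policy are hypothetical, not part of MPP, and total RAM is just a proxy; what RGA2 actually cares about is each buffer's physical address. _SC_PHYS_PAGES is a Linux/glibc extension rather than POSIX.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

#define FOUR_GIB ((uint64_t)4 << 30)

/* Hypothetical policy: with more than 4 GiB of RAM, buffers from
 * "system-uncached" may end up out of RGA2's reach, so prefer dma32. */
bool prefer_dma32_heap(uint64_t total_ram)
{
    return total_ram > FOUR_GIB;
}

/* Hypothetical default-heap choice built on the policy above. */
const char *default_heap_name(uint64_t total_ram)
{
    return prefer_dma32_heap(total_ram) ? "system-uncached-dma32"
                                        : "system-uncached";
}

/* Total physical RAM in bytes, or 0 if it cannot be determined.
 * _SC_PHYS_PAGES is a Linux/glibc extension, not POSIX. */
uint64_t total_ram_bytes(void)
{
    long pages = sysconf(_SC_PHYS_PAGES);
    long page_size = sysconf(_SC_PAGESIZE);

    if (pages <= 0 || page_size <= 0)
        return 0;
    return (uint64_t)pages * (uint64_t)page_size;
}
```

On the 16 GB Orange Pi 5, default_heap_name(total_ram_bytes()) would return "system-uncached-dma32".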

Without this commit, playback is perfect with 16 GB of memory. IMHO a user-defined setting would also solve the issue, or perhaps a permission check: if MPP has no access to "system-uncached", don't use it. (That seems to have been my situation before changing the udev rules, and it only ended in a segfault.)
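The permission-check idea could look roughly like the hypothetical helper below (not MPP's allocator code): walk a preference list and use the first heap node that opens. The order mirrors commit 1cc1af1b (non-dma32 first, dma32 as fallback). Note that this would only have avoided the segfault; once the udev rules grant access to both heaps, it still picks "system-uncached", so the RGA2 problem remains.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: open the first heap in `names` that the current user
 * can access. Returns the open fd (caller closes it) and stores the index of
 * the chosen heap in *chosen, or returns -1 if none could be opened. */
int open_first_heap(const char *const *names, size_t count, size_t *chosen)
{
    char path[64];
    size_t i;

    for (i = 0; i < count; i++) {
        int n = snprintf(path, sizeof(path), "/dev/dma_heap/%s", names[i]);
        if (n < 0 || (size_t)n >= sizeof(path))
            continue;               /* name too long: skip it */
        int fd = open(path, O_RDONLY | O_CLOEXEC);
        if (fd >= 0) {
            if (chosen)
                *chosen = i;
            return fd;
        }
        /* Any failure (e.g. EACCES or ENOENT): try the next candidate. */
    }
    return -1;
}

/* Preference order as in the commit: 64-bit heap first, dma32 as fallback. */
int open_default_heap(size_t *chosen)
{
    static const char *const prefs[] = {
        "system-uncached", "system-uncached-dma32",
    };
    return open_first_heap(prefs, sizeof(prefs) / sizeof(prefs[0]), chosen);
}
```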

I need more than 4 GB of memory, so limiting it to 4 GB is not an option. I am sure there are others who want to use the hardware to the fullest.

amazingfate commented 1 year ago

A quick fix (revert just part of the commit):

diff --git a/osal/allocator/allocator_dma_heap.c b/osal/allocator/allocator_dma_heap.c
index 7e3a637..fd0eff4 100644
--- a/osal/allocator/allocator_dma_heap.c
+++ b/osal/allocator/allocator_dma_heap.c
@@ -74,14 +74,14 @@ typedef enum DmaHeapType_e {
 } DmaHeapType;

 static const char *heap_names[] = {
-    "system-uncached",          /* 0 - default */
+    "system-uncached-dma32",    /* 0 - default */
     "cma-uncached",             /* 1 -                                      DMA_HEAP_CMA */
-    "system",                   /* 2 -                  DMA_HEAP_CACHABLE                */
+    "system-dma32",             /* 2 -                  DMA_HEAP_CACHABLE                */
     "cma",                      /* 3 -                  DMA_HEAP_CACHABLE | DMA_HEAP_CMA */
-    "system-uncached-dma32",    /* 4 - DMA_HEAP_DMA32                                    */
-    "cma-uncached",             /* 5 - DMA_HEAP_DMA32                     | DMA_HEAP_CMA */
-    "system-dma32",             /* 6 - DMA_HEAP_DMA32 | DMA_HEAP_CACHABLE                */
-    "cma",                      /* 7 - DMA_HEAP_DMA32 | DMA_HEAP_CACHABLE | DMA_HEAP_CMA */
+    "system-uncached",          /* 4 - DMA_HEAP_DMA64                                    */
+    "cma-uncached",             /* 5 - DMA_HEAP_DMA64                     | DMA_HEAP_CMA */
+    "system",                   /* 6 - DMA_HEAP_DMA64 | DMA_HEAP_CACHABLE                */
+    "cma",                      /* 7 - DMA_HEAP_DMA64 | DMA_HEAP_CACHABLE | DMA_HEAP_CMA */
 };

 static int heap_fds[DMA_HEAP_TYPE_NB];

I think this issue is hard to solve because FFmpeg needs to convert YUV420SP to YUV420P, and only the RGA2 kernel driver can do this, but RGA2 is limited by hardware to memory below 4G. I don't know why the developers changed the priority from 32-bit to 64-bit heaps, but presumably it solved some issues on other hardware.
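The comments in the heap_names table of the patch encode each slot index as a bitmask: CMA = 1, CACHABLE = 2, and 4 selects the alternative address range (DMA32 in the original table, DMA64 in the patched one). The sketch below reproduces that lookup with the patched table; the enum names and values are inferred from those comments and are not MPP's actual header. It shows the effect of the patch: a request with no flags now lands on "system-uncached-dma32", and only an explicit 64-bit request reaches "system-uncached".

```c
/* Flag values inferred from the index comments in the patch
 * (hypothetical names mirroring the diff, not MPP's real header). */
enum {
    HEAP_FLAG_CMA      = 1 << 0,
    HEAP_FLAG_CACHABLE = 1 << 1,
    HEAP_FLAG_DMA64    = 1 << 2,
    HEAP_FLAG_MASK     = HEAP_FLAG_CMA | HEAP_FLAG_CACHABLE | HEAP_FLAG_DMA64,
};

/* Patched table: without HEAP_FLAG_DMA64, the dma32 system heaps are used.
 * CMA slots are unchanged because this board exposes no dma32 CMA heap. */
static const char *const patched_heap_names[] = {
    "system-uncached-dma32",   /* 0 - default                */
    "cma-uncached",            /* 1 - CMA                    */
    "system-dma32",            /* 2 - CACHABLE               */
    "cma",                     /* 3 - CACHABLE | CMA         */
    "system-uncached",         /* 4 - DMA64                  */
    "cma-uncached",            /* 5 - DMA64 | CMA            */
    "system",                  /* 6 - DMA64 | CACHABLE       */
    "cma",                     /* 7 - DMA64 | CACHABLE | CMA */
};

/* Map a flag combination to a heap name; unknown bits are masked off. */
const char *heap_name_for_flags(unsigned flags)
{
    return patched_heap_names[flags & HEAP_FLAG_MASK];
}
```

For example, heap_name_for_flags(HEAP_FLAG_CACHABLE) returns "system-dma32".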

rimonxu commented 1 year ago

You can also call the RGA API imconfig(IM_CONFIG_SCHEDULER_CORE, IM_SCHEDULER_RGA3_CORE0 | IM_SCHEDULER_RGA3_CORE1); to lock jobs to the RGA3 cores. RGA3 is not limited to memory below 4G.

amazingfate commented 1 year ago

FFmpeg wants to convert YCbCr_420_SP to YCbCr_420_P, which RGA3 does not support according to the kernel driver: https://github.com/radxa/kernel/blob/linux-5.10-gen-rkr3.4/drivers/video/rockchip/rga3/rga_hw_config.c#L37
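For reference, YCbCr_420_SP (NV12) stores the Y plane followed by a single plane of interleaved Cb/Cr pairs, while YCbCr_420_P (I420, yuv420p) stores Y, Cb and Cr as three separate planes. Without a hardware path, the conversion is a de-interleave on the CPU, which is presumably what the "Doing slow software conversion" lines in the mpv log refer to. A minimal sketch, assuming even dimensions and tightly packed planes (real decoder output usually has stride padding, which this ignores); it is not FFmpeg's implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Convert a tightly packed NV12 (YUV420SP) frame to I420 (YUV420P).
 * Assumes even width/height and no row padding.
 * src: Y plane (w*h bytes) followed by interleaved Cb/Cr (w*h/2 bytes).
 * dst: Y plane, then Cb plane, then Cr plane (w*h/4 bytes each). */
void nv12_to_i420(const uint8_t *src, uint8_t *dst, size_t width, size_t height)
{
    size_t luma = width * height;
    size_t chroma = luma / 4;          /* samples per chroma plane */
    const uint8_t *uv = src + luma;
    uint8_t *u = dst + luma;
    uint8_t *v = u + chroma;
    size_t i;

    memcpy(dst, src, luma);            /* Y plane is identical */
    for (i = 0; i < chroma; i++) {     /* split the Cb/Cr pairs */
        u[i] = uv[2 * i];
        v[i] = uv[2 * i + 1];
    }
}
```

The Y plane is a straight copy; only the chroma plane needs per-sample work, and every output byte is touched once on the CPU.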

rimonxu commented 1 year ago

I know about the limitations in RGA3. So when 420P is needed, I suggest using the GPU instead of RGA2 on RK3588S; RGA2 has low performance and other limitations.

martivo commented 1 year ago

I can confirm that with amazingfate's quick fix applied to the latest "develop" branch, the problem is gone. I created a fork with the quick fix: https://github.com/martivo/mpp/commit/38afa760be814dbbf32019b6c588be8304c1e486

How could this be solved permanently? Where does the change have to take place?

jjm2473 commented 1 year ago

If you are using the latest MPP library, there is a better way to fix the RGA2 4GB issue: https://github.com/jjm2473/ffmpeg-rk/commit/7e350f94df5e3b68e6e1e588e31d90c4e67c7f32

diff --git a/libavcodec/rkmppdec.c b/libavcodec/rkmppdec.c
index ca7a824ac1bd..e2078c089936 100644
--- a/libavcodec/rkmppdec.c
+++ b/libavcodec/rkmppdec.c
@@ -249,7 +249,7 @@ static int rkmpp_init_decoder(AVCodecContext *avctx)
         goto fail;
     }

-    ret = mpp_buffer_group_get_internal(&decoder->frame_group, MPP_BUFFER_TYPE_DRM);
+    ret = mpp_buffer_group_get_internal(&decoder->frame_group, MPP_BUFFER_TYPE_DRM | MPP_BUFFER_FLAGS_DMA32);
     if (ret) {
        av_log(avctx, AV_LOG_ERROR, "Failed to get buffer group (code = %d)\n", ret);
        ret = AVERROR_UNKNOWN;

This patch forces MPP to allocate its output frames at DMA32 addresses (below 4G), so that RGA2 can handle them.

shivabohemian commented 3 months ago

Regarding jjm2473/ffmpeg-rk@7e350f9:

Hello, does gstreamer-rockchip also need to change this part of the code? The RK3568 also reports the "RGA_MMU unsupported memory larger than 4G" error. As shown in the screenshot below, does the mpp_buffer_group_get_external call also need this flag?

[Screenshot 2024-03-09 14:29:24]
jjm2473 commented 3 months ago

@shivabohemian They should all be handled the same way as in FFmpeg, but you need to confirm whether group or ext_group is used as the output buffer for MPP decoding. (Or just test it.)

shivabohemian commented 3 months ago

@jjm2473 Thank you for your answer. I found some gstreamer-rockchip code uploaded to GitHub here. It seems that both group and ext_group buffers are used. However, I'm not proficient in C. I will go and test it; your help is greatly appreciated.