wn2000 opened 4 months ago
Please check whether the app is leaking file descriptors: check whether the number of open fds in the app process keeps increasing. The most likely issue is that fds are leaking while the app runs. If the number of open fds grows above 1024, the kernel driver will generate a translation error.
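One way to follow this suggestion is to log the process's open-fd count periodically, for example on every file switch. Below is a minimal Linux-only sketch, assuming /proc is mounted. The function names and the 900-fd warning threshold are illustrative choices, not part of mpp.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Count the fds currently open in this process by listing /proc/self/fd
 * (Linux-specific). Returns -1 if the directory cannot be opened.
 * The fd that opendir() holds during the scan is included in the count,
 * so successive calls are directly comparable. */
int count_open_fds(void)
{
    DIR *dir = opendir("/proc/self/fd");
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        /* Skip the "." and ".." directory entries. */
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;
        count++;
    }
    closedir(dir);
    return count;
}

/* Log the count with a caller-supplied tag (e.g. the file being opened).
 * The 900 threshold is an arbitrary early warning below 1024. */
void log_fd_count(const char *tag)
{
    int n = count_open_fds();
    fprintf(stderr, "[fd-monitor] %s: %d open fds%s\n", tag, n,
            n > 900 ? " (WARNING: approaching 1024)" : "");
}
```

Calling log_fd_count() each time a video opens a new file makes a steady upward trend easy to spot in the log.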
Thanks for the suggestion. I will keep an eye on the fd count.
One more test I did was to switch the game preview video to software decoding and use mpp only for the background video.
That seems to have stabilized the app. It's been running for over 24 hours without an issue.
Do I need to apply some thread locking when two threads are accessing mpp at the same time? They have separate mpp contexts.
There is no need to lock between different mpp contexts. All mpp contexts are independent of each other.
I re-enabled mpp on both videos and monitored the opened fd count.
Within an hour the error appeared again, but the fd count was not high (<100).
I did notice that before the "translate reg address failed" error, there were these errors in dmesg:
...
[142488.097327] rk-vcodec ff360000.rkvdec: can not find 3348 buffer in list
[142488.157297] rk-vcodec ff360000.rkvdec: can not find 3345 buffer in list
[142488.157912] rk-vcodec ff360000.rkvdec: can not find 3345 buffer in list
[142488.158523] rk-vcodec ff360000.rkvdec: can not find 3289 buffer in list
[142488.159121] rk-vcodec ff360000.rkvdec: can not find 3289 buffer in list
[142488.159749] rk-vcodec ff360000.rkvdec: can not find 3296 buffer in list
[142488.160346] rk-vcodec ff360000.rkvdec: can not find 3296 buffer in list
[142488.160974] rk-vcodec ff360000.rkvdec: can not find 3345 buffer in list
[142488.161578] rk-vcodec ff360000.rkvdec: can not find 3345 buffer in list
[142488.226703] rk-vcodec ff360000.rkvdec: can not find 3364 buffer in list
[142488.227305] rk-vcodec ff360000.rkvdec: can not find 3364 buffer in list
...
But when those errors pop up, both videos still play fine.
At some point, when the "rk_vcodec: reg_init:1248: error: translate reg address failed, dumping regs" error happens, one of the videos chokes while the other still plays fine. Then, when it's time for the other video to open a new file, that video chokes too. At that point, only a reboot fixes the problem.
It is strange, though: if I use only one video, there is no issue at all (tested for 2 days nonstop).
It is obviously a buffer leak: there are too many buffers in the list. The leak may happen on file switch. When a file reaches its end, the decoder should input an EOS packet, wait for the EOS output frame, and then exit.
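The end-of-file flow described above can be sketched as a drain loop. The decoder here is a hypothetical mock: Decoder, Frame, dec_put_eos, dec_get_frame and dec_release_frame are made up for illustration and are not the MPP API. In an MPP-based player the corresponding steps would presumably be mpp_packet_set_eos() on the last packet, the decode_put_packet/decode_get_frame calls, a check of mpp_frame_get_eos(), and mpp_frame_deinit() on every returned frame, roughly as mpi_dec_test does; verify that mapping against the MPP sources.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical mock types; they only model the EOS drain flow. */
typedef struct {
    bool eos;            /* true only on the final frame after EOS */
} Frame;

typedef struct {
    int  pending;        /* decoded frames still queued in the decoder */
    bool eos_sent;       /* an EOS packet has been submitted */
    bool eos_emitted;    /* the EOS-flagged frame has been handed out */
    int  outstanding;    /* frames handed out but not yet released */
} Decoder;

/* Submit an end-of-stream packet (mock). */
static void dec_put_eos(Decoder *d) { d->eos_sent = true; }

/* Return the next frame, or NULL if nothing is ready (mock). Queued frames
 * come out first; after EOS, one final frame carries the eos flag. */
static Frame *dec_get_frame(Decoder *d)
{
    bool eos;
    if (d->pending > 0)
        eos = false;
    else if (d->eos_sent && !d->eos_emitted)
        eos = true;
    else
        return NULL;

    Frame *f = malloc(sizeof *f);
    if (f == NULL)
        return NULL;
    if (eos)
        d->eos_emitted = true;
    else
        d->pending--;
    f->eos = eos;
    d->outstanding++;
    return f;
}

/* Give a frame back to the decoder (mock). */
static void dec_release_frame(Decoder *d, Frame *f)
{
    free(f);
    d->outstanding--;
}

/* Drain at end of file: send EOS, then pull and release every frame until
 * the EOS-flagged frame arrives. Returns the number of frames drained
 * (including the EOS frame), or -1 if EOS never arrived within max_polls. */
int drain_on_file_switch(Decoder *d, int max_polls)
{
    dec_put_eos(d);
    int drained = 0;
    for (int i = 0; i < max_polls; i++) {
        Frame *f = dec_get_frame(d);
        if (f == NULL)
            continue;            /* real code would wait or use a blocking get */
        bool last = f->eos;
        /* ... display or copy the frame here ... */
        dec_release_frame(d, f); /* release every frame, including the EOS one */
        drained++;
        if (last)
            return drained;
    }
    return -1;
}
```

The property that matters is that every frame, including the final EOS frame, is released before the context is destroyed and the next file is opened.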
Ahhh ok. That makes sense. I will check the code. Thanks!
One more question: When this happens, is there a way to "reset" the rkvdec device to a good state?
Currently, even if I kill the app and relaunch, it still gets stuck.
Close all decoder instances and free all buffers. Then run cat /sys/kernel/debug/dma_buf/bufinfo to check whether any buffers remain in the list.
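To make this check scriptable, the bufinfo text can be parsed for entries that still list the decoder device. This C sketch assumes the layout seen in this thread: an "Attached Devices:" line, one device per line, then a "Total N devices attached" line closing each entry. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Count dma-buf entries whose attached-device list mentions `device`.
 * Each entry is counted once, however many times the device appears. */
int count_buffers_attached_to(FILE *in, const char *device)
{
    char line[512];
    bool in_list = false;   /* inside an "Attached Devices:" list */
    bool matched = false;   /* current entry mentions `device` */
    int count = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        if (strstr(line, "Attached Devices:") != NULL) {
            in_list = true;
            matched = false;
        } else if (strstr(line, "devices attached") != NULL) {
            /* "Total N devices attached" closes the current entry. */
            if (in_list && matched)
                count++;
            in_list = false;
        } else if (in_list && strstr(line, device) != NULL) {
            matched = true;
        }
    }
    return count;
}
```

For example, after closing all decoders, open /sys/kernel/debug/dma_buf/bufinfo (this needs debugfs mounted and usually root) and call count_buffers_attached_to(f, "ff360000.rkvdec"). A non-zero result means buffers are still attached to rkvdec.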
Awesome, thanks! So I killed the app, which is the only app that uses the rkvdec device.
But when I run cat /sys/kernel/debug/dma_buf/bufinfo, I still get numerous buffers that look like the following:
...
00004096 00000002 00000007 00000008 drm
Attached Devices:
ff360000.rkvdec
ff360000.rkvdec
ff360000.rkvdec
ff360000.rkvdec
ff360000.rkvdec
ff360000.rkvdec
ff360000.rkvdec
ff360000.rkvdec
Total 8 devices attached
00012288 00000002 00000007 00000004 drm
Attached Devices:
ff360000.rkvdec
ff360000.rkvdec
ff360000.rkvdec
ff360000.rkvdec
Total 4 devices attached
...
How do I "free" them?
Check which process holds each buffer. It may be held by both the display process and rkvdec. If so, the decoder workflow needs to be checked for unreleased MppFrames.
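A common way to find unreleased frames is to record every frame handle the app obtains, remove it on release, and report whatever is left at shutdown. This is a minimal single-threaded sketch: the track_* helpers and the 256-entry limit are hypothetical. With two decoding threads you would want one table per thread or a mutex around it.

```c
#include <stddef.h>
#include <stdio.h>

#define MAX_TRACKED 256   /* arbitrary capacity for this sketch */

typedef struct {
    const void *handle;   /* opaque frame handle; NULL marks a free slot */
    const char *tag;      /* e.g. which video acquired the frame */
} TrackedFrame;

static TrackedFrame tracked[MAX_TRACKED];

/* Record a frame right after it is obtained. Returns -1 if the handle is
 * NULL or the table is full. */
int track_acquire(const void *handle, const char *tag)
{
    if (handle == NULL)
        return -1;
    for (size_t i = 0; i < MAX_TRACKED; i++) {
        if (tracked[i].handle == NULL) {
            tracked[i].handle = handle;
            tracked[i].tag = tag;
            return 0;
        }
    }
    return -1;
}

/* Forget a frame right before it is released. Returns -1 for an unknown
 * handle, which would indicate a double release or an untracked frame. */
int track_release(const void *handle)
{
    if (handle == NULL)
        return -1;
    for (size_t i = 0; i < MAX_TRACKED; i++) {
        if (tracked[i].handle == handle) {
            tracked[i].handle = NULL;
            tracked[i].tag = NULL;
            return 0;
        }
    }
    return -1;
}

/* Print every frame that was never released and return how many. */
int track_report_leaks(FILE *out)
{
    int leaks = 0;
    for (size_t i = 0; i < MAX_TRACKED; i++) {
        if (tracked[i].handle != NULL) {
            fprintf(out, "unreleased frame %p (%s)\n", (void *)tracked[i].handle,
                    tracked[i].tag != NULL ? tracked[i].tag : "?");
            leaks++;
        }
    }
    return leaks;
}
```

In an mpp app, track_acquire() would go right after a successful decode_get_frame() and track_release() right before mpp_frame_deinit(); a non-zero track_report_leaks() at exit, tagged per video, points to the code path that kept a frame.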
https://github.com/user-attachments/assets/0133709e-89ad-408a-ad8a-07df74480f74
I just realized that the issue is not caused by playing two videos simultaneously. It is actually caused by some particular videos.
Attached is one such problematic video.
For other "good" videos, when I run cat /sys/kernel/debug/dma_buf/bufinfo | grep 'Total .* devices', I get
Total 1 devices attached
Total 1 devices attached
Total 1 devices attached
Total 1 devices attached
Total 1 devices attached
Total 1 devices attached
Total 1 devices attached
Total 1 devices attached
...
which looks reasonable. And after exiting the app those dmabufs are gone.
However, when playing the attached video, I got:
Total 4 devices attached
Total 1 devices attached
Total 3 devices attached
Total 6 devices attached
Total 2 devices attached
Total 3 devices attached
Total 4 devices attached
Total 6 devices attached
Total 4 devices attached
Total 4 devices attached
Total 3 devices attached
Total 2 devices attached
Total 3 devices attached
Total 4 devices attached
Total 2 devices attached
Total 5 devices attached
Total 5 devices attached
Total 3 devices attached
Total 5 devices attached
Total 2 devices attached
Total 4 devices attached
Total 2 devices attached
Total 2 devices attached
Total 26 devices attached
Total 3 devices attached
Total 16 devices attached
Total 7 devices attached
Total 35 devices attached
Total 1 devices attached
Total 1 devices attached
Total 1 devices attached
Total 9 devices attached
Total 1 devices attached
Total 6 devices attached
Total 15 devices attached
Total 1 devices attached
Total 1 devices attached
Total 1 devices attached
Total 1 devices attached
Total 1 devices attached
Total 1 devices attached
Total 1 devices attached
Total 1 devices attached
Total 0 devices attached
And after exiting the app those dmabufs are still there.
Can you see what's unique about the attached video causing those dmabufs to leak? The video plays fine otherwise.
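To compare listings like the two above without eyeballing them, the "Total N devices attached" lines can be summarized. This is a sketch in C: the AttachSummary type and the function name are hypothetical, and treating entries with more than one attachment as suspicious rests only on the difference observed in this thread (the good videos show 1 attachment per buffer).

```c
#include <stdio.h>

/* Summary of the "Total N devices attached" lines in bufinfo-style text. */
typedef struct {
    int buffers;         /* number of "Total ..." lines seen */
    int multi_attached;  /* entries with more than one attachment */
    int max_attached;    /* largest attachment count seen */
} AttachSummary;

AttachSummary summarize_attachments(FILE *in)
{
    AttachSummary s = {0, 0, 0};
    char line[512];
    int n;

    while (fgets(line, sizeof line, in) != NULL) {
        /* Lines not starting with "Total <number>" fail the match and are
         * skipped (sscanf returns 0 or EOF for them). */
        if (sscanf(line, "Total %d devices attached", &n) == 1) {
            s.buffers++;
            if (n > 1)
                s.multi_attached++;
            if (n > s.max_attached)
                s.max_attached = n;
        }
    }
    return s;
}
```

Running it on bufinfo during playback and again after the app exits gives two small summaries that can be compared across the good and the problematic streams.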
I just checked the above video with mpi_dec_test and got the same leaking result.
Here is the raw h264 stream:
derbyoc2.zip
Does that mean the issue is actually in mpp?
@HermanChen Hi, I just wonder whether you could reproduce the resource leak when playing the above video clip, or whether it's just my setup? Thanks!
Are you using the MPP develop branch?
Yeah, I'm using the develop branch. I tried with mpi_dec_test and observed the dma_buf leak with that particular h264 stream. Other streams work just fine, so I don't know what's unique about that one.
Please upload the following files so that I can confirm the version information of the mpp on your platform: libmpp.so, libvpu.so
Kernel: kernel/drivers/video/rockchip/vpu, kernel/drivers/video/rockchip/vcodec
Hi. Here is the mpp lib. It was built from this repository's develop branch. librockchip_mpp.so.0.zip
I do not use the librockchip_vpu.so library (the application works fine without that library).
For the kernel, I do not have the source or development package. I'm using the stock system provided by the device manufacturer, and only run my own user-space application on top of it.
The uname -a output is:
4.4.159 #4 SMP Mon Jun 12 09:45:25 CST 2023 aarch64 GNU/Linux
The board is RK3328.
Is there any other info I could provide to help troubleshoot the problem? I guess you were not able to reproduce the resource leak using the h264 file I uploaded?
Yes, I cannot reproduce the issue with the file you uploaded.
Platform: RK3328 running a Buildroot-based Linux. Kernel: 4.4.159
I'm working on a game selection frontend app that plays two simultaneous videos: a background video and a game preview. Both cycle through a list of video files.
The Issue:
The app works fine initially, but randomly when one video tries to open a new file, it gets stuck. The other video continues playing normally.
Error Message:
dmesg shows the following error when this happens:
Impact:
Questions:
Any insights or suggestions to fix this behavior or gather more information would be greatly appreciated!