hbiyik / FFmpeg

PLEASE USE https://github.com/nyanmisaka/ffmpeg-rockchip REPO INSTEAD.

Open Test Thread #14

Open hbiyik opened 11 months ago

hbiyik commented 11 months ago

@avafinger

A lot has been added, especially HEVC and VP8 encoders with scaling support:

https://github.com/hbiyik/FFmpeg/wiki

It should be stable enough to test if you are interested.

nyanmisaka commented 11 months ago

Is there an option to force *_rkmpp_decoder outputting drm_prime hw frames?

hbiyik commented 11 months ago

Set the FFMPEG_RKMPP_PIXFMT=DRMPRIME environment variable.

nyanmisaka commented 11 months ago

HEVC/VP9/AV1 10-bit to H264/HEVC 8bit transcoding can cause system crashes. Sometimes I have to cut the power to reset.

FFMPEG_RKMPP_PIXFMT=DRMPRIME ./ffmpeg -c:v hevc_rkmpp_decoder -i hevc_10bit.mp4 -an -sn -c:v h264_rkmpp_encoder -rc_mode VBR -b:v 6M -maxrate 6M -bufsize 12M -profile:v high -level 4.1 -g:v 120 -f null -
[  771.797702] rk_vcodec: mpp_translate_reg_address:1838: reg[  0]: 0xffffffff fd -1 failed
[  771.797713] rk_vcodec: mpp_task_dump_mem_region:2025: --- dump mem region ---
[  771.797724] mpp_rkvenc2 fdbd0000.rkvenc-core: no memory region mapped
[  771.797737] rk_vcodec: mpp_process_task_default:630: alloc_task failed.
[  771.797747] rkvenc2_wait_result:1995: session 00000000736d7b74 pending list is empty!
[  771.797753] rk_vcodec: mpp_msgs_wait:1634: session 3 wait result ret -5
[  772.019465] rkvdec2_ccu_timeout_work:1643: fdc38100.rkvdec-core, task timeout
[  772.019515] rkvdec2_ccu_timeout_work:1643: fdc48100.rkvdec-core, task timeout
[  772.019586] mpp_rkvdec2 fdc48100.rkvdec-core: resetting...
[  772.019782] mpp_rkvdec2 fdc48100.rkvdec-core: reset done
[  772.019794] mpp_rkvdec2 fdc38100.rkvdec-core: resetting...
[  772.019896] mpp_rkvdec2 fdc38100.rkvdec-core: reset done

Also it seems the post RGA -width 1280 -height 720 doesn't accept 10-bit hw frames.

[h264_rkmpp_encoder @ 0xaaaae92c1dc0] Scaling is only supported for NV12,NV16,YUV420P,YUV422P. drm_prime requested

hbiyik commented 11 months ago

Can you try without forcing drmprime? You actually don't need drm prime, soft frames are zero-copy.

Scaling is only possible on yuv420/422 p/sp planes, but I noticed that when the output frame is drmprime I don't check it correctly. I'll fix that...

nyanmisaka commented 11 months ago

You actually don't need drm prime, soft frames are zero-copy.

Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), nv12(tv, progressive), 1920x1080

But the encoder's input still suggests nv12, which usually implies non-zero-copy.

After removing the env, here comes the rga3 alignment issue or maybe the DMA32 buffer issue on my 16GB RAM board.

Stream mapping:
  Stream #0:5 -> #0:0 (hevc (hevc_rkmpp_decoder) -> h264 (h264_rkmpp_encoder))
Press [q] to stop, [?] for help
[hevc_rkmpp_decoder @ 0xaaaaf9f9af50] Decoder noticed an info change
[hevc_rkmpp_decoder @ 0xaaaaf9f9af50] 10bit NV15 plane will be downgraded to 8bit nv12.
rga_api version 1.8.1_[4]
err hs[0,1088,1080]
Error srcRect
[RgaBlit,782]Error srcRect

fd-vir-phy-hnd-format[12, (nil), (nil), 0, 8192]
rect[0, 0, 1920, 1088, 2816, 1080, 8192, 0]
f-blend-size-rotation-col-log-mmu[8192, 0, 0, 0, 0, 0, 1]
fd-vir-phy-hnd-format[20, (nil), (nil), 0, 2560]
rect[0, 0, 1920, 1088, 1920, 1088, 2560, 0]
f-blend-size-rotation-col-log-mmu[2560, 0, 0, 0, 0, 0, 1]
This output the user patamaters when rga call blit fail
[hevc_rkmpp_decoder @ 0xaaaaf9f9af50] RGA failed falling back to soft conversion
[hevc_rkmpp_decoder @ 0xaaaaf9f9af50] RGA failed to convert NV15 -> NV12. No Soft Conversion Possible
[hevc_rkmpp_decoder @ 0xaaaaf9f9af50] Failed set frame buffer (code = -1)
[hevc_rkmpp_decoder @ 0xaaaaf9f9af50] Decoder Failed to get frame (code = -1)
Error while decoding stream #0:5: Operation not permitted

And for the HEVC 8-bit 1080p input + post RGA downscaling to 720p. https://test-videos.co.uk/vids/bigbuckbunny/mp4/h265/1080/Big_Buck_Bunny_1080_10s_30MB.mp4

./ffmpeg -stream_loop -1 -c:v hevc_rkmpp_decoder -i Big_Buck_Bunny_1080_10s_30MB.mp4 -an -sn -c:v h264_rkmpp_encoder -rc_mode CBR -b:v 6M -maxrate 6M -bufsize 12M -profile:v high -level 4.1 -g:v 120 -width 1280 -height 720 -f null -
rga_api version 1.8.1_[4]
err hs[0,1088,1080]
Error srcRect
[RgaBlit,782]Error srcRect

fd-vir-phy-hnd-format[12, (nil), (nil), 0, 2560]
rect[0, 0, 1920, 1088, 2304, 1080, 2560, 0]
f-blend-size-rotation-col-log-mmu[2560, 0, 0, 0, 0, 0, 1]
fd-vir-phy-hnd-format[37, (nil), (nil), 0, 2560]
rect[0, 0, 1280, 720, 1280, 720, 2560, 0]
f-blend-size-rotation-col-log-mmu[2560, 0, 0, 0, 0, 0, 1]
This output the user patamaters when rga call blit fail
[h264_rkmpp_encoder @ 0xaaaaf24ec9e0] RGA failed falling back to soft conversion
[h264_rkmpp_encoder @ 0xaaaaf24ec9e0] Error applying Post RGA

hbiyik commented 11 months ago

Ok ill give it a look in detail tonight

hbiyik commented 11 months ago

the 2nd issue is quite weird.

rect[0, 0, 1920, 1088, 2304, 1080, 2560, 0]

The hstride is given as 2304 here, and this is received from mpp directly. For this NV12 frame it should be ~1080, or with alignment 1088 ~ 1920, but wtf is 2304, and why mpp reports it is interesting. It could be an issue with mpp that I need to dig into.
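The two row strides in the RGA dumps above (2304 for the 8-bit frame, 2816 for the 10-bit one) can both be reproduced by one simple alignment rule: round the row size in bytes up to a multiple of 256, then force an odd multiple of 256. That rule is an assumption made here for illustration; nothing in this thread confirms that mpp applies it. A minimal sketch of the arithmetic:

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def align_256_odd(value: int) -> int:
    """Hypothetical rule: 256-byte aligned and an odd multiple of 256."""
    return align_up(value, 256) | 256


width, height = 1920, 1080

# 8-bit NV12: one byte per luma sample in a row.
stride_8bit = align_256_odd(width)
# 10-bit NV15: ten bits per luma sample, packed.
stride_10bit = align_256_odd(width * 10 // 8)
# The 1088 in the dumps is 1080 rounded up to a multiple of 16.
aligned_height = align_up(height, 16)

print(stride_8bit, stride_10bit, aligned_height)
```

If the rule holds, a stride larger than the width is intentional padding rather than a corrupted value, so the consumer has to pass the stride through unchanged instead of deriving it from the width.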

Update:
1) Issue with forcing DRMPRIME and having everything crash: fixed in ed616697c931908a94f7b292701ad5d449ec4417
2) Issue with forcing DRMPRIME where scaling gives the wrong limitation error: fixed in bf96d57b315e911f2a4ef240a0ea7c9520c0f2f8
3) Issue where the Big_Buck_Bunny_1080_10s_30MB.mp4 conversion crashes the rga: this is most likely an MPP bug. I understand completely why it is happening and I do not want to patch it from ffmpeg, because mpp is obviously providing the wrong stride. I raised a bug issue, https://github.com/rockchip-linux/mpp/issues/422, and I expect a simple bugfix from mpp.

nyanmisaka commented 11 months ago

Thx for the update. With the fixes:

hbiyik commented 11 months ago

This should all work; however, I switched to librga and it is being a bi**h, so I am working on it. Anything else in the meantime?

hbiyik commented 11 months ago

Ah, one question: which player are you testing the drmprime output with? Is it kodi or something else?

nyanmisaka commented 11 months ago

I can't figure out why librga exists independently of mpp. This resulted in developers having to maintain compatibility between them.

sw frames seem to work fine in the encoder. Video quality looks better than AMD graphics cards, albeit slower.

I've been using the command line to test the encoder. Kodi might be a good option for testing ffmpeg as a library. Or refer to some tools of the author of rpi-ffmpeg such as https://github.com/jc-kynesim/hello_drmprime.

hbiyik commented 11 months ago

all of the issues you reported today and yesterday including the one i thought mpp related should be fixed in 1d57c70c3e29ec88304c6babcb533c43ce72c124

There is something fishy with mpp to my understanding however this was not the root cause of the issue.

About sw frames: even when so-called non-drmprime planes are decoded with rkmpp_decoder, they are still hardware planes most of the time, simply mmapped to the drm device. Especially when the NV12 plane is used, there is no copy at all. So I would call them hybrid, and in transcoding scenarios they are mainly NV12, so hardware frames mapped to AVFrame. It is kinda tricky, but this is the actual reason why I started this fork in the first place.

In short, when transcoding forcing DRMPRIME does not make any difference in terms of performance, you can also verify this with the throughput and resource usage. This is the reason i did not test those parts with transcoding with DRMPRIME forced, so thanks for that :), lots of issues were found and fixed (hopefully). i had tested drmprime only when getting kmsgrab input with drmprime bgr0 frames.

nyanmisaka commented 11 months ago

Overall works great after the latest changes, with some exceptions.

It seems that my experience with desktop GPUs doesn't fully apply to Arm/Rockchip. As you said, forcing DRMPRIME does not improve performance. I also couldn't find any encoder preset to trade off between speed and quality. So the claimed encoding speed of 8k30 cannot be equivalent to single 4k120 or 1080p480. Maybe in parallel encoding it will work.
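For reference, the three modes compared above carry exactly the same raw pixel rate, which is why the equivalence would be expected on paper. A small sketch of that arithmetic, assuming the usual 7680x4320, 3840x2160 and 1920x1080 frame sizes (the thread only names the modes):

```python
def pixel_rate(width: int, height: int, fps: int) -> int:
    """Raw luma pixels per second for a given mode."""
    return width * height * fps


# Frame sizes are assumed; the thread only says 8k30, 4k120 and 1080p480.
modes = {
    "8k30": (7680, 4320, 30),
    "4k120": (3840, 2160, 120),
    "1080p480": (1920, 1080, 480),
}

rates = {name: pixel_rate(*mode) for name, mode in modes.items()}
for name, rate in rates.items():
    print(f"{name}: {rate} pixels/s")
```

Identical pixel rates do not guarantee identical throughput for a single stream, which is consistent with the observation that parallel encoding may be needed to reach the claimed figure.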

As for the HEVC encoder of rk3588, it doesn't support NV15 input, which means it cannot encode Main 10 profile/10-bit video, maybe they will add it in the next generation of HW. But for now it's best to remove it so as not to confuse users.

hbiyik commented 11 months ago

thanks for the detailed feedback, all is fixed at latest: cf6e1766c219aa52855c090b6f6d0699c77d25c2,

About profile 8.5: profile value 255 was the so-called profile 8.5 for hevc, which is a special profile that enforces no limitation. I was sceptical about the h265 parser in mpp so I used that one, but have now changed to the default 0.

My understanding is that the only way to trade off speed is to reduce the input size. I think there is a constant pixel-per-cycle processing rate, but I am also not sure; I have not yet benchmarked the encoder performance. The best way to do it would be to first compare with the mpi_enc_test tools that come with mpp by default.

Open question: do you know any way to produce NV24, YUV444P, NV16, YUV422P, BGR24, YUYV422, UYVY422, BGRA, BGR0, NV12, YUV420P formatted DRM PRIME frames with ffmpeg, so that I can push them to the encoder?

currently i can only test NV12, NV16, YUV420P (rkmpp_decoder) & BGR0 (kmsgrab) drm prime frames and rest is not tested due to lack of input.

veldspar commented 11 months ago

alright, I checked out your git, I did modify the configure line a little from your wiki however:

./configure --enable-rkmpp --enable-version3 --enable-libdrm --enable-nonfree --enable-gpl --enable-version3 --enable-libx264 --enable-librtmp --enable-shared --enable-static --enable-libx265 --enable-libmp3lame --enable-libpulse --enable-openssl --enable-libopus --enable-libvorbis --enable-libaom --enable-libass --enable-libdav1d --enable-libx265 --enable-libvpx

I used that configure line because i also used it for jjm2473's fork of rkmpp enabled ffmpeg, so i have a comparison. the build went on clean, no errors from the first try on, however a warning during linking.

After the build it complained about a missing libavdevice.so.60 on the first try (ffmpeg --encoder). I found it in the libavdevice subfolder of your git pull. After that it complained about all the rest of the common libs being missing, one by one. A simple guess would be that I didn't install ffmpeg. After adjusting my LD_LIBRARY_PATH for testing, ffmpeg loads.

So far so good. ffmpeg -encoders | grep rk lists the h264, hevc and vp8 encoders. ffmpeg -decoders | grep rk lists h263, h264, hevc, mpeg1/2/4, vp8 and vp9 hardware decoders. I'm starting to like this.

Now let's give it a try. The goal is to transcode:
./ffmpeg -i in.mkv -c:v hevc -c:a copy /extern/nn.hevc.mp4

and i get a segfault. same goes when trying to encode to h264

EDIT: I tried a clean build with the configure line from the git repos wiki, but i also keep getting segfault when trying to transcode a video, doesnt matter whether i transcode to h264 or hevc. Input in all cases has been h264 full hd

As for the linker warning, this is what I get:

LD ffprobe_g
/usr/bin/ld: /lib/aarch64-linux-gnu/libtirpc.so.3: warning: common of `rpc_createerr@@GLIBC_2.17' overridden by definition from /lib/aarch64-linux-gnu/libc.so.6
/usr/bin/ld: /lib/aarch64-linux-gnu/libtirpc.so.3: warning: common of `rpc_createerr@@GLIBC_2.17' overridden by definition from /lib/aarch64-linux-gnu/libc.so.6
/usr/bin/ld: /lib/aarch64-linux-gnu/libtirpc.so.3: warning: common of `rpc_createerr@@GLIBC_2.17' overridden by definition from /lib/aarch64-linux-gnu/libc.so.6
STRIP ffplay

Small addendum: I had installed a current mpp, however that got installed into /usr/local/lib and the system mpp was used instead. Turns out your ffmpeg isn't compatible with the mpp that comes from Radxa's repo. A simple override using LD_LIBRARY_PATH fixed that issue, and the segfault is gone. Might be worth adding that to the wiki.
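The override described above can also be applied when the binary is launched from a script. A minimal sketch, assuming the newer mpp really lives in /usr/local/lib as in this report; with_library_path is a hypothetical helper name, not part of ffmpeg or mpp:

```python
import os


def with_library_path(env, lib_dir):
    """Return a copy of env in which lib_dir is searched first by the loader."""
    new_env = dict(env)
    current = new_env.get("LD_LIBRARY_PATH", "")
    if current:
        new_env["LD_LIBRARY_PATH"] = lib_dir + os.pathsep + current
    else:
        new_env["LD_LIBRARY_PATH"] = lib_dir
    return new_env


env = with_library_path(os.environ, "/usr/local/lib")
print(env["LD_LIBRARY_PATH"])

# The environment can then be handed to the locally built binary, e.g.:
# subprocess.run(["./ffmpeg", "-encoders"], env=env, check=True)
```

Prepending rather than replacing keeps any existing search path intact, so only the libraries that exist in both places are shadowed.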

hbiyik commented 11 months ago

Tuned the statement in the wiki explicitly

Please use the latest versions of those libraries, especially mpp and rga, which are not very backwards compatible.

nyanmisaka commented 11 months ago

open question: do you know any way to produce NV24, YUV444P, NV16, YUV422P, BGR24, YUYV422, UYVY422, BGRA, BGR0, NV12, YUV420P formatted DRM PRIME frames with ffmpeg so that i can push them to encoder.

currently i can only test NV12, NV16, YUV420P (rkmpp_decoder) & BGR0 (kmsgrab) drm prime frames and rest is not tested due to lack of input.

This might require extra effort in libavutil/hwcontext_drm. Currently it is only a skeleton and does not contain a frame allocator, unlike other hwcontexts. I remember someone has a patch for this and put it somewhere.

Once complete, the command line should look like this:

./ffmpeg -init_hw_device drm=dr:/dev/dri/renderD128 -filter_hw_device dr -f lavfi -i testsrc2=s=1280x720,format=bgra -vf hwupload,format=drm_prime -f null -

The ideal situation is that we can have a separate hwcontext_mpp as a sub-device of the hwcontext_drm.

nyanmisaka commented 11 months ago

Here it comes. https://github.com/Consti10/rv1126_ohd_sushi/blob/853fa1fc2d50f0e4f9b5eea71d1ff2657c9a2765/buildroot/package/ffmpeg/0005-hwcontext_drm-internal-frame-allocation.patch

hbiyik commented 11 months ago

Amazing, that helps a lot.

hbiyik commented 11 months ago

hmm on a second review i am not so sure about this:

This patch gets the hstride from the linesize of the picture descriptor; however, in my experience and testing, ffmpeg and drm do not always have the same definition of hstride and plane count. https://github.com/Consti10/rv1126_ohd_sushi/blob/853fa1fc2d50f0e4f9b5eea71d1ff2657c9a2765/buildroot/package/ffmpeg/0005-hwcontext_drm-internal-frame-allocation.patch#L310C1-L311C1

i.e. for ffmpeg, AV_PIX_FMT_BGR0 is a 1-plane format with hstride = 4 * width, vstride = height, size = hstride * vstride

but for drm it is a 1-plane format with hstride = width, vstride = height, size = 4 * hstride * vstride

I learned this by testing the kmsgrab drm plane format bgr0: https://github.com/hbiyik/FFmpeg/blob/cf6e1766c219aa52855c090b6f6d0699c77d25c2/libavcodec/rkplane.c#L356

Maybe it is better to look at libdrm in detail to see how the size, strides and planes are defined.

Update: After careful examination, the hstride multiplier is based on the char_per_block definition rather than the plane size, so this should probably work.
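The two BGR0 conventions described above differ only in where the factor of 4 bytes per pixel is applied, so they should agree on the total buffer size. A minimal sketch of that bookkeeping; the function names are hypothetical and the 1920x1080 size is just an example:

```python
BYTES_PER_PIXEL_BGR0 = 4  # B, G, R and one padding byte


def ffmpeg_style_layout(width: int, height: int):
    """Convention described for ffmpeg: hstride counted in bytes."""
    hstride = BYTES_PER_PIXEL_BGR0 * width
    vstride = height
    return hstride, vstride, hstride * vstride


def drm_style_layout(width: int, height: int):
    """Convention described for drm: hstride counted in pixels."""
    hstride = width
    vstride = height
    return hstride, vstride, BYTES_PER_PIXEL_BGR0 * hstride * vstride


ff_hstride, ff_vstride, ff_size = ffmpeg_style_layout(1920, 1080)
drm_hstride, drm_vstride, drm_size = drm_style_layout(1920, 1080)
print(ff_hstride, drm_hstride, ff_size, drm_size)
```

The strides differ by the bytes-per-pixel factor while the sizes match, which is the situation a per-block byte count such as char_per_block is meant to reconcile.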

nyanmisaka commented 11 months ago

https://github.com/torvalds/linux/blob/6eaae198076080886b9e7d57f4ae06fa782f90ef/drivers/gpu/drm/drm_fourcc.c#L199-L202

https://ffmpeg-devel.ffmpeg.narkive.com/aa7gstJy/patch-1-5-lavu-add-drm-hwcontext

hbiyik commented 11 months ago

OK, I was confusing plane count with char_per_block.

veldspar commented 11 months ago

Quick question: the -level option for the hevc encoder, is that like the presets from x265? If so, is a lower level a slower preset, aka better quality? Sorry to bother in here, but I couldn't find anything about this on the web.

nyanmisaka commented 11 months ago

Nope. It refers to coding constraints. There's no x265 CRF equivalent. https://en.wikipedia.org/wiki/High_Efficiency_Video_Coding_tiers_and_levels

hbiyik commented 11 months ago

Meanwhile, one interesting feature could be the so-called "split mode" to reduce latency in live streaming apps. Any experience with it? Does it really make sense?

nyanmisaka commented 11 months ago

Sounds like it should help moonlight. Btw have you figured out the patch yet?

hbiyik commented 11 months ago

Ah, I see. Yeah, this cloud gaming thing is actually quite popular and hip lately, so latency could be a thing. I cannot work on the patch right now; I will when I get home in the evening.

nyanmisaka commented 11 months ago

The original author left a little flaw. This should save you some time.

0001-lavu-hwcontext_drm-Add-internal-frame-allocation.patch

FFmpeg cli works as expected but there are some issues with certain formats.

./ffmpeg -init_hw_device drm=dr:/dev/dri/renderD128 -filter_hw_device dr -f lavfi -i testsrc2=s=1280x720,format=yuv444p -vf hwupload,format=drm_prime -c:v h264_rkmpp_encoder -rc_mode VBR -b:v 4M -maxrate 4M -bufsize 8M -y /tmp/out.mp4

[screenshot: yuv444p_fail]

hbiyik commented 11 months ago

So, thanks a lot for the patch...

After several fixes, NV12, NV16, NV24, BGR24, YUYV422, UYVY422 and BGRA work.

YUV420P, YUV422P and YUV444P are not giving red channels. Considering those are yuv formats, something is obviously wrong with the source, not the encoder. I don't know what, currently.

BGR0 also does not work. I see that the size of the DRM plane is 3/4 of what it is supposed to be. I think that's because the last X (or 0) part of the plane is not allocated to save space, but this is not OK for MPP: if you give such strides with a smaller size, mpp will try to read the non-existent last quarter of the plane, where there was supposed to be a bunch of 0s, and it will segfault and crash the kernel. I don't know which one to blame, but considering kmsgrab also gives a full-size plane with the bgr0 format, it seems like something is wrong with these generated drm primes.
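A size check before handing such a buffer to the encoder would turn the crash described above into an ordinary error. The sketch below only illustrates the arithmetic; plane_is_large_enough is a hypothetical helper, and the 1280x720 frame matches the testsrc2 commands used earlier in the thread:

```python
def required_plane_size(hstride_bytes: int, vstride: int) -> int:
    """Bytes a reader touches when it walks vstride rows of hstride_bytes."""
    return hstride_bytes * vstride


def plane_is_large_enough(buffer_size: int, hstride_bytes: int, vstride: int) -> bool:
    """True if the buffer covers every byte implied by the strides."""
    return buffer_size >= required_plane_size(hstride_bytes, vstride)


width, height = 1280, 720
hstride_bytes = 4 * width                      # BGR0: 4 bytes per pixel
full_size = required_plane_size(hstride_bytes, height)
short_size = full_size * 3 // 4                # the 3/4 allocation reported above

print(plane_is_large_enough(full_size, hstride_bytes, height))
print(plane_is_large_enough(short_size, hstride_bytes, height))
```

Rejecting the short buffer in user space is cheaper than letting the hardware read past the end of the allocation.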

But i could already test planar planes and bgr0 from other sources, so i think these test planes with this new patch is giving good enough coverage.


veldspar commented 11 months ago

so another issue - the build generally works fine, however I noticed when converting DVD Source material(I tried one source with a 352x576 SAR 24:11 and another 720x576 source) ffmpeg complains about duplicating frames. Sample out:

    encoder         : Lavc60.3.100 ac3
[vost#0:0/hevc_rkmpp_encoder @ 0x55a0eedb50] More than 1000 frames duplicatedts/s dup=981 drop=418 speed=15.9x    
[dvd @ 0x55a0eedea0] buffer underflow st=0 bufi=10966 size=20148ate= 753.7kbits/s dup=3275 drop=1367 speed=  16x    
[dvd @ 0x55a0eedea0] buffer underflow st=0 bufi=12990 size=20148
[dvd @ 0x55a0eedea0] buffer underflow st=0 bufi=15014 size=20148
[dvd @ 0x55a0eedea0] buffer underflow st=0 bufi=17038 size=20148
[dvd @ 0x55a0eedea0] buffer underflow st=0 bufi=19062 size=20148
[vost#0:0/hevc_rkmpp_encoder @ 0x55a0eedb50] More than 10000 frames duplicateds/s dup=9936 drop=3975 speed=13.6x    
^C[out#0/dvd @ 0x55a0dba0b0] Error writing trailer: Immediate exit requestedits/s dup=21884 drop=8468 speed=14.6x    
frame=51408 fps=314 q=-0.0 Lsize=  197120kB time=00:28:35.32 bitrate= 941.4kbits/s dup=21939 drop=8491 speed=10.5x    
video:177251kB audio:14838kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 2.619029%
Exiting normally, received signal 2.

this happens regardless of whether I try the iso image as source or the individual VOB files. The resulting file is a stuttering output that plays a couple frames, then hangs for a fraction of a second and then continues playing.

line used in above example was: ffmpeg -fflags +genpts -i dvd.iso -c:v hevc -b:v 1000k -c:a ac3 -rematrix_maxval 1.0 -ac 2 -f dvd out.mp4

this works fine in software on my desktop computer(I need to encode to h264 in both cases, your ffmpeg on rock 5 or my ffmpeg on desktop, for some reason when i choose hevc i get a non-playing video on this second video i tried and vlc claims its an mpeg-2 source stream, but that might be a bug upstream the ffmpeg pipe)

veldspar commented 11 months ago

Addendum: transcoding the stream to h264 results in a broken frame:

[screenshot: 2023-07-25-140255_1920x1080_scrot]

hbiyik commented 11 months ago

Thanks for the feedback, can you provide the full log? I want to see which decoder and encoder are initialized and how...

I assume this is the mpeg4 decoder, and its parser is quite picky about the pts (but I don't think this is the problem here, something is weird with the strides). So if I download a test iso from the internet, could I reproduce this?

veldspar commented 11 months ago

here you go:

https://pastebin.com/BxUFVpM9

note that the first like 5 minutes of this particular dvd is only static images, aka no movement or blending involved, so this is within like the first few seconds of the output file. As a workaround i am trying to convert the dvd to an mpeg-4 stream on my desktop right now at crf-0 and then using that as input on the rock5 to create a hevc output. this is still working though...

hbiyik commented 11 months ago

Could you also check without specifically requesting the soft decoder mpeg2video and post the full output? There is already a hardware mpeg2 decoder.

ffmpeg -fflags +genpts -i dvd.iso -vf bwdif -c:v h264 -b:v 1000k -c:a ac3 -rematrix_maxval 1.0 -ac 2 -f dvd out.mp4

without specifying -c:v mpeg2video

nyanmisaka commented 11 months ago

Further testing showed that the default value of -quality_min 50 in CBR/VBR/AVBR rc modes will cause the actual bps to exceed the target max bps by a large amount. Defaulting it to 0 fixes the issue for me.

http://www.larmoire.info/jellyfish/media/jellyfish-10-mbps-hd-hevc-10bit.mkv

./ffmpeg -c:v hevc_rkmpp_decoder -i jellyfish-10-mbps-hd-hevc-10bit.mkv -an -sn -c:v hevc_rkmpp_encoder -rc_mode CBR -quality_min 50 -quality_max 100 -b:v 4M -maxrate 4M -bufsize 8M -vframes 500 -y /tmp/out.mp4

[hevc_rkmpp_encoder @ 0xaaaad9b76a00] Bitrate Target/Min/Max is set to 4000000/250000/4250000

// -quality_min 50
frame=  500 fps=159 q=-0.0 Lsize=   14205kB time=00:00:16.64 bitrate=6989.2kbits/s speed=5.29x

// -quality_min 0
frame=  500 fps=160 q=-0.0 Lsize=    8262kB time=00:00:16.64 bitrate=4065.1kbits/s speed=5.33x
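To put a number on the difference between the two runs above, the reported bitrates can be compared with the 4M target. A minimal sketch using only the figures from the two ffmpeg summary lines; overshoot_percent is a hypothetical helper:

```python
def overshoot_percent(actual_kbps: float, target_kbps: float) -> float:
    """How far the measured bitrate lies above the target, in percent."""
    return (actual_kbps - target_kbps) / target_kbps * 100.0


TARGET_KBPS = 4000.0  # -b:v 4M

runs = {
    "quality_min 50": 6989.2,  # kbits/s reported by ffmpeg
    "quality_min 0": 4065.1,
}

overshoot = {name: overshoot_percent(kbps, TARGET_KBPS) for name, kbps in runs.items()}
for name, pct in overshoot.items():
    print(f"{name}: {pct:.1f}% over target")
```

With -quality_min 50 the output lands roughly three quarters above the target, while with -quality_min 0 it stays within a couple of percent.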

I also find a quality option that says it only works in VBR. I tried it and didn't see any noticeable change, or maybe it didn't work.

typedef enum MppEncRcQuality_e {
    MPP_ENC_RC_QUALITY_WORST,
    MPP_ENC_RC_QUALITY_WORSE,
    MPP_ENC_RC_QUALITY_MEDIUM,
    MPP_ENC_RC_QUALITY_BETTER,
    MPP_ENC_RC_QUALITY_BEST,
    MPP_ENC_RC_QUALITY_CQP,
    MPP_ENC_RC_QUALITY_AQ_ONLY,
    MPP_ENC_RC_QUALITY_BUTT
} MppEncRcQuality;

veldspar commented 11 months ago

Could you also check without specifically requesting the soft decoder mpeg2video and post the full output? There is already a hardware mpeg2 decoder.

ffmpeg -fflags +genpts -i dvd.iso -vf bwdif -c:v h264 -b:v 1000k -c:a ac3 -rematrix_maxval 1.0 -ac 2 -f dvd out.mp4

without specifying -c:v mpeg2video

I actually tried using the hardware-decoder first, i added the software decoder when trying to figure out what was wrong there. The log and result look pretty much the same

hbiyik commented 11 months ago

@veldspar do you remember the plane format reported by mpeg2_rkmpp_decoder? This should be relevant to the green stripe misalignment in the picture. I don't have an environment to test it now, I am just trying to collect as much info as possible to verify when I get to the workspace.

@nyanmisaka actually the quality_min and quality_max values are not quality but inverse quantization parameters (qp) in a meaningful range (min/max qp). I just want to give the user an option to fine-tune the encoder in detail. How it works is as follows:

The lower the qp, the higher the quality; the higher the qp, the lower the quality. And if the qp decreases below a certain threshold, the quality gain is too small for the size increase you receive, and vice versa. So the relation is exponential.

H264/5 : QP=10 [Quality 100 <-> Quality 0] QP=51
VP8    : QP=40 [Quality 100 <-> Quality 0] QP=127
JPEG   : QP=1  [Quality 100 <-> Quality 0] QP=99 // For future

And it is set to 0 for quality_min and 100 for quality_max by default.
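The table above gives the QP at both ends of the quality scale for each codec but not the curve in between. The sketch below assumes a plain linear interpolation between the two endpoints, which may not be what rkmpp.h actually does; quality_to_qp and QP_RANGES are hypothetical names:

```python
# (QP at quality 100, QP at quality 0), taken from the table above.
QP_RANGES = {
    "h264": (10, 51),
    "hevc": (10, 51),
    "vp8": (40, 127),
    "jpeg": (1, 99),
}


def quality_to_qp(codec: str, quality: int) -> int:
    """Map a 0..100 quality value to a QP, assuming a linear relation."""
    if not 0 <= quality <= 100:
        raise ValueError("quality must be in the range 0..100")
    qp_best, qp_worst = QP_RANGES[codec]
    return round(qp_worst - (qp_worst - qp_best) * quality / 100)


# quality_max picks the lowest QP the encoder may use, quality_min the highest.
qp_min = quality_to_qp("hevc", 100)
qp_max = quality_to_qp("hevc", 0)
print(qp_min, qp_max)
```

Under this assumption, raising quality_min lowers the highest QP the rate control may pick, which would explain why a high quality_min can push the output above the requested bitrate.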

https://github.com/hbiyik/FFmpeg/blob/d30c8c6181d5bdcff8d9a8ba37935fbedeb26a8a/libavcodec/rkmpp.h#L32

The bitrate is always ignored by the encoder whenever it does not fit the QP range. (Calculated QP values are also shown on encoder initialization.)

CQP mode: qp_min and qp_max are always fixed to quality_max (qp_min), therefore the bitrate is always ignored. As the name suggests, constant QP.

CBR mode: There is a small margin around the target bitrate, 15/16 to 17/16 (93% to 106%) of the target bitrate. The default QP range is rich enough to cover most scenarios. Therefore as long as qp values do not fit the target bandwidth values, qps are irrelevant.

VBR/AVBR mode: The target bitrate will be set with a margin of 1/16 to 17/16 (6% to 106%). So there is a big margin around the target, and to achieve this the encoder falls into areas where qp is not optimal; there, optimization of the quality min/max (actual qp values) would make sense.
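The margins described above translate into concrete minimum and maximum bitrates. A minimal sketch of that calculation, using only the 15/16, 1/16 and 17/16 fractions stated here; bitrate_window is a hypothetical helper and not the code in rkmppenc.c:

```python
def bitrate_window(target_bps: int, rc_mode: str):
    """Return (min_bps, max_bps) for a target bitrate and rate control mode."""
    mode = rc_mode.upper()
    if mode == "CBR":
        low, high = 15, 17      # 15/16 .. 17/16 of the target
    elif mode in ("VBR", "AVBR"):
        low, high = 1, 17       # 1/16 .. 17/16 of the target
    else:
        raise ValueError(f"no bitrate window defined for mode {rc_mode!r}")
    return target_bps * low // 16, target_bps * high // 16


cbr_window = bitrate_window(4_000_000, "CBR")
vbr_window = bitrate_window(4_000_000, "VBR")
print(cbr_window, vbr_window)
```

For a 4M target, the 1/16 and 17/16 bounds give 250000 and 4250000, the same Min/Max figures that appear in the "Bitrate Target/Min/Max" log line quoted earlier in the thread.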

Some references:
https://github.com/hbiyik/FFmpeg/blob/d30c8c6181d5bdcff8d9a8ba37935fbedeb26a8a/libavcodec/rkmppenc.c#L160
https://github.com/hbiyik/FFmpeg/blob/d30c8c6181d5bdcff8d9a8ba37935fbedeb26a8a/libavcodec/rkmppenc.c#L154

The MppEncRcQuality_e thing was never configured at all. I do not know how it works, and I am not sure it is used much, because the mpp_enc_test* examples of mpp do not use it either.

update: i had to edit the entry at least 5 times to make sense :)

veldspar commented 11 months ago

@veldspar do you remember the plane format reported by mpeg2_rkmpp_decoder? This should be relevant to the green stripe misalignment in the picture. I don't have an environment to test it now, I am just trying to collect as much info as possible to verify when I get to the workspace.

I actually believe by now that the mpeg2 decoder is not at fault. as reported earlier, I reencoded the file on my desktop to a CRF-0 H264 file. The resulting 40gig video plays fine on vlc on my desktop, however when reencoding it to hevc I get garbage output again. The Attached screenshot shows on the left: my ffmpeg command i put in with all the media information, the input file at position 0:28 on the center right, and the output file at the very same position on the top right as well as the same command run on my desktop computer on the bottom right. (For clarification: Left Terminal: ffmpeg run on rock 5, right top: result from said command, right center: input file, right bottom: same ffmpeg encode on desktop pc)

If it was the mpeg2 decoder, then the reencode from h264->hevc would have been fine. If you want, I can provide you with a sample of the particular file.

[screenshot]

hbiyik commented 11 months ago

@veldspar I reproduced your issue. I understand what's happening but don't yet understand why it is happening. If you remove -vf bwdif it should work.

When this filter is applied, the output frame is NV12 but claimed to be YUV420P. That's something I need to dig into.

UPDATE: i found the issue, there is something wrong in my plane size calculation. yet still should work without filter

UPDATE: i am officially having a brainfart here, this is gonna take some time to fix..

veldspar commented 11 months ago

take your time. this is already the most valuable addition to the rock5(rk3588 sbc) this year(IMHO) - btw, any idea on how to make your ffmpeg work on jellyfin?

nyanmisaka commented 11 months ago

Actually i’m the maintainer of jellyfin-ffmpeg. It will take some time to complete this. Also the subtitle overlay and HDR tonemap are not implemented yet. Both are not feasible to run in SW since they are just too slow.

veldspar commented 11 months ago

Great to have you here. Just my opinion, but the rock 5 is currently one of the best platforms for jellyfin, even without hardware-accelerated ffmpeg.(but being able to nicely transcode 1080p in hardware will be appreciated greatly) that just leaves jellyfin every now and then getting oomkilled on 8gigs of ram with very light use(memleak somewhere)

hbiyik commented 11 months ago

I have created a separate issue for the 720x480 issue at https://github.com/hbiyik/FFmpeg/issues/17. There is something fundamentally wrong here; I am trying to understand it but currently can't. It only happens when your input is a soft frame, i.e. from a decoder which is not rkmpp or from a soft filter, and its output format must be either yuv420p, yuv422p or bgr24. Details: https://github.com/hbiyik/FFmpeg/issues/17

hbiyik commented 11 months ago

Good news, fixed that 720x480 problem. Not pushing yet because I have made drastic changes; I want to fully test before I push.

hbiyik commented 11 months ago

Fixed in https://github.com/hbiyik/FFmpeg/commit/eb34616ece746ae58030fd4601e802dacb0d3ee2. As always better to use latest.

nyanmisaka commented 11 months ago

After further testing I found that there are two ways to improve transcoding performance.

I get ~250fps in gst-launch (1080p->1080p) and 150-170fps in ffmpeg.

GST_VIDEO_CONVERT_USE_RGA=1 taskset -c 4-7 gst-launch-1.0 filesrc location=/tmp/video.hevc ! h265parse ! mppvideodec ! videoconvert ! mpph265enc rc-mode=vbr bps=60000000 bps-max=60000000 width=1920 height=1080 ! filesink location=/dev/null

Also it seems they figured out a way to avoid forcing DMA32 for RGA2.

hbiyik commented 11 months ago

@nyanmisaka

thanks for the feedback.

1) The current encoder is also in async mode, so architecturally they share the same concepts.
2) Are you sure the above command is scaling the video? You set it to 1920x1080. If the size is the same as the input, maybe it never visits RGA.
3) Side note: both gst and this one use SYNC RGA.

i have not benchmarked this at all, so dont know how it performs.

I think the real benchmark could be against mpi_enc_test with a .yuv input file. That way we can idealize the impact of the performance.

Extract an nv12 yuv file:
FFMPEG_RKMPP_PIXFMT=NV12 ffmpeg -i testfile.mp4 nv12.yuv

Benchmark with ffmpeg, ~150 fps:
mpp_cfg_debug=1 ffmpeg -f rawvideo -vcodec rawvideo -s 1920x1080 -pix_fmt nv12 -i nv12.yuv -c:v h264_rkmpp_encoder -profile 4 -y ffmpeg_nv12.h264

Benchmark with mpi_enc_mt_test, ~370 fps :)
mpp_cfg_debug=1 mpi_enc_mt_test -i nv12.yuv -o mpi_nv12.h264 -w 1920 -h 1080 -f 0 -t 7 -rc 1 -bps 6000000 -qc 10:10:30:10:30

it is very important to make the encoding parameters same because it impacts the speed of the encoder. for this mpp_cfg_debug=1 env value will spit the config to dmesg.

Note: make sure you use a really fast drive. yuv files are big, so the read speed might introduce a bottleneck. I have tested with an nvme, so yeah, there is definitely some performance left on the table.

hbiyik commented 11 months ago

sidenote, i think bitrate is a more accurate measurand rather than the fps.

nyanmisaka commented 11 months ago
  1. Are you sure above command is scaling the video? you set it to 1920x1080. If the size is same as input may be it never visits RGA.

The source is 10-bit HEVC 1080p. RGA is still required before feeding the frame to the encoder.

So let's test pure encoder first. The source file is the classic 1080p BigBuckBunny, looped through it several times using ffmpeg to output 2000 frames. I put raw yuv in /tmp on my 5A. This shouldn't be a bottleneck for encoding.

# make 2000 frames nv12 yuv
radxa@rock-5a:~/workspace/FFmpeg$ ./ffmpeg -stream_loop -1 -c:v hevc_rkmpp_decoder -i Big_Buck_Bunny_1080_10s_30MB.mp4 -an -sn -vframes 2000 -y /tmp/2000frames_nv12.yuv

# test ~111.5 fps
radxa@rock-5a:/tmp$ time mpp_cfg_debug=1 mpi_enc_test -i /tmp/2000frames_nv12.yuv -o /dev/null -w 1920 -h 1080 -f 0 -t 7 -rc 1 -bps 6000000 -qc 10:10:30:10:30

real    0m17.941s
user    0m3.474s
sys     0m6.126s

# mt test ~468.4 fps
radxa@rock-5a:/tmp$ time mpp_cfg_debug=1 mpi_enc_mt_test -i /tmp/2000frames_nv12.yuv -o /dev/null -w 1920 -h 1080 -f 0 -t 7 -rc 1 -bps 6000000 -qc 10:10:30:10:30

real    0m4.270s
user    0m1.246s
sys     0m1.944s

# ffmpeg test ~173.9 fps
radxa@rock-5a:~/workspace/FFmpeg$ time mpp_cfg_debug=1 ./ffmpeg -f rawvideo -vcodec rawvideo -s 1920x1080 -pix_fmt nv12 -i /tmp/2000frames_nv12.yuv -c:v h264_rkmpp_encoder -profile 4 -f null /dev/null
...
frame= 2000 fps=175 q=-0.0 Lsize=N/A time=00:01:19.96 bitrate=N/A speed=   7x    0x
video:58770kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown

real    0m11.499s
user    0m2.888s
sys     0m4.385s

In terms of encoding, ffmpeg is already faster than mpi_enc_test but still inferior to mpi_enc_mt_test.
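The fps figures in the comments above follow directly from the 2000 encoded frames and the "real" times printed by time. A short sketch that recomputes them and the resulting ratios, using only the numbers from these three runs:

```python
FRAMES = 2000

# Wall-clock ("real") seconds reported by `time` for each run.
real_seconds = {
    "mpi_enc_test": 17.941,
    "mpi_enc_mt_test": 4.270,
    "ffmpeg": 11.499,
}

fps = {name: FRAMES / secs for name, secs in real_seconds.items()}
for name, value in fps.items():
    print(f"{name}: {value:.1f} fps")

# How much faster the multi-threaded test tool is than the other two.
mt_vs_ffmpeg = fps["mpi_enc_mt_test"] / fps["ffmpeg"]
mt_vs_single = fps["mpi_enc_mt_test"] / fps["mpi_enc_test"]
print(f"mt vs ffmpeg: {mt_vs_ffmpeg:.2f}x, mt vs mpi_enc_test: {mt_vs_single:.2f}x")
```

Measured this way the ffmpeg run includes process start-up and file reading, so it comes out slightly below the 175 fps that ffmpeg itself reports.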

Does the mt in mpi_enc mean multi-thread? It seems that we are still missing this in ffmpeg.

hbiyik commented 11 months ago

So yeah, you are right, the encoder in ffmpeg is single threaded, but I do not understand why this actually matters.

Because when you interface with mpp, I would expect mpp to internally spawn as many threads as there are encoder/decoder cores, and manage them so we don't have to. That's what async means to me. However, the mt variant is multithreaded, and obviously running the encoder put/pull with multiple threads increases the performance.

I just added AV_CODEC_CAP_FRAME_THREADS to the encoder capabilities, and wow, the result is blazingly fast, with completely broken video output :). But the speed outcome indicates that, when it is used correctly, multi-threading the encoder has the potential for about 400% to 500% improvement, maybe even more on hevc...

Yet it would probably introduce more latency, not sure...