valbok / QtAVPlayer

Free and open-source Qt Media Player library based on FFmpeg, for Linux, Windows, macOS, iOS and Android.
MIT License

How to render frames from h264_mediacodec directly, without conversion, accelerated? #322

Open geminixdev opened 1 year ago

geminixdev commented 1 year ago

In #273 you explain the process between decoding and rendering clearly:

Let me explain how "everything" works:

ffmpeg decodes a source using hardware acceleration and puts the data into D3D11, Metal, VDPAU, VAAPI or whatever textures. A pointer to the texture is returned in av_frame->data[x]. To get access to the data for the frame in this case you should use av_hwframe_transfer_data: https://github.com/valbok/QtAVPlayer/blob/master/src/QtAVPlayer/qavvideobuffer_gpu.cpp#L22 - it is done when you call QAVVideoFrame::map(), which downloads the data from GPU to CPU! What should you do when you would like to render it without mapping in RHI?

For example, using Metal textures: we convert QAVVideoFrame to QVideoFrame and set QVideoFrame::RhiTextureHandle, which defines that the frame contains a raw texture handle. RHI knows that the current platform is macOS and handleType() points to RhiTextureHandle, so it can reuse this texture handle instead of calling QVideoFrame::map().

For Windows, the D3D11 renderer is used by default. So if you use QVideoFrame::RhiTextureHandle, Qt expects a pointer to an ID3D11Texture2D.
  2.1 If ffmpeg creates this texture for us, we can try to return it -> it does not work, since the pixel format is NV12, which expects 2 textures - color and data.
  2.2 Creating 2 instances of ID3D11Texture2D and copying the data into them from the original -> did not work for me yet -> Qt should reuse these textures and avoid mapping, but it crashes. Need to dive deeper, maybe I am doing something wrong.

How about Android with h264_mediacodec?

The frames delivered by avcodec_receive_frame seem to be NV12 and seem not to get converted or mapped, when sending them to the videosink they are still NV12 (at least I don't see where mapping or conversion would happen).

On armeabi-v7a devices the h264_mediacodec decoding is too slow; playback is no longer smooth even with HD. top shows 100% CPU load for mediacodec (on arm64-v8a only around 20%).

So for arm64-v8a devices the decoding and the process after avcodec_receive_frame seems to be very fast, all is playing smoothly, even on older devices.

However on armeabi-v7a devices, usually Android boxes, despite having fast CPUs, the transfer of data in and out of mediacodec plus the decoding is ridiculously slow, and the CPU usage (SD 40%, HD 100%) far too high for hardware decoding.

What could be wrong there, or different there?

geminixdev commented 1 year ago

Just to make that clear: the slowness is not visible when testing with arm64-v8a devices. Unfortunately all Android Boxes are armeabi-v7a, and there it makes them unusable for anything better than SD.

valbok commented 1 year ago

You said that QtMultimedia for 5.15 does not have perf issues? There are 2 ways to render from QtMM:

https://github.com/qt/qtmultimedia/blob/5.15/src/qtmultimediaquicktools/qdeclarativevideooutput_render.cpp#L344 uploads data to OpenGL textures (as I remember, it is used by default for Android). This is the path that should be used with QtAVPlayer and its QVideoFrame.

But it can also use the window-based renderer https://github.com/qt/qtmultimedia/blob/5.15/src/qtmultimediaquicktools/qdeclarativevideooutput_window.cpp#L88, which should be totally copy-free: rendering is done directly to a window without any video frames.

Also when you convert QAVVideoFrame to QVideoFrame there is hardcoded converting https://github.com/valbok/QtAVPlayer/blob/master/src/QtAVPlayer/qavvideoframe.cpp#L314

which downloads data from GPU, since there is no support for mediacodec in QRHI (as I remember)

valbok commented 1 year ago

Can you confirm that qml rendering is lagging? Using VideoOutput? Can you also disable sending video frames to VideoOutput, but received from the player and confirm that CPU is low and GUI is acting fast?

  1. Checking if decoding does not consume CPU and does not impact to GUI
  2. Checking if rendering is not efficient enough, it can be even checked without a player, just send many QVideoFrames to VideoOutput ?

geminixdev commented 1 year ago

Also when you convert QAVVideoFrame to QVideoFrame there is hardcoded converting https://github.com/valbok/QtAVPlayer/blob/master/src/QtAVPlayer/qavvideoframe.cpp#L314

which downloads data from GPU, since there is no support for mediacodec in QRHI (as I remember)

Line 314:

            result = convertTo(AV_PIX_FMT_YUV420P);

But in the context, in the lines before that, there is the check for AV_PIX_FMT_NV12, see below:

        case AV_PIX_FMT_NV12:
            format = VideoFrame::Format_NV12;
            break;
        default:
            // TODO: Add more supported formats instead of converting
            result = convertTo(AV_PIX_FMT_YUV420P);
            format = VideoFrame::Format_YUV420P;
            break;

and in my checks (at least with the development devices in arm64-v8a) the frames delivered from avcodec_receive_frame are AV_PIX_FMT_NV12, and are still AV_PIX_FMT_NV12 when going into that QAVVideoFrame to QVideoFrame conversion and also when coming out of it. That makes me think that there is no conversion happening, thus no slowdown caused by that conversion.

I will rerun timing and pixel format tests with an armeabi-v7a device, to make sure that it is the same there.

geminixdev commented 1 year ago

Can you confirm that qml rendering is lagging? Using VideoOutput? Can you also disable sending video frames to VideoOutput, but received from the player and confirm that CPU is low and GUI is acting fast?

  1. Checking if decoding does not consume CPU and does not impact to GUI
  2. Checking if rendering is not efficient enough, it can be even checked without a player, just send many QVideoFrames to VideoOutput ?

One of my tests on an armeabi-v7a device was to comment out the videoSink->setVideoFrame() command, and that alone had absolutely no influence on the decoding time and mediacodec CPU, did not reduce them.

I will also rerun that, and maybe drop the frames right after avcodec_receive_frame.

geminixdev commented 1 year ago

You said that QtMultimedia for 5.15 does not have perf issues?

yes, definitely, Full HD no problem at all, smooth and GUI not lagging, on the same armeabi-v7a devices.

That is especially confusing, as it contradicts the slow mediacodec decoding and its 100% CPU load, which I see in my tests on exactly the same devices.

geminixdev commented 1 year ago

Also when you convert QAVVideoFrame to QVideoFrame there is hardcoded converting https://github.com/valbok/QtAVPlayer/blob/master/src/QtAVPlayer/qavvideoframe.cpp#L314 which downloads data from GPU, since there is no support for mediacodec in QRHI (as I remember)

Line 314:

            result = convertTo(AV_PIX_FMT_YUV420P);

But in the context, in the lines before that, there is the check for AV_PIX_FMT_NV12, see below:

        case AV_PIX_FMT_NV12:
            format = VideoFrame::Format_NV12;
            break;
        default:
            // TODO: Add more supported formats instead of converting
            result = convertTo(AV_PIX_FMT_YUV420P);
            format = VideoFrame::Format_YUV420P;
            break;

and in my checks (at least with the development devices in arm64-v8a) the frames delivered from avcodec_receive_frame are AV_PIX_FMT_NV12, and are still AV_PIX_FMT_NV12 when going into that QAVVideoFrame to QVideoFrame conversion and also when coming out of it. That makes me think that there is no conversion happening, thus no slowdown caused by that conversion.

I will rerun timing and pixel format tests with an armeabi-v7a device, to make sure that it is the same there.

A first test result, it confirms also on an armeabi-v7a device:

Tests are ongoing.

geminixdev commented 1 year ago

Can you confirm that qml rendering is lagging? Using VideoOutput? Can you also disable sending video frames to VideoOutput, but received from the player and confirm that CPU is low and GUI is acting fast?

  1. Checking if decoding does not consume CPU and does not impact to GUI
  2. Checking if rendering is not efficient enough, it can be even checked without a player, just send many QVideoFrames to VideoOutput ?

Also here a first test result:

Not executing videoSink->setVideoFrame() apparently does not change anything. Still the same high CPU for Mediacodec (99 - 100% for HD and Full HD), still the same decoding overload with Full HD. (The lack of smoothness with HD cannot be checked without actually seeing the pictures.)

Tests are ongoing.

geminixdev commented 1 year ago

In QAVFrameCodec::decode() there is avcodec_receive_frame(). Logging the time it runs gives these results:

HD:

05-03 00:13:59.037  8197  8399 I Player  : [03 0:13:59.036 +07 I]   decode_mediacodec, after avcodec_receive_frame. frames.size() 1280 x 720 frame->format: 23 frame->pts: 4338057600
05-03 00:13:59.073  8197  8399 I Player  : [03 0:13:59.073 +07 I]   decode_mediacodec, after avcodec_receive_frame. frames.size() 1280 x 720 frame->format: 23 frame->pts: 4338061200
05-03 00:13:59.111  8197  8399 I Player  : [03 0:13:59.111 +07 I]   decode_mediacodec, after avcodec_receive_frame. frames.size() 1280 x 720 frame->format: 23 frame->pts: 4338064800

This shows that the second avcodec_receive_frame() took 37 ms, and the third 38 ms. These times are consistent, always more or less the same. Considering the 40 ms available per frame at 25 fps, we are at the limit here.

Full HD:

05-03 01:02:10.570  8197  8397 I Player  : [03 1:02:10.570 +07 I]   decode_mediacodec, after avcodec_receive_frame. frames.size() 1920 x 1080 frame->format: 23 frame->pts: 4594467600
05-03 01:02:10.651  8197  8397 I Player  : [03 1:02:10.651 +07 I]   decode_mediacodec, after avcodec_receive_frame. frames.size() 1920 x 1080 frame->format: 23 frame->pts: 4594471200
05-03 01:02:10.733  8197  8397 I Player  : [03 1:02:10.733 +07 I]   decode_mediacodec, after avcodec_receive_frame. frames.size() 1920 x 1080 frame->format: 23 frame->pts: 4594474800
05-03 01:02:10.892  8197  8397 I Player  : [03 1:02:10.892 +07 I]   decode_mediacodec, after avcodec_receive_frame. frames.size() 1920 x 1080 frame->format: 23 frame->pts: 4594478400
05-03 01:02:10.899  8197  8397 I Player  : [03 1:02:10.898 +07 I]   decode_mediacodec, after avcodec_receive_frame. frames.size() 1920 x 1080 frame->format: 23 frame->pts: 4594482000
05-03 01:02:11.055  8197  8397 I Player  : [03 1:02:11.055 +07 I]   decode_mediacodec, after avcodec_receive_frame. frames.size() 1920 x 1080 frame->format: 23 frame->pts: 4594485600
05-03 01:02:11.068  8197  8397 I Player  : [03 1:02:11.067 +07 I]   decode_mediacodec, after avcodec_receive_frame. frames.size() 1920 x 1080 frame->format: 23 frame->pts: 4594489200
05-03 01:02:11.221  8197  8397 I Player  : [03 1:02:11.221 +07 I]   decode_mediacodec, after avcodec_receive_frame. frames.size() 1920 x 1080 frame->format: 23 frame->pts: 4594492800
05-03 01:02:11.232  8197  8397 I Player  : [03 1:02:11.231 +07 I]   decode_mediacodec, after avcodec_receive_frame. frames.size() 1920 x 1080 frame->format: 23 frame->pts: 4594496400

There is a big fluctuation, between 6 ms and 150 ms, with an average of about 80 ms. Far too long for the 40 ms we have at 25 fps.
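The intervals quoted above can be re-derived from the logcat timestamps. The following is only a minimal sketch and not part of QtAVPlayer: the names `parseClockMs`, `IntervalStats` and `summarize` are hypothetical, and it assumes that all timestamps are well formed and fall on the same day.

```cpp
#include <algorithm>
#include <cstdio>
#include <numeric>
#include <string>
#include <vector>

// Convert a logcat clock string "HH:MM:SS.mmm" to milliseconds since midnight.
// Returns -1 if the string does not have that shape.
long parseClockMs(const std::string& s) {
    int h = 0, m = 0, sec = 0, ms = 0;
    if (std::sscanf(s.c_str(), "%d:%d:%d.%d", &h, &m, &sec, &ms) != 4)
        return -1;
    return ((h * 60L + m) * 60L + sec) * 1000L + ms;
}

struct IntervalStats {
    long minMs = 0;
    long maxMs = 0;
    double meanMs = 0.0;
    int overBudget = 0;  // number of gaps longer than the per-frame budget
};

// Summarize the gaps between consecutive timestamps (no input validation).
// budgetMs is the time available per frame, e.g. 40 ms at 25 fps.
IntervalStats summarize(const std::vector<std::string>& stamps, long budgetMs) {
    IntervalStats st;
    std::vector<long> gaps;
    for (std::size_t i = 1; i < stamps.size(); ++i)
        gaps.push_back(parseClockMs(stamps[i]) - parseClockMs(stamps[i - 1]));
    if (gaps.empty())
        return st;
    st.minMs = *std::min_element(gaps.begin(), gaps.end());
    st.maxMs = *std::max_element(gaps.begin(), gaps.end());
    st.meanMs = std::accumulate(gaps.begin(), gaps.end(), 0L) /
                static_cast<double>(gaps.size());
    st.overBudget = static_cast<int>(std::count_if(
        gaps.begin(), gaps.end(), [budgetMs](long g) { return g > budgetMs; }));
    return st;
}
```

Fed with the nine Full HD logcat timestamps above (01:02:10.570 through 01:02:11.232) and a budget of 40 ms, this gives gaps between 7 ms and 159 ms with a mean of roughly 83 ms, and five of the eight gaps exceed the budget.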

So the question seems to be, why is it so slow? Possibly an option or something in the codec context which must be set or set differently?

valbok commented 1 year ago

  1. About NV12: interesting, but AV_PIX_FMT_MEDIACODEC should be used, otherwise it looks like software decoding.
  2. Wondering if QAVPlayer itself consumes too much CPU.
  3. You say that the pts diff between 2 frames is increasing?

geminixdev commented 1 year ago

The tests above were with an HK1 X4 Box. Now, below, tests with a Tanix TX6 Box. And it gets weirder here:

05-03 03:13:10.200  7868 22724 I Player  : [03 3:13:10.200 MYT I]   decode_mediacodec, after avcodec_receive_frame. frames.size() 1920 x 1080 frame->format: 0 frame->pts: 7366644000
05-03 03:13:10.206  7868 22724 I Player  : [03 3:13:10.206 MYT I]   decode_mediacodec, after avcodec_receive_frame. frames.size() 1920 x 1080 frame->format: 0 frame->pts: 7366647600
05-03 03:13:10.210  7868 22724 I Player  : [03 3:13:10.210 MYT I]   decode_mediacodec, after avcodec_receive_frame. frames.size() 1920 x 1080 frame->format: 0 frame->pts: 7366651200
05-03 03:13:10.214  7868 22724 I Player  : [03 3:13:10.214 MYT I]   decode_mediacodec, after avcodec_receive_frame. frames.size() 1920 x 1080 frame->format: 0 frame->pts: 7366654800

Why is playback not smooth here for Full HD and HD (HD is almost OK)?
Possibly the high IOW of 23%

User 2%, System 1%, IOW 23%, IRQ 0%
User 287 + Nice 4 + Sys 166 + Idle 10063 + IOW 3327 + IRQ 0 + SIRQ 22 = 13869

  PID USER     PR  NI CPU% S  #THR     VSS     RSS PCY Name
 7868 u0_a89   20   0   1% S    49 2154028K 662952K unk my.player
 2006 mediacod 20   0   0% S    14 166156K  66288K unk media.codec
 1859 system   12  -8   0% S    22 119036K   9612K unk /system/bin/surfaceflinger
28418 u0_a12   20   0   0% S    28 1265700K  95768K unk com.google.android.gms
 1997 audioser 20   0   0% S     9  39648K   4816K unk /system/bin/audioserver
30208 root     20   0   0% R     1   4780K   1568K unk top

(This is still without executing videoSink->setVideoFrame() )
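The percentages in the top summary line can be recomputed from the counters on the line below it. A small sketch, assuming that top truncates to whole percent (that assumption reproduces the 23% shown; `sharePercent` is a hypothetical helper name):

```cpp
#include <numeric>
#include <vector>

// Share of one counter in the sum of all counters, truncated to whole percent.
int sharePercent(long part, const std::vector<long>& all) {
    const long total = std::accumulate(all.begin(), all.end(), 0L);
    return total > 0 ? static_cast<int>(part * 100 / total) : 0;
}
```

With the counters above (287, 4, 166, 10063, 3327, 0, 22, summing to 13869), IOW comes out as 3327 * 100 / 13869 = 23, while User and Sys come out as 2 and 1.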

geminixdev commented 1 year ago

  • about NV12, interesting, but AV_PIX_FMT_MEDIACODEC should be used, otherwise it looks like software decoding.

Yes, that is what I expected too. However we seem to get NV12 and YUV420P.
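For reading the logs: the frame->format values 23 and 0 seen above are the ones that correspond to NV12 and YUV420P in this thread. A tiny sketch of that mapping follows; `loggedFormatName` is a hypothetical helper, and since the numeric values of FFmpeg's AVPixelFormat enum can differ between FFmpeg versions, real code would better call av_get_pix_fmt_name() from libavutil/pixdesc.h.

```cpp
#include <string>

// Names for the two frame->format values that appear in the logs of this thread.
// The numbers are taken from those logs; they are not guaranteed for every FFmpeg build.
std::string loggedFormatName(int format) {
    switch (format) {
    case 0:
        return "AV_PIX_FMT_YUV420P";
    case 23:
        return "AV_PIX_FMT_NV12";
    default:
        return "other (" + std::to_string(format) + ")";
    }
}
```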

About possible software decoding, I specifically set 'h264_mediacodec' and logcat seems to confirm that it's used.

  • wondering if QAVPlayer itself consumes too much CPU

I assume that QAVPlayer as a library is shown within the Player CPU%. According to top, it's not too much.

  • you say that pts diff between 2 frames are increasing?

Not the pts diff, that is always ok, 3600. With times I refer to the logged times at each log entry. [03 3:13:10.200 MYT I] and [03 3:13:10.206 MYT I] show 6 ms difference, thus that avcodec_receive_frame() needed 6 ms.
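As a cross-check, a pts step of 3600 equals 40 ms, i.e. 25 fps, if the stream time base is 1/90000. That time base is an assumption here (it is the usual MPEG-TS clock; the thread does not state it), and `ptsDeltaToMs` is a hypothetical helper:

```cpp
#include <cstdint>

// Convert a pts difference in stream ticks to milliseconds.
// timeBaseDen is the denominator of a 1/timeBaseDen time base; 90000 is assumed by default.
double ptsDeltaToMs(std::int64_t deltaTicks, std::int64_t timeBaseDen = 90000) {
    return 1000.0 * static_cast<double>(deltaTicks) / static_cast<double>(timeBaseDen);
}
```

For the first two HD log lines, 4338061200 - 4338057600 = 3600 ticks, which gives 40 ms under that assumption.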

I will recheck with the first box, the HK1, if there is an error in logcat about h264_mediacodec.

geminixdev commented 1 year ago

I will recheck with the first box, the HK1, if there is an error in logcat about h264_mediacodec.

Logcat showed no indication that software decoding would be used. There are entries like these:

05-03 03:05:00.903   412  4104 D AmlogicVideoDecoderAwesome2: [22]"codecInit done"
05-03 03:05:00.903   412  4104 D AmlogicVideoDecoderAwesome2: [22]"mOutWidth is 1920 mOutHeight is 1080 mFlvFlag=0 mOutBufferCount is 10"
05-03 03:05:00.903   412  4104 D AmlogicVideoDecoderAwesome2: [22]"mOutBufferCount =10 mDecOutWidth 1920 mDecOutHeight 1088\n"
05-03 03:05:00.903   412  4104 D AmlogicVideoDecoderAwesome2: [22]"mIsNativeBuffers =0\n"
05-03 03:05:00.903   412  4104 D AmlogicVideoDecoderAwesome2: [22]"setUp mOutPortChanged=0\n"
05-03 03:05:00.903   412  4104 D AmlogicVideoDecoderAwesome2: [22]"use nv12\n"
05-03 03:05:00.903   412  4104 I OmxComponent: STATE_DONE:  OMX_StateLoaded => OMX_StateIdle : OMX.amlogic.avc.decoder.awesome2
05-03 03:05:00.904   412  4104 I OmxComponent: OMX_CommandStateSet 850 Cmd 0 nParam1 0x3
05-03 03:05:00.904   412  4104 I OmxComponent: OMX-31 STATE_SET:   OMX_StateIdle => OMX_StateExecuting : OMX.amlogic.avc.decoder.awesome2
05-03 03:05:00.904   412  4104 V AmlogicVideoDecoderAwesome2: [22]prepare:315 >
05-03 03:05:00.904   412  4104 I AmlogicVideoDecoderAwesome2: [22]"AllocDmaBuffers uvm mDecOutWidth:1920 mDecOutHeight:1088, 1920x1088"

which might explain why NV12 is used.

valbok commented 1 year ago

You should track the CPU of the application. There is a simple way to determine whether it is hw accelerated or not: just compare the CPU usage of the app with mediacodec and without.

valbok commented 1 year ago

I assume that QAVPlayer as a library is shown within the Player CPU%. According to top, it's not too much.

Does it mean you see low CPU% but lags and delays in receiving and decoding frames?

geminixdev commented 1 year ago

I assume that QAVPlayer as a library is shown within the Player CPU%. According to top, it's not too much.

Does it mean you see low CPU% but lags and delays in receiving and decoding frames?

Yes, but in 2 very different scenarios for the 2 armeabi-v7a devices HK1 and TX6.

HK1 box playing

TX6 Box

Please note that avcodec_receive_frame() times are not decoding times. Due to the asynchronous process these are only the times avcodec_receive_frame() needs to return. Actual decoding might take longer.

  1. Thus for the HK1 the question is, why does it take so very long in avcodec_receive_frame() ?
  2. And for the TX6 Box, which gets YUV420P from avcodec_receive_frame(), the effect of rendering (executing videoSink->setVideoFrame()) needs to get checked.

geminixdev commented 1 year ago

You should track the CPU of the application. There is a simple way to determine whether it is hw accelerated or not: just compare the CPU usage of the app with mediacodec and without.

Done, yes, it definitely was not software decoding. With software decoding activated the player CPU is higher, much higher, and mediacodec CPU is low, irrelevant.

Interestingly avcodec_receive_frame() returns immediately, 0 ms.

geminixdev commented 1 year ago

More testing with the HK1 Box resulted in the box playing with software decoding in Full HD better than with Mediacodec.

With mediacodec the decoding speed seems to be just half of what's needed, but with software decoding the decoding speed is just fast enough.

Display is smooth: despite the player CPU being much higher, about 135%, there is no lagging, no microjumps.

That is most of the time. From time to time there seems to be some other activity on the box slowing it down, and then decoding gets too slow, with some buffering until the decoding has caught up, then it is smooth again for a while.

So on the HK1 box software decoding is much much much better than the stop and go of the decoding with mediacodec. Very clearly the problem is there with mediacodec.

On the other box, the TX6, software decoding is slower than Mediacodec. As expected. I don't see yet where exactly the problem is with that other box.

valbok commented 1 year ago

  • Top shows 49% for the player and < 20 % for mediacodec. With software decoding activated the player CPU is higher, much higher, and mediacodec CPU is low, irrelevant.

Sorry, not clear here, what is the player and mediacodec *-) Top should show CPU for entire process. And seems using h264_mediacodec decreases CPU but there are some lags with frames on FullHD? Even if there is no any rendering involved yet.

valbok commented 1 year ago

So on the HK1 box software decoding is much much much better than the stop and go of the decoding with mediacodec. Very clearly the problem is there with mediacodec.

Trying to find any configuration settings that might help here, maybe need to increase num of threads or framerate or ...

I am not able to test myself right now, but it would be interesting to try some flags https://ffmpeg.org/doxygen/4.1/structAVCodecContext.html

Force low delay: #define AV_CODEC_FLAG_LOW_DELAY (1 << 19)

Here https://github.com/valbok/QtAVPlayer/blob/master/src/QtAVPlayer/qavcodec.cpp#L77 d->avctx->flags = AV_CODEC_FLAG_LOW_DELAY | AV_CODEC_FLAG2_FAST

also https://github.com/valbok/QtAVPlayer/blob/master/src/QtAVPlayer/qavcodec.cpp#L70 av_opt_set_int(d->avctx, "threads", 1???, 0);
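One caveat about the flags line above: in FFmpeg, AV_CODEC_FLAG2_FAST is a bit of AVCodecContext::flags2, not of flags, and a plain `=` discards any flags that were already set. The sketch below only illustrates that difference. It uses stand-in constants and a stand-in struct (`FakeCodecContext` and the `k...` names are hypothetical) so that it compiles without FFmpeg; the bit values are copied from libavcodec's avcodec.h as far as I know and should be checked against your headers.

```cpp
// Stand-ins mirroring bit values from libavcodec/avcodec.h (assumed, verify locally).
constexpr int kFlagUnaligned = 1 << 0;   // AV_CODEC_FLAG_UNALIGNED, a bit of flags
constexpr int kFlagLowDelay  = 1 << 19;  // AV_CODEC_FLAG_LOW_DELAY, a bit of flags
constexpr int kFlag2Fast     = 1 << 0;   // AV_CODEC_FLAG2_FAST, a bit of flags2

// Hypothetical stand-in for the two AVCodecContext fields involved.
struct FakeCodecContext {
    int flags = 0;
    int flags2 = 0;
};

// What the one-liner effectively does: it overwrites flags, and bit 0 of flags
// means "unaligned" there, so the fast flag is never set.
void applyAsPosted(FakeCodecContext& ctx) {
    ctx.flags = kFlagLowDelay | kFlag2Fast;
}

// Probable intent: keep the existing bits and put each flag into its own field.
void applyAsIntended(FakeCodecContext& ctx) {
    ctx.flags |= kFlagLowDelay;
    ctx.flags2 |= kFlag2Fast;
}
```

With the real API this would presumably read `d->avctx->flags |= AV_CODEC_FLAG_LOW_DELAY; d->avctx->flags2 |= AV_CODEC_FLAG2_FAST;`. Whether either flag has any effect on h264_mediacodec is a separate question that this sketch does not answer.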

geminixdev commented 1 year ago

  • Top shows 49% for the player and < 20 % for mediacodec. With software decoding activated the player CPU is higher, much higher, and mediacodec CPU is low, irrelevant.

Sorry, not clear here, what is the player and mediacodec *-) Top should show CPU for entire process. And seems using h264_mediacodec decreases CPU but there are some lags with frames on FullHD? Even if there is no any rendering involved yet.

The "player" is the Player code built with QtAVPlayer. The CPU% shown for that includes whatever Player code is used and the linked in QtAVPlayer, together.

"mediacodec" is the CPU the h264_mediacodec is using. Although it should be in hardware, there seems to be CPU involved, I assume for moving the data in and out., or the OS part of the API controlling mediacodec.

On the HK1 box using mediacodec for decoding, it looks like this, as shown in top in 'adb shell' (the 9th column is the CPU %, irrelevant rows removed):

HK1 SD 720x576:

   412 mediacodec   20   0 138M  45M  39M S 44.6   1.1 536:47.36 media.codec hw/android.hardware.media.omx@1.0-service
  2602 u0_a99       10 -10 1.7G 374M 175M S 40.6   9.9   0:47.70 my.player

HK1 HD 1280x720:

   412 mediacodec   20   0 154M  52M  46M S 99.0   1.3 537:41.57 media.codec hw/android.hardware.media.omx@1.0-service
  2602 u0_a99       10 -10 1.6G 335M 185M S 44.3   8.9   1:27.03 my.player

HK1 FHD 1920x1080:

   412 mediacodec   20   0 188M  69M  63M S 99.0   1.8 538:42.79 media.codec hw/android.hardware.media.omx@1.0-service
  2602 u0_a99       10 -10 1.7G 393M 211M S 39.0  10.4   1:55.56 my.player  

or shortened, with irrelevant columns removed, keeping only the CPU%:

HK1 SD 720x576:

   412 mediacodec   ... 44.6   ... media.codec hw/android.hardware.media.omx@1.0-service
  2602 u0_a99       ... 40.6   ... my.player

HK1 HD 1280x720:

   412 mediacodec   ... 99.0   ... media.codec hw/android.hardware.media.omx@1.0-service
  2602 u0_a99       ... 44.3   ... my.player

HK1 FHD 1920x1080:

   412 mediacodec   ... 99.0   ... media.codec hw/android.hardware.media.omx@1.0-service
  2602 u0_a99       ... 39.0   ... my.player

geminixdev commented 1 year ago

So on the HK1 box software decoding is much much much better than the stop and go of the decoding with mediacodec. Very clearly the problem is there with mediacodec.

Trying to find any configuration settings that might help here, maybe need to increase num of threads or framerate or ...

I am not able to test myself right now, but it would be interesting to try some flags https://ffmpeg.org/doxygen/4.1/structAVCodecContext.html

Force low delay: #define AV_CODEC_FLAG_LOW_DELAY (1 << 19)

Here https://github.com/valbok/QtAVPlayer/blob/master/src/QtAVPlayer/qavcodec.cpp#L77 d->avctx->flags = AV_CODEC_FLAG_LOW_DELAY | AV_CODEC_FLAG2_FAST

also https://github.com/valbok/QtAVPlayer/blob/master/src/QtAVPlayer/qavcodec.cpp#L70 av_opt_set_int(d->avctx, "threads", 1???, 0);

Yes, thanks, I will test this!

geminixdev commented 1 year ago

     av_opt_set_int(d->avctx, "threads", 2, 0);

and

      d->avctx->flags = AV_CODEC_FLAG_LOW_DELAY | AV_CODEC_FLAG2_FAST;

did not produce a noticeable change, unfortunately.

valbok commented 1 year ago

Interesting. That might mean that decoding itself is quite "slow", since there is no rendering at all, yet it already consumes some time?

geminixdev commented 1 year ago

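  • Why the frames are received in NV12 and YUV420P, and not in AV_PIX_FMT_MEDIACODEC, and
  • what to add to get the frames in AV_PIX_FMT_MEDIACODEC, or better said how to get an AV_PIX_FMT_MEDIACODEC surface, and
  • how to render this surface

is explained here: https://speakerdeck.com/tmm1/video-decoding-with-ffmpeg-on-ios-and-android?slide=34 on slides 34 to 38, and here: http://mplayerhq.hu/pipermail/ffmpeg-devel/2016-March/191700.html which will result in a nice performance boost: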

On a nexus 5, decoding an h264 stream (main profile) 1080p at 60fps:

  • software output + rgba conversion goes at 59~60fps
  • surface output + render on a surface goes at 100~110fps

And here is how Qt implements this in Qt 6: https://codereview.qt-project.org/c/qt/qtmultimedia/+/449591/2/src/plugins/multimedia/ffmpeg/qffmpeghwaccel_mediacodec.cpp

In summary, we have to tell Mediacodec to decode to a surface, and then render this surface directly.

So once again, very special process for Mediacodec decoding.

geminixdev commented 1 year ago

Interesting. That might mean that decoding itself is quite "slow", since there is no rendering at all, yet it already consumes some time?

Based on my last post here my guess is that this box, the HK1 has a bug when Mediacodec outputs NV12 in higher resolutions. With HD it's still ok. There is another box, brand Ugoos, which seems to show the same behavior.

For the other box, the TX6, it seems that the rendering of the YUV420P frames is done in software: not very slow, but at the limit for HD, and not fast enough to play anything better than HD smoothly.

I think both boxes might be very ok when that 'decoding to the surface which will get displayed directly' can get implemented. (This is based on seeing that they play full HD/1080p perfectly fine with QtMM.)

geminixdev commented 1 year ago

  • Why the frames are received in NV12 and YUV420P, and not in AV_PIX_FMT_MEDIACODEC, and
  • what to add to get the frames in AV_PIX_FMT_MEDIACODEC, or better said how to get an AV_PIX_FMT_MEDIACODEC surface, and
  • how to render this surface

is explained here: https://speakerdeck.com/tmm1/video-decoding-with-ffmpeg-on-ios-and-android?slide=34 on slide 34 to 38, and here: http://mplayerhq.hu/pipermail/ffmpeg-devel/2016-March/191700.html

... In summary, we have to tell Mediacodec to decode to a surface, and then render this surface directly.

Following the recipe posted above, I seem to have it working: avcodec_receive_frame() produces frames in pixel format AV_PIX_FMT_MEDIACODEC, and they get rendered to a surface. What is still missing is embedding the surface somewhere so that it is visible. Nevertheless, no errors.

valbok commented 1 year ago

Super, could you please share how you integrated this to QAVPlayer? It would be needed to be placed inside https://github.com/valbok/QtAVPlayer/blob/master/src/QtAVPlayer/qavhwdevice_mediacodec.cpp

geminixdev commented 1 year ago

Super, could you please share how you integrated this to QAVPlayer? It would be needed to be placed inside https://github.com/valbok/QtAVPlayer/blob/master/src/QtAVPlayer/qavhwdevice_mediacodec.cpp

Yes, I will. To be sure it really works I need to complete the last step somehow: making the surface visible in a QML VideoOutput or equivalent, i.e. how to display an Android View/Surface in a QML item. I'm working on that now, as time allows. I'm not an expert on that though, that was always the part Qt took care of.

Once I can see it and confirm that it works, then there is the code cleanup. Currently it's quick and dirty.

Yes, I have seen qavhwdevice_mediacodec.cpp as the best place for such code.

geminixdev commented 1 year ago

Update: I have spent days trying to get the rendering part to work. All code examples, however, are about displaying pictures stored in regular RAM on screen, using OpenGL. As described above, we need to display something which is already a texture (an Android SurfaceTexture), and which mediacodec can show in a surface directly, zero copy. So we need to make that surface visible, by having a QML Item / QML VideoOutput use the same Surface / SurfaceTexture. Any route through videoSink->setVideoFrame() seems to be useless for that, as it does not expect an Android SurfaceTexture, and it would try to duplicate the rendering which mediacodec does already.

The good news is that Qt has programmed that, it's now in QtMM, but I think not released before Qt 6.5.1. It might not be accessible through the public API though, but only using private APIs.

Qt 6.5.1 will be released in about a week, then I can continue testing if it is possible to hook in there.

Alternatively:

I would need this great KDAB example, "How to create a zero-copy Android SurfaceTexture QML", updated for Qt6. Unfortunately Qt changed all QSGSimple.. classes due to building on RHI instead of OpenGL. As that is not my expertise, I cannot do that (yet) myself. Otherwise I think that code does the same, and might avoid using QtMM code. I would prefer that.

valbok commented 1 year ago

Thanks for keeping us updated =)

valbok commented 1 year ago

I have spent days trying to get the rendering part to work. All code examples, however, are about displaying pictures stored in regular RAM on screen, using OpenGL. As described above, we need to display something which is already a texture (an Android SurfaceTexture), and which mediacodec can show in a surface directly, zero copy. So we need to make that surface visible, by having a QML Item / QML VideoOutput use the same Surface / SurfaceTexture. Any route through videoSink->setVideoFrame() seems to be useless for that, as it does not expect an Android SurfaceTexture, and it would try to duplicate the rendering which mediacodec does already.

I started to look at this and found that QtMM creates a SurfaceTexture, which is then attached to a newly created GL texture using https://developer.android.com/reference/android/graphics/SurfaceTexture#attachToGLContext(int).

And GL_TEXTURE_EXTERNAL_OES is important here. I am trying to implement this for Qt5 and have had no luck yet, since the renderers use GL_TEXTURE_2D. Maybe we can skip Qt5 altogether.

valbok commented 1 year ago

https://github.com/valbok/QtAVPlayer/pull/363 Only for Qt6

valbok commented 1 year ago

Could you please confirm that it works? And will close it.

geminixdev commented 1 year ago

#363 Only for Qt6

Great, thanks, I'm happy to see this! I will be back on Android and testing this in the next days!

geminixdev commented 1 year ago

Sorry for the long delay! I finally managed to test these changes. I see that a lot has changed, not only the Android specific code.

All still works very well on the arm8 devices, nice and smooth.

And excellent news also for the 2 armeabi-v7a devices I'm testing with, the HK1 and TX6. Both now behave normally, in all resolutions, no CPU overload. A huge difference to before!

The stuttering seen before when playing 720p or 1080p is gone, as well as the funny behavior of the HK1 when decoding 1080p (as described above). Everything behaves normally.

It is almost perfect now!

So what is still not perfect? Playback is not 100% smooth, and that is in all resolutions. 1080p is not worse than 720p. Both have a slight 'hanging' of the picture, which is mostly noticeable with regular movements. With chaotic movements, or with not much movement (for example some people standing and talking), it is almost unnoticeable.

The tool "top" shows for both devices in all resolutions no CPU overload at all, there is no reason visible in top (showing CPU for app and for mediacodec) why this would happen.

The only hint I found was logcat showing logspam like the lines below, apparently happening at every frame sent to display:

On the HK1 Box:

[SurfaceTexture-0-3799-0] bindTextureImage: clearing GL error: 0x500

On the TX6 Box (Android 7):

W GLConsumer: [SurfaceTexture-0-4973-0] bindTextureImage: clearing GL error: 0x500

This sounds like a shader issue, but then I would expect it to show no picture at all.

Possibly, with Android Mediacodec too, the texture should not be pushed to the display with videoSink->setVideoFrame(). Instead, as Qt seems to do in QtMM, it may be enough to trigger an update of the displayed Android surface texture, into which mediacodec has already placed the image when decoding.

valbok commented 1 year ago

Great news, could you also try to receive frames using Qt::DirectConnection? Qt6 changed the limitation and all frames could be delivered to the video sink on different threads now and might be possible that events queue hangs a bit?

geminixdev commented 1 year ago

Great news, could you also try to receive frames using Qt::DirectConnection? Qt6 changed the limitation and all frames could be delivered to the video sink on different threads now and might be possible that events queue hangs a bit?

Qt::DirectConnection to receive the frames did not help. Also there are crashes now around QVideoFrame (backtrace in logcat), which I haven't noticed before.

I measured the times between the videoSink->setVideoFrame() calls and they are as expected (always around 40ms at 25 fps content) and don't explain any hanging.

However, when I measure the time between the VideoBuffer_MediaCodec::handle() calls, where the display frame magic happens with the Android surface texture, the intervals are between 30 ms and more than 50 ms. Occasionally they are even < 20 ms, and often > 50 ms, up to 54 ms. This seems to reflect exactly the lack of smoothness.

I would have expected them to be regular, around 40 ms, the same as the videoSink->setVideoFrame() calls that trigger them. I assume the videosink code and VideoBuffer_MediaCodec::handle() run in the same thread as QML.
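
For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of how the gaps between successive calls could be recorded. The class name IntervalStats and its methods are hypothetical and not part of QtAVPlayer or Qt. It only uses std::chrono, and where exactly mark() would be called (for example at the top of handle()) is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper (not part of QtAVPlayer): tracks the gaps between
// successive calls so that jitter shows up as the spread between min and max.
class IntervalStats {
public:
    using Clock = std::chrono::steady_clock;
    using Ms = std::chrono::duration<double, std::milli>;

    // Record one call at time `now`. The first call only sets the baseline.
    // Timestamps are expected to be non-decreasing.
    void mark(Clock::time_point now = Clock::now()) {
        if (hasLast_) {
            const double gap = Ms(now - last_).count();
            assert(gap >= 0.0);
            min_ = (count_ == 0) ? gap : std::min(min_, gap);
            max_ = (count_ == 0) ? gap : std::max(max_, gap);
            sum_ += gap;
            ++count_;
        }
        last_ = now;
        hasLast_ = true;
    }

    std::size_t count() const { return count_; }  // number of gaps seen
    double minMs() const { return min_; }
    double maxMs() const { return max_; }
    double meanMs() const { return count_ ? sum_ / count_ : 0.0; }

private:
    Clock::time_point last_{};
    bool hasLast_ = false;
    std::size_t count_ = 0;
    double min_ = 0.0;
    double max_ = 0.0;
    double sum_ = 0.0;
};
```

Calling mark() once per invocation and logging minMs(), maxMs() and meanMs() every few hundred frames would show whether the intervals stay near the frame duration or spread out as described above.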

geminixdev commented 1 year ago

However when I measure the time between the VideoBuffer_MediaCodec::handle() calls, there where the display frame magic happens with the Android surface texture, they are between 30 ms and more than 50 ms. Occasionally even < 20 ms, and often > 50ms, up to 54 ms. This seems to be reflecting exactly the unsmoothness.

To see whether something is slowing down the event loop and causing these time differences in the VideoBuffer_MediaCodec::handle() method, I let the thread sleep so that texture updates are at least 37 ms apart. So whenever the delay since the previous run of handle() was shorter than that, the function had to sleep to compensate. The result was that playback was now much smoother, even on the old armeabi-v7a devices.
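
A minimal sketch of such a minimum-interval delay, in plain C++, might look like the following. The names remainingDelay and FramePacer are hypothetical and do not exist in QtAVPlayer. This is not the exact code used in handle(), only an illustration of the idea under the assumption that blocking the calling thread is acceptable.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;

// Hypothetical helper: how long to wait so that two updates are at least
// `minInterval` apart. Returns zero if enough time has already passed.
inline Clock::duration remainingDelay(Clock::time_point last,
                                      Clock::time_point now,
                                      Clock::duration minInterval) {
    const Clock::duration elapsed = now - last;
    return elapsed >= minInterval ? Clock::duration::zero()
                                  : minInterval - elapsed;
}

// Hypothetical pacer: call wait() at the top of the function to be paced.
class FramePacer {
public:
    explicit FramePacer(Clock::duration minInterval)
        : minInterval_(minInterval) {
        assert(minInterval >= Clock::duration::zero());
    }

    // Blocks the calling thread if the previous call was too recent.
    // The first call never sleeps; it only records the start time.
    void wait() {
        if (started_) {
            const auto delay =
                remainingDelay(last_, Clock::now(), minInterval_);
            if (delay > Clock::duration::zero())
                std::this_thread::sleep_for(delay);
        }
        last_ = Clock::now();
        started_ = true;
    }

private:
    Clock::duration minInterval_;
    Clock::time_point last_{};
    bool started_ = false;
};
```

With a FramePacer constructed with std::chrono::milliseconds(37), a call that arrives 20 ms after the previous one sleeps for roughly 17 ms, while a call that arrives 54 ms later passes straight through. Note that this only stretches intervals that are too short; it cannot shorten the long ones.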

My interpretation is that the variable delay seen in the handle() function is caused somewhere in the QtMM code handling the videoSink->setVideoFrame() calls, or in the Android code involved.

Nevertheless, for my application, with this dirty workaround of sleeping in the handle() function, this is working sufficiently well now!!!

There is still the “bindTextureImage: clearing GL error: 0x500” message spamming the logs on almost all devices. It seems to have no other negative effect though. Unless you have an idea of how this can be fixed or avoided, this issue can be closed.

I realize that it might not be an option for general use to add such a delay control in the handle function. I also have only tested with Qt 6.4.3, not yet with Qt 6.5.2. Possibly QtMM behaves differently there.

geminixdev commented 1 year ago

After extensively comparing the rendering smoothness of Qt 5 QtMM with this solution, it was obvious that the rendering of the GL_TEXTURE_EXTERNAL_OES frames showed unreliable timing on all devices and never seemed to play really smoothly. So I ended up with this:

armeabi-v7a devices:

arm64-v8a devices

I might retest with Qt 6.5.2 later, but due to Qt's switch to cmake, and the unavailability of the QtAVPlayer module in Qt 6.5.2 qmake, I first have to learn how to use cmake for Qt on Android.

valbok commented 1 year ago

Thanks, it would still need to dive into issues with GL_TEXTURE_EXTERNAL_OES, interesting what happens there.

I might retest with Qt 6.5.2 later, but due the Qt's switch to cmake, and the unavailability of the QtAVPlayer module in Qt 6.5.2 qmake, I have to learn first how to use cmake for Qt on Android.

Started to think that no need any libs here, https://github.com/valbok/QtAVPlayer/issues/374 and may be it is easier to always statically build to an app, still not clear how to deal with configure options in that way but would like to consider never build QtAVPlayer as separate and always should be part of an app.

geminixdev commented 1 year ago

Started to think that no need any libs here, #374 and may be it is easier to always statically build to an app, still not clear how to deal with configure options in that way but would like to consider never build QtAVPlayer as separate and always should be part of an app.

Which would allow me to continue using qmake in Qt 6.5, right? Yes, I would like to try that!

valbok commented 1 year ago

Started to think that no need any libs here, #374 and may be it is easier to always statically build to an app, still not clear how to deal with configure options in that way but would like to consider never build QtAVPlayer as separate and always should be part of an app.

Which would allow to continue to use qmake in Qt 6.5, right? Yes I like to try that!

https://github.com/valbok/QtAVPlayer/pull/389 cmake support is totally removed

valbok commented 1 year ago

for arm64-v8a devices I use QtAVPlayer with mediacodec decoding, but not decoding to GL_TEXTURE_EXTERNAL_OES (by not feeding the AndroidSurface to the Android Hardwarecontext).

Could you confirm that QtMM works there? It also uses GL_TEXTURE_EXTERNAL_OES and if it is smooth, there is a bug in QtAVPlayer and needs to be fixed.

, it was obvious that the rendering of the GL_TEXTURE_EXTERNAL_OES frames was showing the unreliable timing on all devices

Technically QtAVPlayer and QtMM should provide the same performance since they use the same impl. If there is a diff in perf, need to make sure that no bugs in QtAVPlayer, f.e. using DirectConnection for the frames is mandatory. etc