intel / intel-vaapi-driver

VA-API user mode driver for Intel GEN Graphics family
https://01.org/linuxmedia

ffmpeg 4.1 + VBR mode not working on Intel Broxton/CherryView #480

Open tmm1 opened 5 years ago

tmm1 commented 5 years ago

Using ffmpeg 4.2 on J3355 or J3455 chips, VBR mode ignores the requested bitrate and always encodes at ~300 kbps.

I see media-driver has some special cases for BXT, so perhaps those are missing here?

https://github.com/intel/media-driver/blob/master/media_driver/linux/gen9_bxt/ddi/media_libva_caps_g9_bxt.cpp

tmm1 commented 5 years ago

Also seeing this on N3060 CherryView

The symptoms are similar to #430 but for AVC and only on some chipsets.

I tried the iHD Driver with ffmpeg 4.2 and it produces the correct bitrate in VBR mode.

tmm1 commented 5 years ago

Also seeing same behavior on N3710 CherryView

I tried J1900 Bay Trail and i5-7260U Kaby Lake and those work as expected.

So to summarize:

Broxton / CherryView

ffmpeg 4.0 + i965 2.4.0.pre1 = works
ffmpeg 4.1 + i965 2.4.0.pre1 = broken
ffmpeg 4.2 + i965 2.4.0.pre1 = broken
ffmpeg 4.2 + iHD 19.2.1 = works

Bay Trail / Kaby Lake / others

ffmpeg 4.x + i965 2.4.0.pre1 = works

tmm1 commented 5 years ago

I was able to bisect this issue to the following ffmpeg commit: https://github.com/FFmpeg/FFmpeg/commit/2562dd9e7831743ba6dc5680501fb7d26a2ec62c

tmm1 commented 5 years ago

It appears that sending VAEncMiscParameterRateControl and other global params along with every IDR picture is causing the i965 driver to act pathologically on the affected chipsets.

Reverting https://github.com/FFmpeg/FFmpeg/commit/2562dd9e7831743ba6dc5680501fb7d26a2ec62c as follows to send the global params only once fixes the issue at that point in the commit history:

diff --git a/libavcodec/vaapi_encode.c b/libavcodec/vaapi_encode.c
index dd2a24de04..66633a43b5 100644
--- a/libavcodec/vaapi_encode.c
+++ b/libavcodec/vaapi_encode.c
@@ -233,7 +233,7 @@ static int vaapi_encode_issue(AVCodecContext *avctx,
             goto fail;
     }

-    if (pic->type == PICTURE_TYPE_IDR) {
+    if (pic->encode_order == 0) {
         for (i = 0; i < ctx->nb_global_params; i++) {
             err = vaapi_encode_make_misc_param_buffer(avctx, pic,
                                                       ctx->global_params_type[i],

tmm1 commented 5 years ago

Unfortunately, even with the patch above, the next commit (ffmpeg/ffmpeg@af532c921575eb8ee805cc2c64a914f6302442e1) reintroduces the same issue. Starting at that commit, the following revert is necessary on top of the one above:

diff --git a/libavcodec/vaapi_encode_h264.c b/libavcodec/vaapi_encode_h264.c
index c63766d..d3d53bc 100644
--- a/libavcodec/vaapi_encode_h264.c
+++ b/libavcodec/vaapi_encode_h264.c
@@ -425,9 +425,9 @@ static int vaapi_encode_h264_init_sequence_params(AVCodecContext *avctx)
         // Try to scale these to a sensible range so that the
         // golomb encode of the value is not overlong.
         hrd->bit_rate_scale =
-            av_clip_uintp2(av_log2(ctx->va_bit_rate) - 15 - 6, 4);
+            av_clip_uintp2(av_log2(avctx->bit_rate) - 15 - 6, 4);
         hrd->bit_rate_value_minus1[0] =
-            (ctx->va_bit_rate >> hrd->bit_rate_scale + 6) - 1;
+            (avctx->bit_rate >> hrd->bit_rate_scale + 6) - 1;

         hrd->cpb_size_scale =
             av_clip_uintp2(av_log2(ctx->hrd_params.hrd.buffer_size) - 15 - 4, 4);
@@ -497,7 +497,7 @@ static int vaapi_encode_h264_init_sequence_params(AVCodecContext *avctx)
         .intra_idr_period = avctx->gop_size,
         .ip_period        = ctx->b_per_p + 1,

-        .bits_per_second       = ctx->va_bit_rate,
+        .bits_per_second       = avctx->bit_rate,
         .max_num_ref_frames    = sps->max_num_ref_frames,
         .picture_width_in_mbs  = sps->pic_width_in_mbs_minus1 + 1,
         .picture_height_in_mbs = sps->pic_height_in_map_units_minus1 + 1,

tmm1 commented 5 years ago

Updated patch which fixes this regression for me fully on the latest ffmpeg 4.2 release:

diff --git a/libavcodec/vaapi_encode.c b/libavcodec/vaapi_encode.c
index b0dd46558c..9bee66ea08 100644
--- a/libavcodec/vaapi_encode.c
+++ b/libavcodec/vaapi_encode.c
@@ -235,6 +235,10 @@ static int vaapi_encode_issue(AVCodecContext *avctx,

     if (pic->type == PICTURE_TYPE_IDR) {
         for (i = 0; i < ctx->nb_global_params; i++) {
+            if (pic->encode_order != 0 &&
+                ctx->global_params_type[i] == VAEncMiscParameterTypeRateControl) {
+                continue;
+            }
             err = vaapi_encode_make_misc_param_buffer(avctx, pic,
                                                       ctx->global_params_type[i],
                                                       ctx->global_params[i],
@@ -1586,6 +1590,9 @@ rc_mode_found:
     ctx->va_rc_mode  = rc_mode->va_mode;
     ctx->va_bit_rate = rc_bits_per_second;

+    if (ctx->va_rc_mode == VA_RC_VBR)
+        ctx->va_bit_rate = rc_bits_per_second * rc_target_percentage / 100;
+
     av_log(avctx, AV_LOG_VERBOSE, "RC mode: %s.\n", rc_mode->name);
     if (rc_attr.value == VA_ATTRIB_NOT_SUPPORTED) {
         // This driver does not want the RC mode attribute to be set.

The first chunk ensures VAEncMiscParameterTypeRateControl is only sent to the driver once. Receiving this parameter multiple times causes the driver not to function correctly in VBR mode on Broxton/CherryView.

The second chunk fixes a regression where the SPS sent via VAEncSequenceParameterBufferType was being populated with the VBR max rate instead of the target bitrate. This also causes Broxton/CherryView to produce very low bitrates in VBR mode.
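To make the second chunk concrete, here is a minimal sketch of the VBR arithmetic, checked against the "RC target" log lines posted in this thread. The struct and function names are hypothetical, and the fallback of doubling the target when no maxrate is given is an assumption inferred from the "50% of 16000000 bps" log line for `-b:v 8M`; it is not copied from ffmpeg.

```c
#include <stdint.h>

/* Hypothetical container for the two values logged as
 * "RC target: <pct>% of <max_bps> bps". Not an ffmpeg type. */
typedef struct {
    int64_t max_bps;     /* maximum rate handed to the driver */
    int     target_pct;  /* target rate as a percentage of max_bps */
} vbr_params;

/* Derive the VBR parameters from the user's -b:v and -maxrate values.
 * Assumption: with no maxrate, the maximum is twice the target. */
vbr_params vbr_from_user(int64_t bit_rate, int64_t max_rate)
{
    vbr_params p;
    if (max_rate > 0) {
        p.max_bps    = max_rate;
        p.target_pct = (int)(bit_rate * 100 / max_rate); /* truncates */
    } else {
        p.max_bps    = 2 * bit_rate;
        p.target_pct = 50;
    }
    return p;
}

/* The value the second chunk stores in va_bit_rate for VBR:
 * the target rate rather than the maximum rate. */
int64_t vbr_target_bps(vbr_params p)
{
    return p.max_bps * p.target_pct / 100;
}
```

With `-b:v 3000k -maxrate:v 3200k` this gives 93% of 3200000 bps, so the SPS would carry 2976000 bps instead of 3200000 bps. The small gap from the requested 3000000 comes from the integer percentage.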

tmm1 commented 5 years ago

> Bay Trail / Kaby Lake / others
>
> ffmpeg 4.x + i965 2.4.0.pre1 = works

I think I was mistaken in my testing earlier. This bug seems to affect my i5-7260U Kaby Lake as well. Maybe it is related to the version of the i915 kernel driver?

The patch above still works and fixes the issue across all the chips I have tested.

tmm1 commented 5 years ago

I have submitted some patches to the ffmpeg-devel ML to fix the regression on the ffmpeg side.

However it seems like there are some driver bugs involved as well, since sending SPS with maxrate or sending RateControl params multiple times should not be able to cause such broken behavior.

Since the iHD driver does not require these changes in ffmpeg, it further confirms there is a bug in this driver.

tmm1 commented 5 years ago

The driver bug is possibly in intel_encoder_check_rate_control_parameter(), which appears to set hl_bitrate_updated any time it receives a VAEncMiscParameterTypeRateControl:

https://github.com/intel/intel-vaapi-driver/blob/021bcb79d1bd873bbd9fbca55f40320344bab866/src/i965_encoder.c#L608-L609
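One way to picture the suspected problem, as a standalone sketch rather than the driver's actual code: a flag that is raised on every RateControl buffer versus one raised only when the values change. All names below (`rc_state`, `rc_receive_always`, `rc_receive_if_changed`) are hypothetical, and whether a change-detection guard like this would fix the driver is untested.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of per-context rate-control state.
 * These are not the i965 driver's structures. */
struct rc_state {
    uint32_t bits_per_second;
    uint32_t target_percentage;
    bool     have_params;      /* true once any RateControl buffer was seen */
    bool     bitrate_updated;  /* the consumer clears this after reacting */
};

/* Behaviour as described above: every buffer marks the bitrate as
 * updated, even if the values are identical to the previous ones. */
void rc_receive_always(struct rc_state *s, uint32_t bps, uint32_t pct)
{
    s->bits_per_second   = bps;
    s->target_percentage = pct;
    s->have_params       = true;
    s->bitrate_updated   = true;
}

/* Alternative: mark an update only for the first buffer or when a
 * value actually differs from what is already stored. */
void rc_receive_if_changed(struct rc_state *s, uint32_t bps, uint32_t pct)
{
    bool changed = !s->have_params ||
                   s->bits_per_second != bps ||
                   s->target_percentage != pct;

    if (changed) {
        s->bits_per_second   = bps;
        s->target_percentage = pct;
        s->bitrate_updated   = true;
    }
    s->have_params = true;
}
```

Under the second variant, an application that resends identical parameters with every IDR picture would trigger only one update, which matches what the ffmpeg-side workaround achieves by skipping the resend.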

fulinjie commented 5 years ago

Hi @tmm1,

Tested ffmpeg h264_vaapi with the i965 driver on J3455, but didn't observe the low-bitrate issue. Would you please provide more information on how to reproduce?

frame= 1000 fps= 50 q=-0.0 Lsize= 37443kB time=00:00:39.96 bitrate=7675.9kbits/s speed=2.01x
video:37443kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.000000%

cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 92
model name : Intel(R) Celeron(R) CPU J3455 @ 1.50GHz

ffmpeg -hwaccel vaapi -vaapi_device /dev/dri/renderD128 -v verbose -f rawvideo -s:v 1920x1080 -pix_fmt nv12 -i ./bbb_1080p_nv12.yuv -vf format=nv12,hwupload -c:v h264_vaapi -b:v 8M -y ./h264e_vbr_1920x1080_bitrate8M.h264
ffmpeg version N-93264-g85051fe Copyright (c) 2000-2019 the FFmpeg developers
  built with gcc 5.4.0 (Ubuntu 5.4.0-6ubuntu1~16.04.11) 20160609
  configuration: --cc='ccache gcc -m64' --enable-libmfx --disable-optimizations --enable-debug=3 --disable-stripping --enable-gpl --enable-libmp3lame --enable-libopus --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libx265 --enable-nonfree --enable-shared
  libavutil 56. 26.100 / 56. 26.100
  libavcodec 58. 47.102 / 58. 47.102
  libavformat 58. 26.101 / 58. 26.101
  libavdevice 58. 6.101 / 58. 6.101
  libavfilter 7. 48.100 / 7. 48.100
  libswscale 5. 4.100 / 5. 4.100
  libswresample 3. 4.100 / 3. 4.100
  libpostproc 55. 4.100 / 55. 4.100
[AVHWDeviceContext @ 0xbd4580] Opened VA display via DRM device /dev/dri/renderD128.
[AVHWDeviceContext @ 0xbd4580] libva: VA-API version 1.4.0
[AVHWDeviceContext @ 0xbd4580] libva: va_getDriverName() returns 0
[AVHWDeviceContext @ 0xbd4580] libva: User requested driver 'i965'
[AVHWDeviceContext @ 0xbd4580] libva: Trying to open /usr/local/lib/dri//i965_drv_video.so
[AVHWDeviceContext @ 0xbd4580] libva: Trying to open /usr/lib/x86_64-linux-gnu/dri//i965_drv_video.so
[AVHWDeviceContext @ 0xbd4580] libva: Found init function __vaDriverInit_1_4
[AVHWDeviceContext @ 0xbd4580] libva: va_openDriver() returns 0
[AVHWDeviceContext @ 0xbd4580] Initialised VAAPI connection: version 1.4
[AVHWDeviceContext @ 0xbd4580] VAAPI driver: Intel i965 driver for Intel(R) Broxton - 2.4.0.pre1 (2.3.0-18-geefe4be).
[AVHWDeviceContext @ 0xbd4580] Driver not found in known nonstandard list, using standard behaviour.
[rawvideo @ 0xbe4000] Estimating duration from bitrate, this may be inaccurate
Input #0, rawvideo, from './bbb_1080p_nv12.yuv':
  Duration: 00:00:40.00, start: 0.000000, bitrate: 622080 kb/s
    Stream #0:0: Video: rawvideo, 1 reference frame (NV12 / 0x3231564E), nv12, 1920x1080, 622080 kb/s, 25 tbr, 25 tbn, 25 tbc
Stream mapping:
  Stream #0:0 -> #0:0 (rawvideo (native) -> h264 (h264_vaapi))
Press [q] to stop, [?] for help
[graph 0 input from stream 0:0 @ 0xbfe980] w:1920 h:1080 pixfmt:nv12 tb:1/25 fr:25/1 sar:0/1 sws_param:flags=2
[h264_vaapi @ 0xbefc80] Input surface format is nv12.
[h264_vaapi @ 0xbefc80] Using VAAPI profile VAProfileH264High (7).
[h264_vaapi @ 0xbefc80] Using VAAPI entrypoint VAEntrypointEncSlice (6).
[h264_vaapi @ 0xbefc80] Using VAAPI render target format YUV420 (0x1).
[h264_vaapi @ 0xbefc80] RC mode: VBR.
[h264_vaapi @ 0xbefc80] RC target: 50% of 16000000 bps over 500 ms.
[h264_vaapi @ 0xbefc80] RC buffer: 8000000 bits, initial fullness 6000000 bits.
[h264_vaapi @ 0xbefc80] RC framerate: 25/1 (25.00 fps).
[h264_vaapi @ 0xbefc80] Using intra, P- and B-frames (supported references: 4 / 1).
[h264_vaapi @ 0xbefc80] All wanted packed headers available (wanted 0xd, found 0x1f).
[h264_vaapi @ 0xbefc80] Using level 4.
Output #0, h264, to './h264e_vbr_1920x1080_bitrate8M.h264':
  Metadata:
    encoder : Lavf58.26.101
    Stream #0:0: Video: h264 (h264_vaapi) (High), 1 reference frame, vaapi_vld, 1920x1080, q=-1--1, 8000 kb/s, 25 fps, 25 tbn, 25 tbc
    Metadata:
      encoder : Lavc58.47.102 h264_vaapi
No more output streams to write to, finishing.e=00:00:39.36 bitrate=7672.5kbits/s speed=2.02x
frame= 1000 fps= 50 q=-0.0 Lsize= 37443kB time=00:00:39.96 bitrate=7675.9kbits/s speed=2.01x
video:37443kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.000000%
Input file #0 (./bbb_1080p_nv12.yuv):
  Input stream #0:0 (video): 1000 packets read (3110400000 bytes); 1000 frames decoded;
  Total: 1000 packets (3110400000 bytes) demuxed
Output file #0 (./h264e_vbr_1920x1080_bitrate8M.h264):
  Output stream #0:0 (video): 1000 frames encoded; 1000 packets muxed (38341182 bytes);
  Total: 1000 packets (38341182 bytes) muxed
[AVIOContext @ 0xbfac80] Statistics: 0 seeks, 147 writeouts
[AVIOContext @ 0xbe4840] Statistics: 3110400000 bytes read, 0 seeks

tmm1 commented 5 years ago

> Tested ffmpeg h264_vaapi with the i965 driver on J3455, but didn't observe the low-bitrate issue. Would you please provide more information on how to reproduce?

Interesting you were not able to reproduce. Here is my test harness:

stream.mpg: http://0x0.st/zVBL.mpg

$ ffmpeg-4.2 -hide_banner -loglevel verbose -hwaccel vaapi -hwaccel_output_format vaapi -i stream.mpg -map 0:v -an -sn -c:v h264_vaapi -init_hw_device vaapi=intel:/dev/dri/renderD128 -filter_hw_device intel -b:v 3000k -maxrate:v 3200k -bufsize 6000k -f mpegts -y /dev/null
[AVHWDeviceContext @ 0x2584000] libva: VA-API version 1.5.0
[AVHWDeviceContext @ 0x2584000] libva: va_getDriverName() returns 0
[AVHWDeviceContext @ 0x2584000] libva: va_openDriver() returns 0
[AVHWDeviceContext @ 0x2584000] Initialised VAAPI connection: version 1.5
[AVHWDeviceContext @ 0x2584000] VAAPI driver: Intel i965 driver for Intel(R) Broxton - 2.4.0.pre1 (2.4.0.pre1).
[AVHWDeviceContext @ 0x2584000] Driver not found in known nonstandard list, using standard behaviour.
Input #0, mpegts, from 'stream.mpg':
  Duration: 00:00:30.26, start: 86223.488411, bitrate: 13248 kb/s
  Program 1611
    Stream #0:0[0x17e4]: Video: mpeg2video (Main), 1 reference frame ([2][0][0][0] / 0x0002), yuv420p(tv, top first, left), 1920x1080 [SAR 1:1 DAR 16:9], Closed Captions, 29.97 fps, 29.97 tbr, 90k tbn, 59.94 tbc
    Stream #0:1[0x17e5](eng): Audio: ac3 (AC-3 / 0x332D4341), 48000 Hz, 5.1(side), fltp, 384 kb/s
    Stream #0:2[0x17e6](spa): Audio: ac3 (AC-3 / 0x332D4341), 48000 Hz, stereo, fltp, 192 kb/s (visual impaired)
Stream mapping:
  Stream #0:0 -> #0:0 (mpeg2video (native) -> h264 (h264_vaapi))
Press [q] to stop, [?] for help
[mpegts @ 0x2592540] Correcting start time by 298911
[graph 0 input from stream 0:0 @ 0x2734c00] w:1920 h:1080 pixfmt:vaapi_vld tb:1/90000 fr:30000/1001 sar:1/1 sws_param:flags=2
[h264_vaapi @ 0x25bbf00] Input surface format is nv12.
[h264_vaapi @ 0x25bbf00] Using VAAPI profile VAProfileH264High (7).
[h264_vaapi @ 0x25bbf00] Using VAAPI entrypoint VAEntrypointEncSlice (6).
[h264_vaapi @ 0x25bbf00] Using VAAPI render target format YUV420 (0x1).
[h264_vaapi @ 0x25bbf00] RC mode: VBR.
[h264_vaapi @ 0x25bbf00] RC target: 93% of 3200000 bps over 1875 ms.
[h264_vaapi @ 0x25bbf00] RC buffer: 6000000 bits, initial fullness 4500000 bits.
[h264_vaapi @ 0x25bbf00] RC framerate: 30000/1001 (29.97 fps).
[h264_vaapi @ 0x25bbf00] Using intra, P- and B-frames (supported references: 4 / 1).
[h264_vaapi @ 0x25bbf00] All wanted packed headers available (wanted 0xd, found 0x1f).
[h264_vaapi @ 0x25bbf00] Using level 4.
[mpegts @ 0x25ba8c0] muxrate VBR, pcr every 2 pkts, sdt every 200, pat/pmt every 40 pkts
Output #0, mpegts, to '/dev/null':
  Metadata:
    encoder         : Lavf58.29.100
    Stream #0:0: Video: h264 (h264_vaapi) (High), 1 reference frame, vaapi_vld(left), 1920x1080 [SAR 1:1 DAR 16:9], q=-1--1, 3000 kb/s, 29.97 fps, 90k tbn, 29.97 tbc
    Metadata:
      encoder         : Lavc58.54.100 h264_vaapi
frame=  614 fps= 70 q=-0.0 size=     512kB time=00:00:20.75 bitrate= 202.1kbits/s speed=2.38x
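
As a sanity check on the progress line above, the reported bitrate can be recomputed from its size and time fields. This is a minimal sketch, assuming that ffmpeg's `kB` here means 1024 bytes (consistent with the 38341182 bytes / 37443kB pair in the other log in this thread) and that the bitrate is bytes times 8 divided by elapsed media seconds. The function name is hypothetical.

```c
/* Average bitrate in kbit/s for `bytes` of output covering
 * `seconds` of media time. */
double avg_kbps(double bytes, double seconds)
{
    return bytes * 8.0 / seconds / 1000.0;
}
```

`avg_kbps(512.0 * 1024.0, 20.75)` comes out at about 202.1, matching the `bitrate= 202.1kbits/s` shown, which is under a tenth of the 3000k requested with `-b:v`. For comparison, `avg_kbps(38341182.0, 39.96)` gives about 7675.9, matching the run that could not reproduce the problem.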

fulinjie commented 5 years ago

It could be reproduced when mpeg2_vaapi decode was involved in the transcode pipeline.

tmm1 commented 5 years ago

@fulinjie Thanks for confirming and narrowing down the reproduction.

If you have any hunches as to where the bug is, I can try to make a patch to fix the driver.

tmm1 commented 5 years ago

Ping. Do you have any ideas about this bug?

rcombs commented 5 years ago

This seems to be related to the gen9_avc_vme_gpe_kernel_run routine. It goes away for me if I call gen9_avc_kernel_brc_init_reset again after gen9_avc_kernel_brc_frame_update on the first frame, but I don't understand the inner workings of those functions so I can't really guess why that would be. See https://github.com/intel/intel-vaapi-driver/blob/021bcb79d1bd873bbd9fbca55f40320344bab866/src/i965_avc_encoder.c#L8498

chrisallen commented 4 years ago

Hey guys, just wondering if there is any update on this issue? @tmm1 I noticed in a previous comment that you submitted a patch to the ffmpeg ML. Was it successful?

tmm1 commented 4 years ago

The patch to ffmpeg was rejected as a driver bug. (Since the iHD driver works fine, ffmpeg's stance is that the code in ffmpeg does not need to change and the bug should be fixed in this driver instead.)

chrisallen commented 4 years ago

Is there someone you know of, @tmm1, who could take on the driver bug investigation and fix? Is there someone at Intel we could reach out to who has worked on a similar portion of the codebase?

rcombs commented 4 years ago

After some experimentation I landed on this patch, which seems to solve the problem for my use-case… I wouldn't describe it as a "correct" solution, though:

diff --git a/src/i965_avc_encoder.c b/src/i965_avc_encoder.c
index ab73384..0727f41 100644
--- a/src/i965_avc_encoder.c
+++ b/src/i965_avc_encoder.c
@@ -8626,6 +8626,11 @@ gen9_avc_vme_gpe_kernel_run(VADriverContextP ctx,
         }
         gen9_avc_kernel_brc_frame_update(ctx, encode_state, encoder_context);

+        if (!encode_state->reset_hack_done) {
+            gen9_avc_kernel_brc_init_reset(ctx, encode_state, encoder_context);
+            encode_state->reset_hack_done = 1;
+        }
+
         if (avc_state->brc_split_enable && generic_state->mb_brc_enabled) {
             gen9_avc_kernel_brc_mb_update(ctx, encode_state, encoder_context);
         }
diff --git a/src/i965_drv_video.h b/src/i965_drv_video.h
index b4326e5..f8d7ad0 100644
--- a/src/i965_drv_video.h
+++ b/src/i965_drv_video.h
@@ -301,6 +301,8 @@ struct encode_state {
     struct object_surface *reconstructed_object;
     struct object_buffer *coded_buf_object;
     struct object_surface *reference_objects[16]; /* Up to 2 reference surfaces are valid for MPEG-2,*/
+
+    int reset_hack_done;
 };

 struct proc_state {