intel / media-driver

Intel Graphics Media Driver to support hardware decode, encode and video processing.
https://github.com/intel/media-driver/wiki

[Bug]: Frame resolution resize fails when using reference frames of different dimension(s) #1473

Closed StefanBossbaly closed 2 years ago

StefanBossbaly commented 2 years ago

Which component impacted?

Decode

Is it regression? Good in old configuration?

No, this issue has existed for a long time

What happened?

The problem was observed on Fuchsia and then confirmed on Linux (via ffmpeg) when playing back the frm_resize WebM conformance streams. The stream plays back normally up until the first resolution resize. Once the resize happens, the output becomes blocks of varying colors. Once a keyframe is encountered, the stream recovers, until another resize happens and the output again becomes blocks of varying colors. I confirmed that frame resizing works when the resize happens on a keyframe. Since keyframes clear out the reference frames, I suspect it is a problem with having reference frames of different dimensions.

To reproduce with ffmpeg, run the following commands:

1) ffmpeg -hwaccel vaapi -init_hw_device vaapi=hw:/dev/dri/renderD128 -filter_hw_device hw -v verbose -c:v vp9 -i crowd_run_1080X512_fr30_bd8_frm_resize_l3.webm -pix_fmt yuv420p -f rawvideo -vsync passthrough -y crowd_run_1080X512_fr30_bd8_frm_resize_l3.yuv
2) mpv --demuxer=rawvideo --demuxer-rawvideo-w=1080 --demuxer-rawvideo-h=512 --demuxer-rawvideo-format=I420 crowd_run_1080X512_fr30_bd8_frm_resize_l3.yuv

To verify that the md5 hash does not match the WebM truth value, you need to add -autoscale 0, since ffmpeg will otherwise scale the output back to the starting resolution.

1) ffmpeg -hwaccel vaapi -init_hw_device vaapi=hw:/dev/dri/renderD128 -filter_hw_device hw -v verbose -c:v vp9 -i crowd_run_1080X512_fr30_bd8_frm_resize_l3.webm -pix_fmt yuv420p -autoscale 0 -f rawvideo -vsync passthrough -y crowd_run_1080X512_fr30_bd8_frm_resize_l3.yuv
2) md5sum crowd_run_1080X512_fr30_bd8_frm_resize_l3.yuv

This yields 45b0fbf95bc023c849ecb9fd91367061 instead of the expected 51b3393fa98ad9ab99c0b45ef705ebc4. The outputted md5 hash also seems to change between runs.

Using the software decoder libvpx-vp9 gives the correct output. To verify, run the following:

1) ffmpeg -v verbose -c:v libvpx-vp9 -i crowd_run_1080X512_fr30_bd8_frm_resize_l3.webm -pix_fmt yuv420p -f rawvideo -y crowd_run_1080X512_fr30_bd8_frm_resize_l3.yuv
2) mpv --demuxer=rawvideo --demuxer-rawvideo-w=1080 --demuxer-rawvideo-h=512 --demuxer-rawvideo-format=I420 crowd_run_1080X512_fr30_bd8_frm_resize_l3.yuv

To verify that the md5 hash matches the WebM truth value, again add -autoscale 0:

1) ffmpeg -v verbose -c:v libvpx-vp9 -i crowd_run_1080X512_fr30_bd8_frm_resize_l3.webm -pix_fmt yuv420p -autoscale 0 -f rawvideo -y crowd_run_1080X512_fr30_bd8_frm_resize_l3.yuv
2) md5sum crowd_run_1080X512_fr30_bd8_frm_resize_l3.yuv

This yields the expected output of 51b3393fa98ad9ab99c0b45ef705ebc4.

The VP9 specification does allow different decoded frames to have different sizes, with some caveats; specifically:

5.16 Reference frame scaling

It is legal for different decoded frames to have different frame sizes (and aspect ratios). VP9 automatically
handles resizing predictions from reference frames of different sizes.
However, reference frames must share the same color depth and subsampling format for reference frame
scaling to be allowed, and the amount of up/down scaling is limited to be no more than 16x larger and no less
than 2x smaller (e.g. the new frame must not be more than 16 times wider or higher than any of its used
reference frames).

What's the usage scenario when you are seeing the problem?

Playback

What impacted?

No response

Debug Information

1) What's libva/libva-utils/gmmlib/media-driver version?

VA-API version 1.14.0
Intel iHD driver for Intel(R) Gen Graphics - 22.4.2

2) Output of ls /dev/dri

by-path  card0  renderD128

3) Output of vainfo >vainfo.log 2>&1: vainfo.log

4) Could you provide libva trace log? Run cmd export LIBVA_TRACE=/tmp/libva_trace.log first then execute the case.

Sorry about the messy logs. ffmpeg has a -threads parameter, but it doesn't seem to limit the number of threads for hardware-accelerated playback even with -threads 1. If you know of a way to get them all collapsed into a single log, let me know. Also, I had to append .log to get GitHub to accept the upload.

libva_trace.log.164146.thd-0x00088bcf.log libva_trace.log.164146.thd-0x00088bd0.log libva_trace.log.164146.thd-0x00088bd1.log libva_trace.log.164146.thd-0x00088bd2.log libva_trace.log.164146.thd-0x00088bd3.log libva_trace.log.164146.thd-0x00088bd4.log libva_trace.log.164146.thd-0x00088bd5.log libva_trace.log.164146.thd-0x00088bd6.log libva_trace.log.164146.thd-0x00088bd7.log libva_trace.log.164146.thd-0x00088bd8.log

Do you want to contribute a patch to fix the issue?

No response

Jexu commented 2 years ago

As far as I know, ffmpeg will destroy the UMD device and reallocate a new one when resizing happens. Did you check whether ffmpeg saves the reference list or clears it on a non-keyframe resize? I suspect the issue may be caused by losing the reference list. For non-keyframe resizing, ffmpeg needs to save and restore the reference list after reallocating the new UMD device.

Is it possible to reproduce with the sample decoder?

StefanBossbaly commented 2 years ago

As far as I know, ffmpeg will destroy the UMD device and reallocate a new one when resizing happens. Did you check whether ffmpeg saves the reference list or clears it on a non-keyframe resize? I suspect the issue may be caused by losing the reference list. For non-keyframe resizing, ffmpeg needs to save and restore the reference list after reallocating the new UMD device.

I'm not too familiar with how ffmpeg works under the hood, but I know that in the Fuchsia decoder we keep the reference frames of the other dimensions that were created under a different context. From my understanding, VASurfaces don't have to be bound to a VAContext, so having multiple VASurfaces of different resolutions in the VADecPictureParameterBufferVP9::reference_frames array is a valid use of the API. It's also my understanding that those surfaces can exist prior to the creation of the current context; the only condition is that the surfaces be destroyed after the context. When the Fuchsia decoder creates a new context via vaCreateContext after the resolution change and then renders a picture with vaBeginPicture, vaRenderPicture and vaEndPicture, the subsequent vaSyncSurface call on that surface returns VA_STATUS_ERROR_DECODING_ERROR. Calling vaQuerySurfaceError returns the following information ...

surface = 0x0000000c
error_status = 0x00000017
    status = 2
    start_mb = 0
    end_mb = 0

Surface 0xc was created after the resolution change, so it should be the proper size to hold that image. I have attached the libva trace log from the Fuchsia device in case you want to verify.

libva_trace.log.170210.thd-0x00014693.log
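
Roughly, the call sequence described above looks like the following minimal C sketch; dpy, config_id, old_refs[] and the dimensions are placeholders, and slice buffers plus most error handling are omitted:

#include <va/va.h>
#include <va/va_dec_vp9.h>

/* Sketch of the sequence after a mid-stream resolution change: a new surface
 * and context are created at the new size, while the picture parameters still
 * reference surfaces that were decoded at the old resolution. */
static void decode_after_resize(VADisplay dpy, VAConfigID config_id,
                                const VASurfaceID old_refs[8],
                                unsigned new_w, unsigned new_h)
{
    VASurfaceID new_surface;
    VAContextID new_ctx;
    VABufferID pic_buf;

    vaCreateSurfaces(dpy, VA_RT_FORMAT_YUV420, new_w, new_h,
                     &new_surface, 1, NULL, 0);
    vaCreateContext(dpy, config_id, new_w, new_h, VA_PROGRESSIVE,
                    &new_surface, 1, &new_ctx);

    VADecPictureParameterBufferVP9 pp = {0};
    pp.frame_width  = new_w;
    pp.frame_height = new_h;
    for (int i = 0; i < 8; i++)
        pp.reference_frames[i] = old_refs[i];   /* old-resolution surfaces */
    vaCreateBuffer(dpy, new_ctx, VAPictureParameterBufferType,
                   sizeof(pp), 1, &pp, &pic_buf);

    vaBeginPicture(dpy, new_ctx, new_surface);
    vaRenderPicture(dpy, new_ctx, &pic_buf, 1);   /* slice buffers omitted */
    vaEndPicture(dpy, new_ctx);

    /* This is where VA_STATUS_ERROR_DECODING_ERROR is observed. */
    if (vaSyncSurface(dpy, new_surface) == VA_STATUS_ERROR_DECODING_ERROR) {
        VASurfaceDecodeMBErrors *errs = NULL;
        vaQuerySurfaceError(dpy, new_surface, VA_STATUS_ERROR_DECODING_ERROR,
                            (void **)&errs);
        /* errs reports the status/start_mb/end_mb values shown above */
    }
}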

Is it possible to reproduce with the sample decoder?

Where is the sample decoder? I can give it a try.

Jexu commented 2 years ago

Is it possible to reproduce with the sample decoder?

Where is the sample decoder? I can give it a try.

Get it from https://github.com/Intel-Media-SDK/MediaSDK

feiwan1 commented 2 years ago

@StefanBossbaly you can try https://patchwork.ffmpeg.org/project/ffmpeg/list/?series=7245. This patchset avoids re-creating the vaContext when the resolution changes. If the vaContext is re-created, media-driver will clean up all decode data, which may still be needed for decoding the next frames.

StefanBossbaly commented 2 years ago

@Jexu Ok I will give it a shot and see what happens.

@feiwan1 After a couple of test streams I can no longer reproduce the original issue, so it seems like that fix worked. I will do more robust testing tomorrow to verify. It seems like media-driver should not clean up the decoded data until the surfaces are destroyed, which should occur after the context(s) using the surfaces are destroyed.

From the libva docs:

Contexts and Surfaces

Context represents a "virtual" video decode, encode or video processing pipeline. Surfaces are render
targets for a given context. The data in the surfaces are not accessible to the client except if derived 
image is supported and the internal data format of the surface is implementation specific.

Surfaces are provided as a hint of what surfaces will be used when the context is created through
vaCreateContext(). A surface may be used by different contexts at the same time as soon as
application can make sure the operations are synchronized between different contexts, e.g. a
surface is used as the output of a decode context and the input of a video process context.
Surfaces can only be destroyed after all contexts using these surfaces have been destroyed.

Both contexts and surfaces are identified by unique IDs and its implementation specific
internals are kept opaque to the clients
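
As an illustration, the lifetime ordering this implies could look like the hypothetical sequence below (not any particular application; dpy, cfg and the dimensions are placeholders):

#include <va/va.h>

/* Surfaces may be shared by successive contexts and are destroyed last. */
static void surface_lifetime_sketch(VADisplay dpy, VAConfigID cfg,
                                    unsigned w1, unsigned h1,
                                    unsigned w2, unsigned h2)
{
    VASurfaceID surfaces[8];
    VAContextID ctx_a, ctx_b;

    vaCreateSurfaces(dpy, VA_RT_FORMAT_YUV420, w1, h1, surfaces, 8, NULL, 0);

    vaCreateContext(dpy, cfg, w1, h1, VA_PROGRESSIVE, surfaces, 8, &ctx_a);
    /* ... decode frames at w1 x h1 ... */
    vaDestroyContext(dpy, ctx_a);

    vaCreateContext(dpy, cfg, w2, h2, VA_PROGRESSIVE, surfaces, 8, &ctx_b);
    /* ... the same surfaces may still serve as reference frames here ... */
    vaDestroyContext(dpy, ctx_b);

    /* Surfaces are destroyed only after every context that used them. */
    vaDestroySurfaces(dpy, surfaces, 8);
}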

Is this something that can be fixed in media-driver, or will I have to add a workaround in the Fuchsia decoder to prevent this issue? Feel free to correct me if anything I said is wrong. Thanks again for the quick response!

stellawuintel commented 2 years ago

Auto Created VSMGWL-55860 for further analysis.

StefanBossbaly commented 2 years ago

@Jexu Tried it out with the sample decoder and got the correct output.

I had to convert the file from WebM to the IVF container format since that is the format that the sample decoder accepts for VP9.

ffmpeg -i crowd_run_1080X512_fr30_bd8_frm_resize_l3.webm -vcodec copy -an -f ivf crowd_run_1080X512_fr30_bd8_frm_resize_l3.webm.ivf

Then I ran the sample decoder making sure to use VA-API surfaces with the hardware.

$ ./sample_decode vp9 -d -hw -p vp9d_hw -device /dev/dri/renderD128 -vaapi -i crowd_run_1080X512_fr30_bd8_frm_resize_l3.webm.ivf -o ouput.yuv -i420
libva info: VA-API version 1.15.0
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so
libva info: Found init function __vaDriverInit_1_14
libva info: va_openDriver() returns 0
plugin_loader.h :185 [INFO] Plugin was loaded from GUID: { 0xa9, 0x22, 0x39, 0x4d, 0x8d, 0x87, 0x45, 0x2f, 0x87, 0x8c, 0x51, 0xf2, 0xfc, 0x9b, 0x41, 0x31 } (Intel (R) Media SDK HW plugin for VP9 DECODE)
pretending that stream is 30fps one
Decoding Sample Version 8.4.27.0

Input video VP9 
Output format   I420(YUV)
Input:
  Resolution    1088x512
  Crop X,Y,W,H  0,0,1080,512
Output:
  Resolution    1080x512
Frame rate  30.00
Memory type     vaapi
MediaSDK impl       hw
MediaSDK version    1.35

Decoding started
Frame number:  302, fps: 269.600, fread_fps: 0.000, fwrite_fps: 283.019
Decoding finished
plugin_loader.h :211 [INFO] MFXBaseUSER_UnLoad(session=0x0x55f2341c6690), sts=0

And then verified the md5 hash of the YUV file.

md5sum ouput.yuv

This yields the expected value of 51b3393fa98ad9ab99c0b45ef705ebc4.

libva_trace.log.134106.thd-0x0000650a.log libva_trace.log.134106.thd-0x00006507.log libva_trace.log.134106.thd-0x00006508.log libva_trace.log.134106.thd-0x00006509.log

I can confirm that the sample decoder never destroys any of the surfaces or the context until the end of the stream. The sample decoder sets the current frame_width and frame_height on the VADecPictureParameterBufferVP9 structure for each frame. So this looks like a bug related to destroying the context while existing surfaces are still in use in the middle of a stream.
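
A rough per-frame sketch of what that looks like with libva (field names from va_dec_vp9.h; the helper name and the surrounding decode loop are hypothetical):

#include <stdint.h>
#include <va/va.h>
#include <va/va_dec_vp9.h>

/* One long-lived context for the whole stream; only the picture parameters
 * change per frame. hdr_w/hdr_h come from the current VP9 frame header and
 * ref_surfaces[] holds the previously decoded reference surfaces. */
static VABufferID make_vp9_pic_params(VADisplay dpy, VAContextID ctx,
                                      uint16_t hdr_w, uint16_t hdr_h,
                                      const VASurfaceID ref_surfaces[8])
{
    VADecPictureParameterBufferVP9 pp = {0};
    VABufferID buf;

    pp.frame_width  = hdr_w;   /* current frame size, which may differ from */
    pp.frame_height = hdr_h;   /* the size the context was created with     */
    for (int i = 0; i < 8; i++)
        pp.reference_frames[i] = ref_surfaces[i];

    vaCreateBuffer(dpy, ctx, VAPictureParameterBufferType,
                   sizeof(pp), 1, &pp, &buf);
    return buf;
}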

These conformance streams always start at the larger resolution, which means the existing surfaces can always be reused since they are large enough to hold the smaller picture. I modified one of the streams to start at the smaller resolution and then switch to the larger resolution midway through, to see how the sample decoder handles that case. There is a keyframe at the lower resolution 2 seconds into the stream, so I cut to that point.

ffmpeg -i crowd_run_1080X512_fr30_bd8_frm_resize_l3.webm -ss 2 -vcodec copy -an -f ivf crowd_run_1080X512_fr30_bd8_frm_resize_l3_skip.webm.ivf

./sample_decode vp9 -d -hw -p vp9d_hw -device /dev/dri/renderD128 -vaapi -i crowd_run_1080X512_fr30_bd8_frm_resize_l3_skip.webm.ivf -o ouput.yuv -i420

libva_trace.log.140127.thd-0x00006fe3.log libva_trace.log.140127.thd-0x00006fe4.log libva_trace.log.140127.thd-0x00006fe5.log

It looks like the sample decoder destroys the surfaces when the larger resolution is encountered but does not destroy the context, so this again points to an issue with destroying the context while existing surfaces are still in use.

Jexu commented 2 years ago

It is API-designed behavior that when the app destroys the context, all resources bound to it are also released. For the VP9 DRC case, the app can re-allocate surfaces whenever the resolution changes, or, like sample_decode, re-allocate surfaces only when the resolution changes from a smaller one to a larger one, which avoids frequent allocation. In any case, it is not required to re-allocate the whole context, unless the app can ensure the next frame uses no references.
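
A rough sketch of that grow-only reallocation policy (all names are placeholders, and it assumes the frame that triggers the upscale does not still need the old surfaces as references):

#include <va/va.h>

/* Re-allocate surfaces only when the new resolution no longer fits, so
 * downscales reuse the existing (larger) surfaces and the context is kept. */
static void on_resolution_change(VADisplay dpy, VASurfaceID *surfaces,
                                 unsigned num, unsigned *alloc_w,
                                 unsigned *alloc_h, unsigned new_w,
                                 unsigned new_h)
{
    if (new_w <= *alloc_w && new_h <= *alloc_h)
        return;  /* existing surfaces are big enough; nothing to do */

    /* Grow: replace the surfaces, but keep the context. */
    vaDestroySurfaces(dpy, surfaces, num);
    vaCreateSurfaces(dpy, VA_RT_FORMAT_YUV420, new_w, new_h,
                     surfaces, num, NULL, 0);
    *alloc_w = new_w;
    *alloc_h = new_h;
}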

Jexu commented 2 years ago

Since the ffmpeg patch solves your issue, I will close this one. Feel free to re-open it if you have any other concerns.