rigaya / QSVEnc

Performance experiments on high-speed encoding with QSV
http://rigaya34589.blog135.fc2.com/blog-category-10.html

MFXDEC: DecodeFrameAsync error: undefined behavior.. #99

Closed Selur closed 1 month ago

Selur commented 1 year ago

I'm running: QSVEncC64 --avhw -i "test.mkv" --input-buf 16 --slices 0 --fps 24000/1001 --codec av1 --sar 1:1 --icq 18 --quality best --bframes 6 --gop-len 0 --open-gop --colorrange limited --colormatrix bt709 --vpp-resize auto --output-res 720x390 --i-adapt --b-adapt --b-pyramid --colormatrix bt709 --output-depth 10 --output-file "G:\Temp\test-11-01@20_10_19_1210_05.av1"

Encoding starts fine but crashes at ~80% of the file:

--------------------------------------------------------------------------------
G:\Temp\test-11-01@20_10_19_1210_05.av1
--------------------------------------------------------------------------------
PG is not supported on this platform, switched to FF mode.
cop.AUDelimiter value changed off -> auto by driver
cop.PicTimingSEI value changed off -> auto by driver
cop.SingleSeiNalUnit value changed off -> auto by driver
cop3.DirectBiasAdjustment value changed off -> auto by driver
cop3.GlobalMotionBiasAdjustment value changed off -> auto by driver

 QSVEncC (x64) 7.23 (r2925) by rigaya, Nov  1 2022 03:56:44 (VC 1933/Win)
OS             Windows 11 x64 (22621) [UTF-8]
CPU Info       AMD Ryzen 9 3950X 16-Core Processor [4.00GHz] (16C/32T) <DG2>
GPU Info       Intel Arc A380 Graphics (128EU) 300-2450MHz (31.0.101.3793)
Media SDK      QuickSyncVideo (hardware encoder) FF, 2nd GPU, API v2.07
Async Depth    3 frames
Hyper Mode     off
Buffer Memory  d3d11, 58 work buffer
Input Info     avqsv: H.264/AVC, 1280x692, 24000/1001 fps
VPP            ColorFmtConvertion: nv12 -> p010
               Resize 1280x692 -> 720x390
AVSync         cfr
Output         AV1(yuv420 10bit) main @ Level 3
               720x390p 1:1 23.976fps (24000/1001fps)
Target usage   1 - best
Encode Mode    ICQ (Intelligent Const. Quality)
ICQ Quality    18
QP Limit       min: none, max: none
Trellis        Auto
Ref frames     4 frames
GopRefDist     7, B-pyramid: on
Max GOP Length 240 frames
VUI            matrix:bt709,range:limited

[0.1%] 320 frames: 332.29 fps, 106 kb/s, remain 0:16:04, est out size 168.5MB  
....
[51.3%] 164731 frames: 429.66 fps, 463 kb/s, remain 0:06:03, GPU 47%, VD 61%, est out size 738.4MB
....
[81.6%] 261911 frames: 432.03 fps, 447 kb/s, remain 0:02:16, GPU 45%, VD 55%, est out size 713.0MB
[81.7%] 262145 frames: 430.86 fps, 447 kb/s, remain 0:02:16, est out size 713.2MB
MFXDEC: DecodeFrameAsync error: undefined behavior..
Break in task MFXDEC: undefined behavior..
encoded 262145 frames, 430.79 fps, 447.12 kbps, 582.77 MB
encode time 0:10:08, CPU: 7.6, GPU: 46.7, VD: 58.1
frame type IDR   1093
frame type I     2186,  total size   42.67 MB
frame type P    38229,  total size  269.52 MB
frame type B   222823,  total size  291.91 MB

The exit code is -16. Encoding the same input with NVEncC or x265 works fine, so I guess it is not a general problem with the source file. I checked the source file in different media players and through VapourSynth, and there doesn't seem to be an issue around the frame where QSVEnc crashed. Encoding short clips with the same settings works fine.

Adding '--disable-d3d' to the encoding call makes the encoding run at ~30% of the speed (~150 fps) with a GPU load of 92-95% instead of ~50% (VD usage is roughly the same as with d3d11), and then the encoding simply freezes:

PG is not supported on this platform, switched to FF mode.
cop.AUDelimiter value changed off -> auto by driver
cop.PicTimingSEI value changed off -> auto by driver
cop.SingleSeiNalUnit value changed off -> auto by driver
cop3.DirectBiasAdjustment value changed off -> auto by driver
cop3.GlobalMotionBiasAdjustment value changed off -> auto by driver
QSVEncC (x64) 7.23 (r2925) by rigaya, Nov  1 2022 03:56:44 (VC 1933/Win)
OS             Windows 11 x64 (22621) [UTF-8]
CPU Info       AMD Ryzen 9 3950X 16-Core Processor [4.00GHz] (16C/32T) <DG2>
GPU Info       Intel Arc A380 Graphics (128EU) 300-2450MHz (31.0.101.3793)
Media SDK      QuickSyncVideo (hardware encoder) FF, 2nd GPU, API v2.07
Async Depth    3 frames
Hyper Mode     off
Buffer Memory  system, 49 work buffer
Input Info     avqsv: H.264/AVC, 1280x692, 24000/1001 fps
VPP            ColorFmtConvertion: nv12 -> p010
               Resize 1280x692 -> 720x390
AVSync         cfr
Output         AV1(yuv420 10bit) main @ Level 3
               720x390p 1:1 23.976fps (24000/1001fps)
Target usage   1 - best
Encode Mode    ICQ (Intelligent Const. Quality)
ICQ Quality    18
QP Limit       min: none, max: none
Trellis        Auto
Ref frames     4 frames
GopRefDist     7, B-pyramid: on
Max GOP Length 240 frames
VUI            matrix:bt709,range:limited
[81.7%] 262133 frames: 153.28 fps, 447 kb/s, remain 0:06:23, GPU 83%, VD 45%, est out size 713.2MB

With --avsw the encoding runs at ~250 fps, GPU usage is around 65-75% and VD is ~50%, but it also fails:

MFXENCODE: EncodeFrameAsync error: device operation failure..GPU 70%, VD 50%, est out size 713.4MB
Break in task MFXENCODE: device operation failure..

encoded 262145 frames, 259.64 fps, 447.26 kbps, 582.95 MB
encode time 0:16:49, CPU: 9.6%, GPU: 71.2%, VD: 47.9%
frame type IDR   1093
frame type I     2186,  total size   42.68 MB
frame type P    38229,  total size  269.56 MB
frame type B   222823,  total size  292.05 MB

Is this a bug in QSVEnc? Is it a driver issue? Any ideas? (I will try a different driver version tomorrow.)
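To compare runs like the ones above, it can help to pull the failure lines and the final frame count out of the captured console output. The following is a rough Python sketch, assuming the output has been saved to a string and that the line shapes match the logs posted in this comment. The function name and the returned dictionary keys are hypothetical and not part of QSVEnc.

```python
import re


def summarize_qsvenc_log(text: str) -> dict:
    """Collect error lines and the final 'encoded N frames' count from a log.

    Assumes the line formats seen in the logs posted in this thread.
    """
    errors = [
        line.strip()
        for line in text.splitlines()
        # e.g. "MFXDEC: DecodeFrameAsync error: ..." or "Break in task MFXDEC: ..."
        if "Break in task" in line or re.search(r"MFX\w+: \w+ error:", line)
    ]
    match = re.search(r"^encoded (\d+) frames", text, re.MULTILINE)
    return {
        "errors": errors,
        "encoded_frames": int(match.group(1)) if match else None,
    }


sample = (
    "MFXDEC: DecodeFrameAsync error: undefined behavior..\n"
    "Break in task MFXDEC: undefined behavior..\n"
    "encoded 262145 frames, 430.79 fps, 447.12 kbps, 582.77 MB\n"
)
summary = summarize_qsvenc_log(sample)
print(summary["encoded_frames"], len(summary["errors"]))
```

Run on the d3d11 and --avsw logs above, this reports the same encoded frame count (262145) for both failures, even though one breaks in MFXDEC and the other in MFXENCODE.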

Selur commented 1 year ago

Using the 31.0.101.3490 drivers didn't make a difference. After lowering the number of B-frames to 5 (--bframes 5), the encoding ran through:

PG is not supported on this platform, switched to FF mode.
cop.AUDelimiter value changed off -> auto by driver
cop.PicTimingSEI value changed off -> auto by driver
cop.SingleSeiNalUnit value changed off -> auto by driver
cop3.DirectBiasAdjustment value changed off -> auto by driver
cop3.GlobalMotionBiasAdjustment value changed off -> auto by driver
QSVEncC (x64) 7.23 (r2925) by rigaya, Nov  1 2022 03:56:44 (VC 1933/Win)
OS             Windows 11 x64 (22621) [UTF-8]
CPU Info       AMD Ryzen 9 3950X 16-Core Processor [4.00GHz] (16C/32T) <DG2>
GPU Info       Intel Arc A380  Graphics (128EU) 300-2450MHz (31.0.101.3490)
Media SDK      QuickSyncVideo (hardware encoder) FF, 2nd GPU, API v2.07
Async Depth    3 frames
Hyper Mode     off
Buffer Memory  d3d11, 57 work buffer
Input Info     avqsv: H.264/AVC, 1280x692, 24000/1001 fps
VPP            ColorFmtConvertion: nv12 -> p010
               Resize 1280x692 -> 720x390
AVSync         cfr
Output         AV1(yuv420 10bit) main @ Level 3
               720x390p 1:1 23.976fps (24000/1001fps)
Target usage   1 - best
Encode Mode    ICQ (Intelligent Const. Quality)
ICQ Quality    18
QP Limit       min: none, max: none
Trellis        Auto
Ref frames     4 frames
GopRefDist     6, B-pyramid: on
Max GOP Length 240 frames
VUI            matrix:bt709,range:limited

encoded 300768 frames, 441.12 fps, 428.85 kbps, 641.31 MB
encode time 0:11:21, CPU: 7.4%, GPU: 45.9%, VD: 57.8%
frame type IDR   1254
frame type I     2508,  total size   48.15 MB
frame type P    50128,  total size  317.85 MB
frame type B   249386,  total size  299.38 MB

It seems GopRefDist needs to be restricted to 6.
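The logs in this thread are consistent with GopRefDist being the number of B-frames plus one: `--bframes 6` is reported as GopRefDist 7, and `--bframes 5` as GopRefDist 6. Here is a minimal Python sketch of that relation, assuming it holds for other values as well. The function names are hypothetical and not part of QSVEnc.

```python
def gop_ref_dist_from_bframes(bframes: int) -> int:
    """Distance between reference frames for a run of `bframes` B-frames."""
    if bframes < 0:
        raise ValueError("bframes must be non-negative")
    return bframes + 1


def bframes_from_gop_ref_dist(gop_ref_dist: int) -> int:
    """Inverse mapping: the --bframes value that yields a given GopRefDist."""
    if gop_ref_dist < 1:
        raise ValueError("gop_ref_dist must be at least 1")
    return gop_ref_dist - 1


# The two settings used in this thread.
for b in (6, 5):
    print(f"--bframes {b} -> GopRefDist {gop_ref_dist_from_bframes(b)}")
```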

rigaya commented 1 year ago

Thank you for sharing the info.

In most cases a DecodeFrameAsync error means that something is wrong with the input file or that there is a driver issue. An EncodeFrameAsync error is also a driver issue in many cases.

Intel themselves made 8 the default GopRefDist for AV1 hardware encoding, and QSVEnc has followed that. It is therefore rather odd that we have trouble with GopRefDist = 8 and need to lower it to 6.

I think we need to wait for further driver updates for stability.

mikk9 commented 1 year ago

GopRefDist 8 is Intel's default and works fine; I have never had issues with it. GopRefDist 7 and 6 are not good choices; the better alternatives are 4 and 2. Sometimes 2 or 4 can score better in an objective metric test like VMAF, but to me the subjective quality is better with GopRefDist 8.

mikk9 commented 1 year ago

video.mfx.GopRefDist == 1 || video.mfx.GopRefDist == 2 || video.mfx.GopRefDist == 4 || video.mfx.GopRefDist == 8 || video.mfx.GopRefDist == 16)

https://github.com/oneapi-src/oneVPL-intel-gpu/commit/309f906adc71d37db6a6c6d54224cc1e9056f8f0

There is no 6 or 7.
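The condition quoted above accepts only 1, 2, 4, 8 and 16. The Python sketch below mirrors that check and, for a value that fails it, picks the largest accepted value that does not exceed the request. The helper names are hypothetical, and the thread does not say what the driver actually does with values outside this set, so this illustrates the check only and not driver behavior.

```python
# Values accepted by the condition quoted above (copied from that snippet).
ACCEPTED_GOP_REF_DIST = (1, 2, 4, 8, 16)


def matches_quoted_condition(gop_ref_dist: int) -> bool:
    """True if gop_ref_dist is one of the values the quoted check allows."""
    return gop_ref_dist in ACCEPTED_GOP_REF_DIST


def largest_accepted_at_most(gop_ref_dist: int) -> int:
    """Hypothetical helper: largest accepted value not exceeding the request."""
    if gop_ref_dist < 1:
        raise ValueError("gop_ref_dist must be at least 1")
    return max(v for v in ACCEPTED_GOP_REF_DIST if v <= gop_ref_dist)


for requested in (6, 7, 8):
    print(requested,
          matches_quoted_condition(requested),
          largest_accepted_at_most(requested))
```

For the 6 and 7 seen in this thread, the helper falls back to 4, which is one of the alternatives suggested earlier in the discussion.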

Selur commented 1 year ago

"There is no 6 or 7."

How do you explain the NVEncC output "GopRefDist 7, B-pyramid: on" and "GopRefDist 6, B-pyramid: on" if there is no 6 or 7?

mikk9 commented 1 year ago

There is no 6 or 7 in any of Intel's documentation. I have tested 5, 6 and 7, and they are worse than 2, 4 and 8. Nvidia's implementation could differ; I have no idea why you mix Nvidia and Intel here. This is certainly not helpful. You can use 5-7 on Intel; whether it makes sense is another question, and it might cause issues. With CQP you can even use 16 on Intel, by the way.

Selur commented 1 year ago

The only time I wrote about NVIDIA was to confirm that it's not an issue with the source. All posted outputs are from QSVEnc, which uses the Intel GPU.

rigaya commented 1 month ago

I'll close this issue, as I think recent drivers don't have this kind of instability, even with --gop-ref-dist 16.