rigaya / QSVEnc

Performance experiments with high-speed encoding using QSV
http://rigaya34589.blog135.fc2.com/blog-category-10.html

--vpp-deband problem + feature request #89

Closed SiV44 closed 2 years ago

SiV44 commented 2 years ago

Hello @rigaya

When I use the --vpp-deband option, the following error occurs:

trellis is not supported on current platform, disabled.
HEVC SAO is not supported on current platform, disabled.
cop.SingleSeiNalUnit value changed off -> auto by driver
cop2.BufferingPeriodSEI value changed 1 -> 0 by driver
cop3.DirectBiasAdjustment value changed off -> auto by driver
cop3.GlobalMotionBiasAdjustment value changed off -> auto by driver
QSVEncC (x64) 7.20 (r2882) by rigaya, Sep 21 2022 12:57:57 (VC 1933/Win)
OS             Windows 10 x64 (19044) [UTF-8]
CPU Info       Intel Core i7-7700HQ @ 2.80GHz [TB: 3.39GHz] (4C/8T) <Kabylake>
GPU Info       Intel HD Graphics 630 (24EU) 350-1100MHz [45W] (31.0.101.2111)
Media SDK      QuickSyncVideo (hardware encoder) PG, 2nd GPU, API v1.35
Async Depth    3 frames
Buffer Memory  d3d11, 27 work buffer
Input Info     avqsv: H.264/AVC, 3840x2160, 19001/317 fps
VPP            cspconv(nv12 -> yv12(16bit))
               deband: mode 1, range 15, threY 15, threCb 15, threCr 15
                       ditherY 15, ditherC 15, blurFirst no, randEachFrame no
               cspconv(yv12(16bit) -> p010)
AVSync         cfr
Output         HEVC(yuv420 10bit) main10 @ Level 5.1 (high tier)
               3840x2160p 1:1 59.940fps (19001/317fps)
Target usage   4 - balanced
Encode Mode    ICQ (Intelligent Const. Quality)
ICQ Quality    23
QP Limit       min: 12, max: 63
Trellis        Auto
Ref frames     3 frames
Bframes        3 frames, B-pyramid: on
Max GOP Length 60 frames
VUI            matrix:bt709,colorprim:bt709,transfer:bt709,range:limited
Ext. Features  PerMBRC WeightP WeightB ctu:32 sao:none 
--------------------------------------------------------------------------
build log of Intel(R) HD Graphics 630...
3:397:3: error: expected identifier or '('
                if (!buffer) \
                ^
3:399:3: error: expected identifier or '('
                for (size_t i = 0; i < count; i++)  \
                ^
3:401:3: error: expected identifier or '('
                return CLRNG_SUCCESS; \
                ^
3:402:2: error: extraneous closing brace ('}')
        } \
        ^
3:404:14: error: program scope variable must reside in constant address space
        clrngStatus clrngMrg31k3pRandomIntegerArray_##fptype(clrngMrg31k3pStream* stream, cl_int i, cl_int j, size_t count, cl_int* buffer) { \
                    ^
3:404:46: error: expected ';' after top level declarator
        clrngStatus clrngMrg31k3pRandomIntegerArray_##fptype(clrngMrg31k3pStream* stream, cl_int i, cl_int j, size_t count, cl_int* buffer) { \
                                                    ^
                                                    ;
--------------------------------------------------------------------------
Error (clBuildProgram): build program failure.
deband: failed to load RGY_FILTER_DEBAND_GEN_RAND_CL(m_debandGenRand)
OPENCL: Error while running filter "deband".
Break in task OPENCL: OpenCL crushed..

encoded 0 frames, 0.00 fps, 0.00 kbps, 0.00 MB
encode time 0:00:00, CPULoad: 30.5
QSVEncC.exe finished with error!

Second, would it be possible to add an option to set keyframes at chapter points, as in NVEnc?

Thanks for your time, Regards

rigaya commented 2 years ago

I'm currently unable to reproduce the error; --vpp-deband runs fine for me with QSVEnc 7.20 + i7 7700K + the 2111 driver.

Would you please share the full command line which causes the error?

> Second, would it be possible to add an option to set keyframes at chapter points, as in NVEnc?

Unfortunately, that may be difficult: when I tried implementing it a few years ago, it was unstable (encoding crashed), and I could not make it stable.

SiV44 commented 2 years ago

> I'm currently unable to reproduce the error; --vpp-deband runs fine for me with QSVEnc 7.20 + i7 7700K + the 2111 driver.
>
> Would you please share the full command line which causes the error?

--------------- Video encoding using QSVEnc 7.20 (r2882) ---------------

C:\StaxRip\Apps\Encoders\QSVEnc\QSVEncC64.exe --avhw --icq 23 --codec hevc --quality balanced --profile main10 --level 5.1 --tier high --avsync cfr --mbbrc --slices 1 --gop-len 60 --b-pyramid --strict-gop --trellis off --weightp --weightb --vpp-deband --sar 1:1 --colormatrix bt709 --colorprim bt709 --transfer bt709 --atc-sei unknown --colorrange limited --no-repeat-pps --output-buf 32 --output-thread -1 --mfx-thread -1 --vpp-perf-monitor --input-analyze 0 --d3d --sao none --fallback-rc --timer-period-tuning --process-codepage utf8 -i "D:\Documents\x264\UHD TEST Files\Sony Sushi UHD HFR SDR BT709 Demo.mkv" -o "E:\StaxRip\Temp\Sony Sushi UHD HFR SDR BT709 Demo_temp\Sony Sushi_out.hevc"

--------------------------------------------------------------------------------
E:\StaxRip\Temp\Sony Sushi UHD HFR SDR BT709 Demo_temp\Sony Sushi_out.hevc
--------------------------------------------------------------------------------
trellis is not supported on current platform, disabled.
HEVC SAO is not supported on current platform, disabled.
cop.SingleSeiNalUnit value changed off -> auto by driver
cop3.DirectBiasAdjustment value changed off -> auto by driver
cop3.GlobalMotionBiasAdjustment value changed off -> auto by driver
QSVEncC (x64) 7.20 (r2882) by rigaya, Sep 21 2022 12:57:57 (VC 1933/Win)
OS             Windows 10 x64 (19044) [UTF-8]
CPU Info       Intel Core i7-7700HQ @ 2.80GHz [TB: 3.40GHz] (4C/8T) <Kabylake>
GPU Info       Intel HD Graphics 630 (24EU) 350-1100MHz [45W] (31.0.101.2111)
Media SDK      QuickSyncVideo (hardware encoder) PG, 2nd GPU, API v1.35
Async Depth    3 frames
Buffer Memory  d3d11, 27 work buffer
Input Info     avqsv: H.264/AVC, 3840x2160, 19001/317 fps
VPP            cspconv(nv12 -> yv12(16bit))
               deband: mode 1, range 15, threY 15, threCb 15, threCr 15
                       ditherY 15, ditherC 15, blurFirst no, randEachFrame no
               cspconv(yv12(16bit) -> p010)
AVSync         cfr
Output         HEVC(yuv420 10bit) main10 @ Level 5.1 (high tier)
               3840x2160p 1:1 59.940fps (19001/317fps)
Target usage   4 - balanced
Encode Mode    ICQ (Intelligent Const. Quality)
ICQ Quality    23
QP Limit       min: 12, max: 63
Trellis        Auto
Ref frames     3 frames
Bframes        3 frames, B-pyramid: on
Max GOP Length 60 frames
VUI            matrix:bt709,colorprim:bt709,transfer:bt709,range:limited
Ext. Features  PerMBRC WeightP WeightB ctu:32 sao:none 
--------------------------------------------------------------------------
build log of Intel(R) HD Graphics 630...
2:397:3: error: expected identifier or '('
                if (!buffer) \
                ^
2:399:3: error: expected identifier or '('
                for (size_t i = 0; i < count; i++)  \
                ^
2:401:3: error: expected identifier or '('
                return CLRNG_SUCCESS; \
                ^
2:402:2: error: extraneous closing brace ('}')
        } \
        ^
2:404:14: error: program scope variable must reside in constant address space
        clrngStatus clrngMrg31k3pRandomIntegerArray_##fptype(clrngMrg31k3pStream* stream, cl_int i, cl_int j, size_t count, cl_int* buffer) { \
                    ^
2:404:46: error: expected ';' after top level declarator
        clrngStatus clrngMrg31k3pRandomIntegerArray_##fptype(clrngMrg31k3pStream* stream, cl_int i, cl_int j, size_t count, cl_int* buffer) { \
                                                    ^
                                                    ;
--------------------------------------------------------------------------
Error (clBuildProgram): build program failure.
deband: failed to load RGY_FILTER_DEBAND_GEN_RAND_CL(m_debandGenRand)
OPENCL: Error while running filter "deband".
Break in task OPENCL: OpenCL crushed..

encoded 0 frames, 0.00 fps, 0.00 kbps, 0.00 MB
encode time 0:00:00, CPULoad: 43.1
Vpp Filter Performance
cspconv:   2982.5 us
deband:       1.8 us
QSVEncC.exe finished with error!

Start:    16:27:16
End:      16:27:19
Duration: 00:00:02

Result from --check-clinfo:

OpenCL platform #1 [0x0000013A95AFCCB0]
Intel(R) OpenCL HD Graphics Intel(R) Corporation OpenCL 3.0 [FULL_PROFILE]
  extensions:cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_command_queue_families cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_clustered_reduce cl_intel_device_attribute_query cl_khr_suggested_local_work_size cl_intel_split_work_group_barrier cl_khr_fp64 cl_khr_subgroups cl_intel_spirv_device_side_avc_motion_estimation cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_device_side_avc_motion_estimation cl_intel_advanced_motion_estimation cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_3d_image_writes cl_intel_media_block_io cl_khr_gl_sharing cl_khr_gl_depth_images cl_khr_gl_event cl_khr_gl_msaa_sharing cl_intel_dx9_media_sharing cl_khr_dx9_media_sharing cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_intel_d3d11_nv12_media_sharing cl_intel_sharing_format_query cl_khr_pci_bus_info cl_intel_simultaneous_sharing 
    device #0 [0x0000013A9D4C0090]
    Intel(R) HD Graphics 630 (24 CU) @ 1100 MHz (31.0.101.2111)
      device type :                gpu
      vendor :                     32902 (Intel(R) Corporation)
      profile :                    FULL_PROFILE
      version :                    OpenCL 3.0 NEO 
      extensions :                 cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_command_queue_families cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_clustered_reduce cl_intel_device_attribute_query cl_khr_suggested_local_work_size cl_intel_split_work_group_barrier cl_khr_fp64 cl_khr_subgroups cl_intel_spirv_device_side_avc_motion_estimation cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_device_side_avc_motion_estimation cl_intel_advanced_motion_estimation cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_3d_image_writes cl_intel_media_block_io cl_khr_gl_sharing cl_khr_gl_depth_images cl_khr_gl_event cl_khr_gl_msaa_sharing cl_intel_dx9_media_sharing cl_khr_dx9_media_sharing cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_intel_d3d11_nv12_media_sharing cl_intel_sharing_format_query cl_khr_pci_bus_info cl_intel_simultaneous_sharing 
      ip_version_intel :           589824
      id_intel :                   22811
      global_mem_size :            6521 MB
      global_mem_cache_size :      768 KB
      global_mem_cacheline_size :  64 B
      max_mem_alloc_size :         3260 MB
      mem_base_addr_align :        1024
      min_data_type_align_size :   128
      local_mem_size :             64 KB
      max_const_args :             8
      max_const_buffer_size :      3339172 KB
      image support :              yes
      image2d max size :           16384 x 16384
      image3d max size :           16384 x 16384 x 0
      image_pitch_alignment :      4
      max_image_args :             read 128, write 128
      profiling_timer_resolution : 83 ns
      max_parameter_size :         2048
      max_work_group_size :        256
      max_work_item_dims :         0
      num_slices_intel :           1
      num_subslices_intel :        3
      num_eus_per_subslice_intel : 8
      num_threads_per_eu_intel :   7
      feature_capabilities_intel : 0
      vec width char:              16/16
                short:             8/8
                int:               4/4
                long:              1/1
                half:              8/8
                float:             1/1
                double:            1/1

> Unfortunately, that may be difficult: when I tried implementing it a few years ago, it was unstable (encoding crashed), and I could not make it stable.

Thank you very much for your answer.

SiV44 commented 2 years ago

I tested other OpenCL filters (colorspace conversion, resize, padding, noise reduction, warpsharp), and they all work fine; the only problem is deband, both via the GUI and the CLI. A clean install of the driver didn't change anything, and I didn't find any dependency on the other options, unless I missed something. Full debug log: LOG-DEBUG.txt

Regards
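
For anyone repeating this kind of filter-by-filter comparison, a small script can pull the failing filter name and the OpenCL compiler errors out of a saved log. This is only a sketch: the two patterns are taken from the log lines quoted in this thread, and other QSVEncC versions may word these messages differently. The function name `summarize_log` and the sample text are made up for illustration, and the meaning of the first number in each compiler error is not stated in the thread, so it is kept as an opaque field.

```python
import re

# Patterns copied from the log lines quoted in this thread (assumption:
# other QSVEncC versions may word these messages differently).
FILTER_ERROR = re.compile(r'Error while running filter "([^"]+)"')
COMPILER_ERROR = re.compile(r"^(\d+):(\d+):(\d+): error: (.+)$")


def summarize_log(log_text):
    """Return (failing filter name or None, list of compiler errors)."""
    failing_filter = None
    compiler_errors = []
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        match = FILTER_ERROR.search(line)
        if match:
            failing_filter = match.group(1)
            continue
        match = COMPILER_ERROR.match(line)
        if match:
            # "unit" is the first number of the error prefix; the thread
            # does not say what it refers to, so it is stored as-is.
            unit, row, col, message = match.groups()
            compiler_errors.append((int(unit), int(row), int(col), message))
    return failing_filter, compiler_errors


# Shortened sample built from lines in the log above.
SAMPLE = """\
3:397:3: error: expected identifier or '('
3:402:2: error: extraneous closing brace ('}')
Error (clBuildProgram): build program failure.
OPENCL: Error while running filter "deband".
"""

if __name__ == "__main__":
    name, errors = summarize_log(SAMPLE)
    print("failing filter:", name)
    for unit, row, col, message in errors:
        print(f"  {unit}:{row}:{col} {message}")
```

Running it over one log per filter makes it easy to see that only the deband run reports a build failure.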

rigaya commented 2 years ago

Thank you for the full log; it helped a lot, and I think I have found the cause. I'll try to fix it in the next release.

rigaya commented 2 years ago

Thank you for the log and details. QSVEnc 7.21 should fix the problem with --vpp-deband.

SiV44 commented 2 years ago

The new version works; the encoding was successful.

trellis is not supported on current platform, disabled.
HEVC SAO is not supported on current platform, disabled.
cop.SingleSeiNalUnit value changed off -> auto by driver
cop3.DirectBiasAdjustment value changed off -> auto by driver
cop3.GlobalMotionBiasAdjustment value changed off -> auto by driver
QSVEncC (x64) 7.21 (r2902) by rigaya, Sep 30 2022 12:30:17 (VC 1933/Win)
OS             Windows 10 x64 (19044) [UTF-8]
CPU Info       Intel Core i7-7700HQ @ 2.80GHz [TB: 3.41GHz] (4C/8T) <Kabylake>
GPU Info       Intel HD Graphics 630 (24EU) 350-1100MHz [45W] (31.0.101.2111)
Media SDK      QuickSyncVideo (hardware encoder) PG, 2nd GPU, API v1.35
Async Depth    3 frames
Buffer Memory  d3d11, 27 work buffer
Input Info     avqsv: H.264/AVC, 3840x2160, 19001/317 fps
VPP            cspconv(nv12 -> yv12(16bit))
               deband: mode 1, range 15, threY 15, threCb 15, threCr 15
                       ditherY 15, ditherC 15, blurFirst yes, randEachFrame yes
               cspconv(yv12(16bit) -> p010)
AVSync         cfr
Output         HEVC(yuv420 10bit) main10 @ Level 5.1 (high tier)
               3840x2160p 1:1 59.940fps (19001/317fps)
               avwriter: hevc, ac3 => mp4
Target usage   4 - balanced
Encode Mode    ICQ (Intelligent Const. Quality)
ICQ Quality    23
QP Limit       min: 12, max: 63
Trellis        Auto
Ref frames     3 frames
Bframes        3 frames, B-pyramid: on
Max GOP Length 60 frames
VUI            matrix:bt709,colorprim:bt709,transfer:bt709,range:limited
Ext. Features  PerMBRC WeightP WeightB GPB ctu:32 sao:none

encoded 9521 frames, 8.36 fps, 13539.83 kbps, 256.38 MB
encode time 0:18:58, CPU: 12.6%, GPU: 80.9%, VD: 10.4%
frame type IDR  159
frame type I    159,  total size   42.85 MB
frame type P   2380,  total size  134.87 MB
frame type B   6982,  total size   78.67 MB

Vpp Filter Performance
cspconv:   4108.7 us
deband:   63325.8 us
cspconv:   3979.5 us

Thank you very much for your time and for solving this problem. I'm closing this issue.
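
As a sanity check on the final log, the summary figures can be cross-checked against each other. The sketch below uses only numbers printed in the 7.21 log above; it assumes that "MB" means 1024 * 1024 bytes and "kbps" means 1000 bits per second, which the log itself does not state. Under those assumptions the computed bitrate lands within about 1 kbps of the reported 13539.83 kbps, and the encode time lands within a second or two of the reported 0:18:58. The function names are illustrative only.

```python
from fractions import Fraction

# Values copied from the QSVEncC 7.21 log in this thread.
FPS = Fraction(19001, 317)  # "19001/317 fps"
FRAMES = 9521               # "encoded 9521 frames"
SIZE_MB = 256.38            # reported output size
ENCODE_FPS = 8.36           # reported average encoding speed


def playback_seconds(frames, fps):
    """Duration of the encoded video at its nominal frame rate."""
    return float(frames / fps)


def bitrate_kbps(size_mb, seconds):
    """Average bitrate.

    Assumption: MB = 1024 * 1024 bytes and kbps = 1000 bit/s.
    """
    return size_mb * 1024 * 1024 * 8 / seconds / 1000


def encode_seconds(frames, encode_fps):
    """Wall-clock encode time implied by the average encoding speed."""
    return frames / encode_fps


if __name__ == "__main__":
    seconds = playback_seconds(FRAMES, FPS)
    print(f"frame rate : {float(FPS):.3f} fps")
    print(f"duration   : {seconds:.2f} s")
    print(f"bitrate    : {bitrate_kbps(SIZE_MB, seconds):.2f} kbps")
    print(f"encode time: {encode_seconds(FRAMES, ENCODE_FPS):.0f} s")
```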