intel / media-driver

Intel Graphics Media Driver to support hardware decode, encode and video processing.
https://github.com/intel/media-driver/wiki

[Bug]: vaGetImage takes 90ms on each FullHD frame #1824

Open aslobodeniuk opened 6 days ago

aslobodeniuk commented 6 days ago

Which component impacted?

Video Processing

Is it regression? Good in old configuration?

This issue doesn't reproduce with the i965 driver.

What happened?

This happens on specific hardware with a UHD Graphics 605 GPU.

It seems to happen on all versions of the iHD driver; we checked 20.1.1 and 23.4.1. It reproduces with both ffmpeg and gstreamer (latest versions of each) and with any Full HD video.

How to reproduce:

$ wget https://test-videos.co.uk/vids/bigbuckbunny/mkv/1080/Big_Buck_Bunny_1080_10s_1MB.mkv

$ ffmpeg -y -an -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 -hwaccel_output_format vaapi -i Big_Buck_Bunny_1080_10s_1MB.mkv -vf hwdownload,format=nv12 -f null -

In ffmpeg's output we can see that it only reaches ~10 fps.

The same ~10 fps is reached when downloading frames with the gstreamer vah264dec element:

$ gst-launch-1.0 -v filesrc location=Big_Buck_Bunny_1080_10s_1MB.mkv ! parsebin ! vah264dec ! "video/x-raw" ! fpsdisplaysink video-sink=fakesink sync=false

Checking the libva traces, we can see that the slowest part is vaGetImage: it consistently takes around 90-100 ms per call.

[20491.582113][ctx       none]=========vaCreateImage ret = VA_STATUS_SUCCESS, success (no error) 
[20491.680938][ctx       none]=========vaGetImage ret = VA_STATUS_SUCCESS, success (no error) 
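The per-call cost can be read off the two trace lines by subtracting their timestamps. The snippet below is a minimal sketch of that calculation, assuming the leading bracketed number is a timestamp in seconds and that nothing else significant ran between the two logged returns (so the gap is an upper bound on the time spent in vaGetImage). The regular expression and the `call_gaps_ms` helper are illustrative names, not part of libva.

```python
import re

# Matches the leading "[seconds.fraction]" timestamp and the VA call name in
# lines shaped like the two trace lines quoted above (an assumed format).
TRACE_LINE = re.compile(r"^\[(\d+\.\d+)\]\[ctx\s+\w+\]=+(\w+) ret = ")


def call_gaps_ms(lines):
    """Return (call_name, ms since the previous traced line) for each
    parsed line after the first. Lines that do not match are skipped."""
    gaps = []
    prev = None
    for line in lines:
        m = TRACE_LINE.match(line)
        if m is None:
            continue
        ts = float(m.group(1))
        if prev is not None:
            gaps.append((m.group(2), (ts - prev) * 1000.0))
        prev = ts
    return gaps


trace = [
    "[20491.582113][ctx       none]=========vaCreateImage ret = VA_STATUS_SUCCESS, success (no error)",
    "[20491.680938][ctx       none]=========vaGetImage ret = VA_STATUS_SUCCESS, success (no error)",
]

for name, gap in call_gaps_ms(trace):
    print(f"{name}: {gap:.1f} ms after the previous traced line")
```

For the two lines above this gives a gap of about 98.8 ms before vaGetImage returns, which matches the 90-100 ms range reported.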

Meanwhile, without downloading to CPU memory, playback of the same file can reach 700 fps. As an approximate benchmark of the CPU, software decoding of the same file reaches 300 fps, so the CPU itself is not that slow.
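To put these numbers side by side, here is a rough back-of-the-envelope sketch. It assumes a 1920x1080 NV12 frame (12 bits per pixel) with no surface padding, which is an assumption rather than something taken from the trace; the real decoded surface may be somewhat larger. The helper names are hypothetical.

```python
# Assumed frame geometry: 1920x1080 NV12, i.e. 1.5 bytes per pixel.
# The driver's actual surface may be padded, so treat this as a lower bound.
WIDTH, HEIGHT = 1920, 1080
NV12_BYTES_PER_PIXEL = 1.5


def frame_bytes(width=WIDTH, height=HEIGHT):
    """Size in bytes of one unpadded NV12 frame."""
    return int(width * height * NV12_BYTES_PER_PIXEL)


def fps_ceiling(ms_per_frame):
    """Upper bound on fps if every frame costs at least ms_per_frame."""
    return 1000.0 / ms_per_frame


def effective_mb_per_s(ms_per_frame):
    """Copy throughput implied by moving one frame in ms_per_frame."""
    return frame_bytes() / (ms_per_frame / 1000.0) / 1e6


for ms in (90.0, 100.0):
    print(f"{ms:.0f} ms/frame -> at most {fps_ceiling(ms):.1f} fps, "
          f"~{effective_mb_per_s(ms):.0f} MB/s copied")
```

Under these assumptions, 90-100 ms per vaGetImage call caps the pipeline at roughly 10-11 fps, consistent with the ~10 fps seen in ffmpeg and gstreamer, and corresponds to an effective copy rate of only a few tens of MB/s for a frame of about 3.1 MB.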

Do you know a way to confirm whether this is a hardware issue or a driver issue?

What's the usage scenario when you are seeing the problem?

Playback

What impacted?

No response

Debug Information

lshw -C display

  *-display
       description: VGA compatible controller
       product: UHD Graphics 605
       vendor: Intel Corporation
       physical id: 2
       bus info: pci@0000:00:02.0
       version: 06
       width: 64 bits
       clock: 33MHz
       capabilities: pciexpress msi pm vga_controller bus_master cap_list rom
       configuration: driver=i915 latency=0
       resources: irq:128 memory:a0000000-a0ffffff memory:90000000-9fffffff ioport:f000(size=64) memory:c0000-dffff

cat /proc/cpuinfo

processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 122
model name  : Intel(R) Celeron(R) J4125 CPU @ 2.00GHz
stepping    : 8
microcode   : 0xc
cpu MHz     : 900.000
cache size  : 4096 KB
physical id : 0
siblings    : 4
core id     : 0
cpu cores   : 4
apicid      : 0
initial apicid  : 0
fpu     : yes
fpu_exception   : yes
cpuid level : 24
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl est tm2 ssse3 sdbg cx16 xtpr pdcm sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave rdrand lahf_lm 3dnowprefetch cpuid_fault cat_l2 cdp_l2 ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust smep erms mpx rdt_a rdseed smap clflushopt intel_pt sha_ni xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts umip rdpid arch_capabilities
bugs        : spectre_v1 spectre_v2 spec_store_bypass
bogomips    : 3993.60
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 6
model       : 122
model name  : Intel(R) Celeron(R) J4125 CPU @ 2.00GHz
stepping    : 8
microcode   : 0xc
cpu MHz     : 800.000
cache size  : 4096 KB
physical id : 0
siblings    : 4
core id     : 1
cpu cores   : 4
apicid      : 2
initial apicid  : 2
fpu     : yes
fpu_exception   : yes
cpuid level : 24
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl est tm2 ssse3 sdbg cx16 xtpr pdcm sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave rdrand lahf_lm 3dnowprefetch cpuid_fault cat_l2 cdp_l2 ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust smep erms mpx rdt_a rdseed smap clflushopt intel_pt sha_ni xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts umip rdpid arch_capabilities
bugs        : spectre_v1 spectre_v2 spec_store_bypass
bogomips    : 3993.60
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

processor   : 2
vendor_id   : GenuineIntel
cpu family  : 6
model       : 122
model name  : Intel(R) Celeron(R) J4125 CPU @ 2.00GHz
stepping    : 8
microcode   : 0xc
cpu MHz     : 1178.455
cache size  : 4096 KB
physical id : 0
siblings    : 4
core id     : 2
cpu cores   : 4
apicid      : 4
initial apicid  : 4
fpu     : yes
fpu_exception   : yes
cpuid level : 24
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl est tm2 ssse3 sdbg cx16 xtpr pdcm sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave rdrand lahf_lm 3dnowprefetch cpuid_fault cat_l2 cdp_l2 ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust smep erms mpx rdt_a rdseed smap clflushopt intel_pt sha_ni xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts umip rdpid arch_capabilities
bugs        : spectre_v1 spectre_v2 spec_store_bypass
bogomips    : 3993.60
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

processor   : 3
vendor_id   : GenuineIntel
cpu family  : 6
model       : 122
model name  : Intel(R) Celeron(R) J4125 CPU @ 2.00GHz
stepping    : 8
microcode   : 0xc
cpu MHz     : 800.000
cache size  : 4096 KB
physical id : 0
siblings    : 4
core id     : 3
cpu cores   : 4
apicid      : 6
initial apicid  : 6
fpu     : yes
fpu_exception   : yes
cpuid level : 24
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl est tm2 ssse3 sdbg cx16 xtpr pdcm sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave rdrand lahf_lm 3dnowprefetch cpuid_fault cat_l2 cdp_l2 ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust smep erms mpx rdt_a rdseed smap clflushopt intel_pt sha_ni xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts umip rdpid arch_capabilities
bugs        : spectre_v1 spectre_v2 spec_store_bypass
bogomips    : 3993.60
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

Do you want to contribute a patch to fix the issue?

None

aslobodeniuk commented 5 days ago

Update: with the i965 driver it reaches ~100 fps; in other words, the issue doesn't reproduce there.

intel-mediadev commented 2 days ago

Auto Created VSMGWL-74602 for further analysis.