rigaya / QSVEnc

Performance experiments on high-speed encoding with QSV
http://rigaya34589.blog135.fc2.com/blog-category-10.html
Other
304 stars 29 forks

testing dGPU - ARC DG2 - decoding errors - edge cases - 4:4:4 12bit #100

Open bavdevc opened 1 year ago

bavdevc commented 1 year ago

Hello @rigaya

At the moment I'm testing the Intel ARC dGPU (A380). Everything works brilliantly on Windows with the current Windows beta driver (31.0.101.3793), but Linux is a bit troublesome so far (Intel devs: 6.x kernel driver not ready, backport-i915 has some errors, Intel media-driver not on par with Windows, etc.)

linux --check-features output differs from windows:

btw. have you been able to test 4:4:4 decode so far?

I tried (high bitrate):

==> everything else (low bitrate) works fine, except VC-1 decoding, which is painfully slow because libvpl has no hardware support for it... and because of all the memory copies

btw. if you need some samples/test material, I can provide you those - just tell me where to send those files/links

Kind regards

edit: we need a party in qsvenc - issue #100 now ;-) https://github.com/rigaya/QSVEnc/issues/100

rigaya commented 1 year ago

Thank you for sharing the decode issues.

bavdevc commented 1 year ago

ok, I was testing 4K HDR P3 PQ 444 60fps material - perhaps that was too much for the hardware decoder - avsw working fine with all input files.

  • source file is ProRes 4444 XQ, working fine with avsw: plotbitrate_4k_hdr_prores_4444_xq_yuv444p12le
  • lossless x265 yuv420p10le, working fine with avhw: plotbitrate_4k_hdr_X265_yuv420p10le
  • lossless x265 yuv422p10le, working fine with avhw: plotbitrate_4k_hdr_X265_yuv422p10le
  • lossless x265 yuv444p12le crashes the hw decoder, only avsw possible: plotbitrate_4k_hdr_X265_yuv444p12le

but I think those are only edge cases for testing the hardware features - production workflow would not re-encode with libx265 or libaom-av1 444 12bit lossless before further processing

bavdevc commented 1 year ago
  • However, I'd like to keep it as-is, as I want --check-features to return the raw results of the Query functions. The result might change in a future driver release.

I think so, too - software stack is getting better and more complete with every version, it's still development in progress

btw. I'm really surprised this little DG2 card can handle 1,493,818 kbit/s input with ease

edit: I think the hardware limitation is below 4,294,967,295 ;-) smells like uint32 in bit/s. The last working frame is 1126 in my sample: plotbitrate_4k_hdr_X265_yuv444p12le_1126

bavdevc commented 1 year ago

just to really complete the decoder test, I also tested most combinations of the other input formats (every format works with avsw, the following list is only for avhw/avqsv):

btw. I think I'm done with decoder testing for now. I'll keep those ffmpeg-generated test files so I can test them with all future driver/qsvencc releases - perhaps I'll automate that step with a little script for windows/linux
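A minimal sketch of what such a script could look like, written in Python so the same file runs on Windows and Linux. It assumes that a QSVEncC binary is reachable under the name passed in as `exe` (the name differs per platform), and that a non-zero exit code signals a failed run. `build_command` and `run_matrix` are hypothetical helper names, and only the `--avhw`, `--avsw`, `-i` and `-o` options are used.

```python
import subprocess
from pathlib import Path

READERS = ("avhw", "avsw")  # hardware decode vs. software decode


def build_command(exe, src, out, reader):
    """Return the QSVEncC argument list for one input file and one reader."""
    if reader not in READERS:
        raise ValueError(f"unknown reader: {reader}")
    return [str(exe), f"--{reader}", "-i", str(src), "-o", str(out)]


def run_matrix(exe, files, out_dir):
    """Run every file through every reader; map (file name, reader) -> success."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    results = {}
    for src in map(Path, files):
        for reader in READERS:
            out = out_dir / f"{src.stem}_{reader}.mp4"
            proc = subprocess.run(
                build_command(exe, src, out, reader),
                capture_output=True,
                text=True,
            )
            # Assumption: QSVEncC exits with a non-zero code when decoding fails.
            results[(src.name, reader)] = proc.returncode == 0
    return results
```

Calling `run_matrix("qsvencc", sorted(Path("samples").glob("*.mkv")), "out")` (the directory names are placeholders) would return a dictionary that can be saved and compared between driver or QSVEncC releases.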

rigaya commented 1 year ago

I was able to reproduce the problem with an HEVC 12-bit 4:4:4 file I created myself using x265 lossless; decoding ran into "device operation failure".

It seems it might be a hardware limitation (or a driver issue?), as no problem was found on the application side. The bitrate of the input file was 4317 Mbps, way too high...
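The uint32 guess made earlier in the thread can be checked with a little arithmetic against the two bitrates reported here. This is only a sketch: whether the decoder or driver really stores the bitrate as an unsigned 32-bit value in bit/s is an assumption from this thread, not something confirmed by Intel. `fits_uint32_bps` is a hypothetical helper, and both figures are treated as decimal units (1 kbit = 1000 bit).

```python
UINT32_MAX = 2**32 - 1  # 4,294,967,295, the largest unsigned 32-bit value


def fits_uint32_bps(kbit_per_s):
    """True if the bitrate, converted to bit/s, fits in an unsigned 32-bit integer."""
    return kbit_per_s * 1000 <= UINT32_MAX


samples_kbps = {
    "input reported as handled with ease": 1_493_818,       # 1,493,818 kbit/s
    "x265 lossless 4:4:4 12-bit reproduction": 4_317_000,   # 4317 Mbps
}

for label, kbps in samples_kbps.items():
    status = "fits" if fits_uint32_bps(kbps) else "overflows"
    print(f"{label}: {kbps * 1000:,} bit/s {status} uint32")
```

Under these assumptions the 1,493,818 kbit/s input stays below the limit while the 4317 Mbps file exceeds it, which is consistent with the guess but does not prove it.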

bavdevc commented 1 year ago

thank you @rigaya for the confirmation - as you can see in my previous post, I could get everything to work with hardware decoding except VP9 decode (tested profiles 2 and 3). Either my test files simply go too far, or there is still an error somewhere in the software stack. (btw. VP9 encoding works, slow but it works - but decoding, no chance so far).

btw. I would close issue #100 in its current state and create a new one if something noteworthy changes for better or worse in the future, if that is ok with you.

btw. one last technical question, perhaps you know the answer or can tell me where I can find some more info:

-> using the windows driver and Dx11va, I notice there are several threads for GPU tasks:
HWINFO64: hwinfo_gpu_engines
Taskmanager: taskmanager_gpu_engines

--> crop/resize and vpp-deinterlace use the 1st or the 2nd "Video processing" engine
--> vpp-yadif uses the "GPU compute" engine

==> but why do some movies use "Video decode 1" engine and some others use both "Video Decode" engines? even if the first one is not saturated at all?