rigaya / QSVEnc

Performance experiments with high-speed encoding using QSV
http://rigaya34589.blog135.fc2.com/blog-category-10.html

high cpu usage #126

Closed: ZSC2017IM closed this issue 1 year ago

ZSC2017IM commented 1 year ago

Excuse me again, rigaya, maybe I'm a problem child XD. As I recently mentioned in #124, QSV causes higher CPU usage when transcoding at high speed. If this is not an Intel problem, I'd appreciate it if you could fix it in QSVEnc. Below are a low-bitrate video for testing (chosen to obtain a high transcoding fps) and a possible test command. I get 20% CPU usage on my 8700K.

Thanks!

QSVEncC64 --avhw -i "3_1.mkv" --device 1 --codec hevc --fixed-func --icq 32 -u fastest --avsync vfr --profile main --hyper-mode off -o "3.mp4"

Attachment: 3_1.zip

ZSC2017IM commented 1 year ago

Unfortunately, this is not a QSVEnc bug, because I get the same results with the MSDK tool sample_encode:

sample_encode.exe h265 -dGfx -timeout 86400 -lowpower:on -hw -icq 28 -u speed -i 3.mkv -o 3.mp4 -w 720 -h 576

About 200 fps and 20% CPU usage.

rigaya commented 1 year ago

I got around 960 fps on a 12900K + Arc A380, with around 5% CPU utilization (the percentage is low partly because the 12900K has 24 threads).

At such a high fps there are many tasks to be done on the CPU side too, so it is no wonder that CPU usage gets a little high.
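To compare the CPU figures from machines with different thread counts, an overall utilization percentage can be converted into "fully busy logical threads". The following is only a rough sketch: the function name is made up for illustration, the 12-thread count for the 8700K comes from its published spec (6 cores, 12 threads) rather than from this thread, and it assumes the reported percentage is an average over all logical processors.

```python
def busy_threads(cpu_percent: float, logical_threads: int) -> float:
    """Convert overall CPU utilization (%) into the equivalent
    number of fully busy logical threads."""
    return cpu_percent / 100.0 * logical_threads


# Figures quoted in this thread (thread count of the 8700K is an assumption).
rigaya_machine = busy_threads(5, 24)     # 12900K, about 5% utilization
reporter_machine = busy_threads(20, 12)  # 8700K, about 20% utilization

print(f"12900K: {rigaya_machine:.1f} threads busy")
print(f"8700K:  {reporter_machine:.1f} threads busy")
```

Under these assumptions the two reports are closer than the raw percentages suggest (roughly 1.2 versus 2.4 busy threads), although threads on different CPU generations are not directly comparable.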

ZSC2017IM commented 1 year ago

@rigaya Thanks for your test! I corrected my earlier mental-arithmetic estimate by timing with a stopwatch: 10000 frames in 12 seconds, at 20% CPU. According to your results, the high CPU usage is not specific to my platform and driver. I also used ffmpeg with hevc_nvenc for transcoding, with similar parameters. A single instance cannot fully load the Tesla P4's codec engines, so I ran four instances to fully load one encoder and obtained a similar fps (about 960). But the total CPU utilization of all instances was less than 3%. As an excellent developer, do you think this is really because Intel has not adopted a scheme similar to NVIDIA's? https://developer.nvidia.com/blog/nvidia-ffmpeg-transcoding-guide/ I know that you also have NVIDIA GPUs. Maybe you can also test how little CPU NVDEC/NVENC needs for transcoding tasks.
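The stopwatch figures can be turned into an average fps and a rough CPU cost per frame. This is a sketch under stated assumptions: it reads "10000FPS in 12 seconds" as 10000 frames encoded in 12 seconds, takes the 8700K to have 12 logical threads (from its spec, not from this thread), and uses hypothetical helper names.

```python
def average_fps(frames: int, seconds: float) -> float:
    """Average encoding speed over a timed run."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return frames / seconds


def cpu_ms_per_frame(cpu_percent: float, logical_threads: int, fps: float) -> float:
    """CPU thread-time spent per encoded frame, in milliseconds,
    assuming cpu_percent is averaged over all logical threads."""
    busy = cpu_percent / 100.0 * logical_threads
    return busy / fps * 1000.0


# 8700K run: assumed 10000 frames in 12 s at 20% CPU on 12 threads.
fps_8700k = average_fps(10_000, 12)
print(f"8700K:  {fps_8700k:.0f} fps, "
      f"{cpu_ms_per_frame(20, 12, fps_8700k):.2f} ms CPU per frame")

# 12900K + Arc A380 run: about 960 fps at 5% CPU on 24 threads.
print(f"12900K: 960 fps, {cpu_ms_per_frame(5, 24, 960):.2f} ms CPU per frame")
```

Per-frame cost is a more useful comparison than a raw percentage, because the percentage depends on both the thread count and the achieved fps.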

ZSC2017IM commented 1 year ago

"this is really caused by INTEL not adopting a similar scheme of NVIDIA?"

@rigaya https://github.com/Intel-Media-SDK/MediaSDK/wiki/Media-SDK-Shaders-(EU-Kernels)#gpu-copy-and-horizontal-mirroring-kernels

My guess is that Intel's equivalent feature is called "GPU copy", but QSVEnc doesn't provide it. Enabling it in ffmpeg and sample_encode brought no improvement for me. In addition, ffmpeg reports "GPU-accelerated memory copy only works in system memory mode." I think this may be related to DG1 using LPDDR4 memory as graphics memory. Can you test it on your machine? Parameters:

sample_encode.exe h265 -dGfx -timeout 86400 -lowpower:on -hw -icq 28 -u speed -i 3.mkv -o 3.mp4 -w 720 -h 576 -gpucopy::on

Attachment: sample_encode.zip

ffmpeg -hwaccel qsv -hwaccel_output_format qsv -hwaccel_device 0 -vcodec h264_qsv -gpu_copy on -i 3.mkv -acodec copy -c:v hevc_qsv -low_power 1 -preset veryfast -profile:v rext -global_quality 28 -y 3.mp4

quamt commented 1 year ago

@ZSC2017IM Here is the result:

encoded 6000 frames, 1213.35 fps, 499.26 kbps, 14.28 MB encode time 0:00:04, CPU: 0.2%, VD: 99.2%

Task Manager shows around 24% CPU while it runs, with 20% for this task. But keep in mind that this system only uses an AMD Ryzen 5 5600G.
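For anyone collecting several such results, the summary line can be parsed and sanity-checked. This is a minimal sketch that assumes the line has exactly the shape pasted above (as it appears in this thread, not necessarily the tool's full log format); the function name and dictionary keys are hypothetical.

```python
import re

# Summary line exactly as pasted in this thread.
line = ("encoded 6000 frames, 1213.35 fps, 499.26 kbps, 14.28 MB "
        "encode time 0:00:04, CPU: 0.2%, VD: 99.2%")

pattern = re.compile(
    r"encoded (?P<frames>\d+) frames, (?P<fps>[\d.]+) fps.*?"
    r"CPU: (?P<cpu>[\d.]+)%, VD: (?P<vd>[\d.]+)%"
)


def parse_summary(text):
    """Extract frame count, fps, and CPU/VD percentages; None if no match."""
    m = pattern.search(text)
    if m is None:
        return None
    return {
        "frames": int(m["frames"]),
        "fps": float(m["fps"]),
        "cpu_percent": float(m["cpu"]),
        "vd_percent": float(m["vd"]),
    }


stats = parse_summary(line)
# Elapsed time implied by frames / fps, to compare with the reported encode time.
elapsed = stats["frames"] / stats["fps"]
print(stats)
print(f"implied encode time: {elapsed:.2f} s")
```

The implied time of just under 5 seconds agrees with the reported 0:00:04 if that value is truncated to whole seconds. The CPU value reported on this line differs from what Task Manager shows, so the two should not be mixed in one comparison.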

ZSC2017IM commented 1 year ago

@quamt Thanks! This is very helpful! Could you retest with the following command? If "gpucopy" also fails to reduce CPU usage, I think this is indeed a common problem with Intel GPUs. I am submitting this issue to the Intel community.

sample_encode.exe h265 -dGfx -timeout 86400 -lowpower:on -hw -icq 28 -u speed -i 3.mkv -o 3.mp4 -w 720 -h 576 -gpucopy::on

Attachment: sample_encode.zip

quamt commented 1 year ago

@ZSC2017IM Running that, Task Manager shows CPU usage at around 15%, of which the task uses around 13%.

ZSC2017IM commented 1 year ago

@quamt Thanks! I have reported this issue to Intel.

rigaya commented 1 year ago

I think 20% CPU utilization is acceptable for high-fps transcoding, so I have closed this issue.