intel / compute-runtime

Intel® Graphics Compute Runtime for oneAPI Level Zero and OpenCL™ Driver
MIT License

Hardware Accelerated Tonemapping for Tiger Lake broken after 21.49.21786 #488

Closed 88fingerslukee closed 1 year ago

88fingerslukee commented 2 years ago

I'm running Plex in an Ubuntu Docker image on an i7-1165G7.

Every version of the drivers after 21.49.21786 breaks HW HDR->SDR tonemapping. I am not really sure what other info to provide besides this but I can give whatever is needed to help here.

poneli commented 2 years ago

Same issue for me. I tested every version above it up to the latest one, and the last working version is 21.49.21786.

System: VMware ESXi 7 host with an Intel(R) Core(TM) i7-9700K CPU, GPU passthrough to an Ubuntu Server 20.04.3 LTS VM (fully updated), plexmediaserver 1.25.3.5409.

Ge082 commented 2 years ago

Try using Intel's official repo: https://dgpu-docs.intel.com/installation-guides/ubuntu/ubuntu-focal.html

sudo apt-get install -y gpg-agent wget
wget -qO - https://repositories.intel.com/graphics/intel-graphics.key |
  sudo apt-key add -
sudo apt-add-repository \
  'deb [arch=amd64] https://repositories.intel.com/graphics/ubuntu focal main'

Then

sudo apt-get update
sudo apt-get install \
  intel-opencl-icd \
  intel-level-zero-gpu level-zero \
  intel-media-va-driver-non-free libmfx1

88fingerslukee commented 2 years ago

This doesn't work. I still get the same error. The only thing that works is downgrading to 21.49.21786.

alanshelley commented 2 years ago

I have the same problem and likewise my solution so far has been to downgrade to 21.49.21786.

kevindd992002 commented 2 years ago

Any updates to this issue?

kevindd992002 commented 2 years ago

Can anyone please give any updates to this issue? Is it fixed with the latest version of the runtime libraries?

poneli commented 2 years ago

Give it a try and report back.

kevindd992002 commented 2 years ago

I already found a Plex thread mentioning that it is still not fixed.

pwilma commented 2 years ago

I was able to reproduce the problem locally on a NUC with KBL. With driver 21.49.21786 I see hardware accelerated transcoding at about 40% CPU usage; with driver 21.50.21939 I see very slow software transcoding at about 350% CPU usage (4 cores). Based on that observation, I isolated the regression to commit https://github.com/intel/compute-runtime/commit/34d9d9b0d389077a2df5434dd9277e8f257a8568 ("gmmlib revision update"). Importantly, this commit changes the libigdgmm version in the dependency list from 11 to 12. It was not clear why this broke Neo in Plex, so I created a wrapper around "/usr/lib/plexmediaserver/Plex Transcoder" to execute it under strace. It turned out that Plex uses its own copy of libigdgmm.so:

open("/usr/lib/plexmediaserver/lib/dri/../libigdgmm.so.plex", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 5

and most probably, as it is an older version, it is no longer compatible with Neo (libigdrcl.so). So the "Plex Transcoder" process first loads libigdgmm.so.plex, then Neo (libigdrcl.so) tries to load libigdgmm.so.12, and the two most probably collide. As a quick experiment, I renamed libigdgmm.so.plex and symlinked libigdgmm.so.12 in its place:

sudo mv /usr/lib/plexmediaserver/lib/libigdgmm.so.plex /usr/lib/plexmediaserver/lib/libigdgmm.so.plex_ORIG
sudo ln -s /usr/local/lib/libigdgmm.so.12 /usr/lib/plexmediaserver/lib/libigdgmm.so.plex

and OpenCL accelerated tone mapping works again. So the root cause is conflicting libigdgmm.so libraries loaded into a single process, and I don't see how this could be solved in Neo. Maybe Plex should use the libigdgmm that is already installed in the system instead of its own copy?
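
To check for this kind of collision on a live system, one option is to look at which libigdgmm files a running transcoder process has mapped. The sketch below is not part of the original investigation: `list_gmm_libs` is a hypothetical helper name, and it assumes a Linux `/proc/<pid>/maps` file that is readable for the process in question.

```shell
# Print each distinct libigdgmm file that appears in a process memory map.
# $1: path to a maps file, for example /proc/<pid>/maps
list_gmm_libs() {
    awk '/libigdgmm/ {
        # Drop the five leading fields: address, perms, offset, dev, inode.
        # What remains is the mapped path, which may contain spaces.
        sub(/^[^ ]+ +[^ ]+ +[^ ]+ +[^ ]+ +[^ ]+ +/, "")
        print
    }' "$1" | sort -u
}
```

For example, `list_gmm_libs "/proc/$(pgrep -f 'Plex Transcoder' | head -n 1)/maps"` while a transcode is running. Two different paths in the output would be consistent with two libigdgmm copies sharing one process, though it does not by itself prove that they conflict.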

pwilma commented 2 years ago

Plex forum thread about this issue: https://forums.plex.tv/t/anyone-have-been-able-to-hw-transcode-on-an-intel-nuc-11-iris-xe/695381/505

rbranson commented 2 years ago

This might only be because it's dockerized but this was manifesting as segfaults for me:

Plex Transcoder[4286]: segfault at 7fd216bf46b8 ip 00007fcb62905818 sp 00007ffd972c3050 error 4 in libigdgmm.so.plex[7fcb62877000+90000]

The magic fix running Ubuntu 20.04 (focal) and PMS in Docker (plexinc/pms-docker) was a combo of this set of packages (from the Intel apt repo):

apt-get install -y \
    intel-igc-cm=1.0.128+i699.3~u20.04 \
    intel-opencl-icd=21.49.21786+i643~u20.04 \
    libigc1=1.0.10409+i699.3~u20.04 \
    libigdfcl1=1.0.10409+i699.3~u20.04 \
    libigdgmm11=21.3.3+i643~u20.04

... and rolled back to this version of the docker image: plexinc/pms-docker:1.28.0.5999-97678ded3

Hardware is a NUC 10 with i3-10110U.

jalaziz commented 2 years ago

The plex issue seems to be solved with 1.29.0.6209-9fa696df6 and the latest intel drivers.

rnsc commented 2 years ago

> The plex issue seems to be solved with 1.29.0.6209-9fa696df6 and the latest intel drivers.

Are you sure? I haven't tested it myself, but I've seen reports from people saying it's still not working.

jalaziz commented 2 years ago

> > The plex issue seems to be solved with 1.29.0.6209-9fa696df6 and the latest intel drivers.
>
> Are you sure? I haven't tested it myself, but I've seen reports from people saying it's still not working.

I thought it wasn't at first, but after upgrading everything and restarting, HW HDR->SDR started working again.

f3rr commented 2 years ago

I tested the latest image version. A 4K-to-4K transcode buffers with HDR tonemapping on, but works well with HDR tonemapping off. In both cases the dashboard shows HW transcoding. Hardware is a J5005 CPU with integrated GPU. CPU usage is the same with HDR tonemapping on or off (about 30%).

pwilma commented 1 year ago

I installed a new version of Plex (plexmediaserver_1.29.2.6364-6d72b0cf6_amd64.deb) to check whether hardware accelerated tone mapping has been fixed there. Indeed, it now works correctly with hardware acceleration (at least in the experiments I ran).

I looked deeper into how Plex Transcoder loads libraries. I can see that it first loads iHD_drv_video.so with dlopen(), and iHD_drv_video.so is linked against libigdgmm.so.plex:


intel@intel-NUC7i3BNK:~/plex$ ldd /usr/lib/plexmediaserver/lib/dri/iHD_drv_video.so
        linux-vdso.so.1 (0x00007fff797d2000)
        libgcompat.so.0 => /usr/lib/plexmediaserver/lib/dri/../libgcompat.so.0 (0x00007f151d528000)
        libigdgmm.so.plex => /usr/lib/plexmediaserver/lib/dri/../libigdgmm.so.plex (0x00007f151d46e000)
        libc.so => /usr/lib/plexmediaserver/lib/dri/../libc.so (0x00007f151d3cb000)

I tried to intercept dlopen() calls to check which flags are used. My assumption was that if libigdgmm.so.plex were loaded with the RTLD_LOCAL flag, it should not collide with symbols from the libigdgmm loaded by the compute runtime. I wrote a simple shared library with its own definition of dlopen() and loaded it with LD_PRELOAD. It allowed me to print the flags and then call the original dlopen(). This approach correctly captured iHD_drv_video.so and even other libs, but unfortunately it was not able to intercept libigdgmm.so.plex, probably because it is not loaded with dlopen() but is a link-time dependency of iHD_drv_video.so. I can see that iHD_drv_video.so is loaded with the following flags:

RTLD_NOW
RTLD_GLOBAL
RTLD_NODELETE

I experimented with replacing RTLD_GLOBAL with RTLD_LOCAL, but unfortunately it resulted in a fallback to the software path for tone mapping, so it looks like a dead end.

Nevertheless, the new Plex version I used (plexmediaserver_1.29.2.6364-6d72b0cf6_amd64.deb) did use hardware accelerated transcoding even when a newer GPU driver was installed in the system. It quickly turned out that this is because Plex uses its own copy of the GPU UMD driver, as can be seen in the strace log:

stat("/var/lib/plexmediaserver/Library/Application Support/Plex Media Server/Cache/CL-ICDs/icr.icd", {st_mode=S_IFREG|0644, st_size=111, ...}) = 0
open("/var/lib/plexmediaserver/Library/Application Support/Plex Media Server/Cache/CL-ICDs/icr.icd", O_RDONLY|O_LARGEFILE) = 7
lseek(7, 0, SEEK_END)                   = 111
lseek(7, 0, SEEK_CUR)                   = 111
lseek(7, 0, SEEK_SET)                   = 0
readv(7, [{iov_base="/var/lib/plexmediaserver/Library"..., iov_len=110}, {iov_base="\n", iov_len=1024}], 2) = 111
open("/var/lib/plexmediaserver/Library/Application Support/Plex Media Server/Drivers/icr-9-linux-x86_64/libigdrcl.so", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 8

Further analysis showed that this compute runtime library is most probably built by Plex, as it contains many 'plex' strings and even a modified reference to libigdgmm:

intel@intel-NUC7i3BNK:/usr/lib/plexmediaserver/lib$ strings  /var/lib/plexmediaserver/Library/Application\ Support/Plex\ Media\ Server/Drivers/icr-9-linux-x86_64/libigdrcl.so | grep libigdgmm
libigdgmm.so.plex

For the original compute runtime installed in the system, it looks a bit different:

intel@intel-NUC7i3BNK:/usr/lib/plexmediaserver/lib$ cat /etc/OpenCL/vendors/intel.icd
/usr/lib/x86_64-linux-gnu/intel-opencl/libigdrcl.so
intel@intel-NUC7i3BNK:/usr/lib/plexmediaserver/lib$ strings /usr/lib/x86_64-linux-gnu/intel-opencl/libigdrcl.so | grep libigdgmm
libigdgmm.so.12
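
The two checks above can be folded into one helper that starts from an `.icd` file. This is only a sketch under stated assumptions: `icd_gmm_refs` is a hypothetical name, each line of the `.icd` file is assumed to be an absolute library path (as in both files shown here), and `grep -a -o` stands in for `strings | grep` so that binutils is not required.

```shell
# For each library listed in an OpenCL .icd file, print the libigdgmm
# names embedded in that library.
# $1: path to an .icd file with one library path per line
icd_gmm_refs() {
    # IFS= and -r keep paths with spaces intact; the "|| [ -n ... ]" part
    # also handles a last line that lacks a trailing newline.
    while IFS= read -r lib || [ -n "$lib" ]; do
        [ -f "$lib" ] || continue
        # -a treats the binary as text, -o prints only the matching part.
        refs=$(grep -a -o 'libigdgmm\.so\.[A-Za-z0-9_]*' "$lib" | sort -u | tr '\n' ' ')
        printf '%s: %s\n' "$lib" "${refs% }"
    done < "$1"
}
```

Running it once on `/etc/OpenCL/vendors/intel.icd` and once on the `icr.icd` file from the Plex cache directory should show the same contrast as above (`libigdgmm.so.12` versus `libigdgmm.so.plex`), assuming those files exist at the paths seen in this thread.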

The runtime library from Plex also contains many strings with build paths, which even reveal the Neo version that was used:

/data/jenkins/conan_build/1112411750/conan/.conan/data/intel-compute-runtime/22.16.22992-2/plex/stable/build/96953c44c3aa1e3d81776274f854d01e9cbd0473/compute-runtime-22.16.22992/shared/source/os_interface/device_factory.cpp
/data/jenkins/conan_build/1112411750/conan/.conan/data/intel-compute-runtime/22.16.22992-2/plex/stable/build/96953c44c3aa1e3d81776274f854d01e9cbd0473/compute-runtime-22.16.22992/shared/source/os_interface/metrics_library.cpp
/data/jenkins/conan_build/1112411750/conan/.conan/data/intel-compute-runtime/22.16.22992-2/plex/stable/build/96953c44c3aa1e3d81776274f854d01e9cbd0473/compute-runtime-22.16.22992/opencl/source/program/process_device_binary.cpp
/data/jenkins/conan_build/1112411750/conan/.conan/data/intel-compute-runtime/22.16.22992-2/plex/stable/build/96953c44c3aa1e3d81776274f854d01e9cbd0473/compute-runtime-22.16.22992/shared/source/os_interface/linux/os_context_linux.cpp
/data/jenkins/conan_build/1112411750/conan/.conan/data/intel-compute-runtime/22.16.22992-2/plex/stable/build/96953c44c3aa1e3d81776274f854d01e9cbd0473/compute-runtime-22.16.22992/shared/source/page_fault_manager/linux/cpu_page_fault_manager_linux.cpp

So the conclusion is that the new version of Plex uses its own compiled compute runtime, version 22.16.22992-2, and is therefore independent of the Neo installed in the system. Because the runtime is now shipped as part of the Plex package and, as the experiments showed, hardware accelerated tone mapping works fine with this Plex version, we may in my opinion close this issue.
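
For anyone who wants to repeat the version check on another Plex release, a minimal sketch follows. `neo_version_hints` is a hypothetical helper name, and the approach only works if the library still embeds source paths containing a `compute-runtime-<version>` directory, as the build analyzed here does.

```shell
# Print the distinct compute-runtime-<version> directory names embedded in
# a library, taken from source paths left in the binary at build time.
# $1: path to a libigdrcl.so file
neo_version_hints() {
    grep -a -o 'compute-runtime-[0-9][0-9.]*' "$1" | sort -u
}
```

An empty result would only mean that no such paths are embedded, not that the library is an unmodified Intel build.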

gobigdave commented 1 year ago

Plex 1.29.x in docker worked initially. Later versions, including 1.29.2.x stopped working for me. I now have 1.30.0.6486 in docker on Ubuntu server, and HDR tone mapping is still only software.

pwilma commented 1 year ago

I checked once again with the most recent Plex version, which is now plexmediaserver_1.31.2.6739-a87e876bd_amd64.deb. I still see hardware accelerated tone mapping working correctly, and Plex still uses a self-compiled compute runtime located in: /var/lib/plexmediaserver/Library/Application Support/Plex Media Server/Drivers/icr-9-linux-x86_64. Based on strings from the libigdrcl.so library, the compute runtime version used there is 22.16.22992-2. As a quick experiment, I renamed libigdrcl.so in that directory, and hardware acceleration was indeed not used in that case, which confirms that Plex uses its own version of the Intel compute runtime for hardware acceleration.

Based on that, I'm closing this issue, as we cannot guarantee quality when an application vendor ships its own (potentially modified) version of the compute runtime. I suggest contacting Plex support directly for issues like this one.