inducer / pyopencl

OpenCL integration for Python, plus shiny features
http://mathema.tician.de/software/pyopencl

POCL driver not found when installing in virtualenv #537

Closed jacklovell closed 1 year ago

jacklovell commented 2 years ago

Describe the bug When pyopencl[pocl] is installed in a virtual environment on a system with no other OpenCL drivers, the POCL ICD is not found. It's necessary to set the environment variable OCL_ICD_VENDORS to <path-to-pyopencl-install>/.libs to get pyopencl to see POCL as a driver. This is not documented in the pyopencl documentation, which suggests that simply installing the pyopencl wheel with the pocl extra is sufficient.

To Reproduce Steps to reproduce the behavior:

  1. Create a new virtual environment on a machine without OpenCL installed: python3 -m venv /tmp/venv && source /tmp/venv/bin/activate
  2. pip install pyopencl[pocl]
  3. Run python -c 'import pyopencl; pyopencl.get_platforms()'
  4. See error pyopencl._cl.LogicError: clGetPlatformIDs failed: PLATFORM_NOT_FOUND_KHR

Expected behavior get_platforms() should return a POCL platform.
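The check in the steps above can also be written as a small helper that separates "no platform found" from any other failure. This is only a sketch: `list_platform_names` is a hypothetical name, it takes the `get_platforms` callable as an argument so it can be exercised without an OpenCL install, and matching on the `PLATFORM_NOT_FOUND_KHR` text in the exception message is an assumption based on the error shown in step 4.

```python
def list_platform_names(get_platforms):
    """Return the names of all visible OpenCL platforms.

    An empty list means the loader reported PLATFORM_NOT_FOUND_KHR;
    any other exception is re-raised unchanged.
    """
    try:
        return [platform.name for platform in get_platforms()]
    except Exception as exc:
        # Assumption: the error text contains the OpenCL status name,
        # as in the traceback from the reproduction steps.
        if "PLATFORM_NOT_FOUND_KHR" in str(exc):
            return []
        raise
```

In a virtualenv with pyopencl installed, this would be called as `list_platform_names(pyopencl.get_platforms)`; an empty result corresponds to the failure reported here.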

Additional context The same issue is present on a Scientific Linux 7 (RHEL7 clone) system with Python 3.7. On this system the Python executable is provided by Anaconda, but a standard virtual environment created using the venv standard library module is used rather than a conda environment. The workaround of setting OCL_ICD_VENDORS still works on this system.

The use case is creating virtualenvs to test code using pyopencl, where root access to install system-wide OpenCL is not available and the availability of conda is not guaranteed.

The closest I've got to a portable workaround is:

export OCL_ICD_VENDORS=$(python -c 'import os, pyopencl; print(os.path.join(*pyopencl.__path__, ".libs"))')

But this forces the use of POCL, so it isn't a universal solution: it shouldn't be applied on systems that already have OpenCL installed globally.
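One way to keep the workaround from overriding a system-wide install is to apply it only when no system ICD files exist. The sketch below is not part of pyopencl: `choose_icd_vendor_dir` and `bundled_libs_dir` are hypothetical helpers, `/etc/OpenCL/vendors` is taken from the loader output in this thread as the default system location, and it is an assumption that setting the variable from Python before the first OpenCL call is early enough for the bundled loader.

```python
import glob
import importlib.util
import os


def bundled_libs_dir(package="pyopencl"):
    """Locate <package>/.libs without importing the package itself."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None
    return os.path.join(list(spec.submodule_search_locations)[0], ".libs")


def choose_icd_vendor_dir(bundled_dir, system_dir="/etc/OpenCL/vendors",
                          environ=None):
    """Return the directory to export as OCL_ICD_VENDORS, or None.

    None means "leave the environment alone": either the user already set
    the variable, the system has its own ICD files, or the bundled
    directory holds no ICD file to fall back on.
    """
    environ = os.environ if environ is None else environ
    if environ.get("OCL_ICD_VENDORS"):
        return None  # respect an explicit user setting
    if glob.glob(os.path.join(system_dir, "*.icd")):
        return None  # a system-wide OpenCL install is present
    if glob.glob(os.path.join(bundled_dir, "*.icd")):
        return bundled_dir
    return None
```

A test suite's setup code could call `choose_icd_vendor_dir(bundled_libs_dir())` (after checking that `bundled_libs_dir()` did not return None) and, if the result is not None, assign it to `os.environ["OCL_ICD_VENDORS"]` before importing pyopencl.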

inducer commented 2 years ago

Thanks for the report!

The way this is supposed to work is that the loader bundled in the pyopencl wheel has that search path baked in:

https://github.com/inducer/pyopencl/blob/0b3d0ef92497e6838eea300b974f385f94cb5100/scripts/build-wheels.sh#L43-L44

That points to this patch:

https://github.com/isuruf/ocl-icd/commit/3862386b51930f95d9ad1089f7157a98165d5a6b.patch

Do you have any sense why that scheme isn't working as intended? (Maybe investigate with strace?)

jacklovell commented 2 years ago

Attached are two straces. The first is running the following command, without specifying the OCL_ICD_VENDORS variable:

strace python -c 'import pyopencl; pyopencl.get_platforms()'

The second is with setting the environment variable:

OCL_ICD_VENDORS=/tmp/pocl-venv/lib/python3.8/site-packages/pyopencl/.libs strace python -c 'import pyopencl; pyopencl.get_platforms()'

It looks like the significant difference is from line 3295 of the traces: in the first instance it attempts to open /etc/OpenCL/vendors which fails with ENOENT and then attempts a bunch of paths which end in <string>. In the second case it opens /tmp/pocl-venv/lib/python3.8/site-packages/pyopencl/.libs/ and successfully finds the pocl.icd file and in turn the POCL driver.

The use of <string> looks suspiciously like the variable hasn't been defined properly, but I don't know enough about how the system works internally to tell whether this is a problem or not.

cl-strace.log

cl-strace-envset.log
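For comparing two traces like these, it can help to pull out only the paths whose open failed with ENOENT. The following is a rough sketch: `missing_paths` is a hypothetical helper, the regular expression assumes the usual strace line layout for `open` and `openat`, and the sample line in the docstring is illustrative rather than copied from the attached logs.

```python
import re

# Matches open()/openat() calls that returned -1 ENOENT and captures the path.
_ENOENT_OPEN = re.compile(
    r'\b(?:open|openat)\((?:[^,"]+,\s*)?"(?P<path>[^"]*)".*=\s*-1\s+ENOENT'
)


def missing_paths(strace_lines, needle=""):
    """Return paths that could not be opened, optionally filtered by substring.

    Example input line (illustrative):
    openat(AT_FDCWD, "/etc/OpenCL/vendors", O_RDONLY) = -1 ENOENT (No such file or directory)
    """
    paths = []
    for line in strace_lines:
        match = _ENOENT_OPEN.search(line)
        if match and needle in match.group("path"):
            paths.append(match.group("path"))
    return paths
```

Running it over each log with `needle="OpenCL"` or `needle=".icd"` narrows the output to the lookups relevant to ICD discovery.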

isuruf commented 2 years ago

What do you get when you run export OCL_ICD_DEBUG=7 and then start the python interpreter?

jacklovell commented 2 years ago

(pocl-venv) jlovell@jlovell-thinkpad:~$ OCL_ICD_DEBUG=7 python -c 'import pyopencl; pyopencl.get_platforms()'
ocl-icd(ocl_icd_loader.c:737): __initClIcd: Reading icd list from '/etc/OpenCL/vendors'
ocl-icd(ocl_icd_loader.c:1029): clGetPlatformIDs: return: -1001/0xfffffffffffffc17
Traceback (most recent call last):
  File "<string>", line 1, in <module>
pyopencl._cl.LogicError: clGetPlatformIDs failed: PLATFORM_NOT_FOUND_KHR

When I manually specify the path to the ICD directory it looks there instead of in /etc/OpenCL/vendors:

(pocl-venv) jlovell@jlovell-thinkpad:~$ OCL_ICD_DEBUG=7 OCL_ICD_VENDORS=/tmp/pocl-venv/lib/python3.8/site-packages/pyopencl/.libs/ python -c 'import pyopencl; pyopencl.get_platforms()'
ocl-icd(ocl_icd_loader.c:737): __initClIcd: Reading icd list from '/tmp/pocl-venv/lib/python3.8/site-packages/pyopencl/.libs/'
ocl-icd(ocl_icd_loader.c:201): _find_num_icds: return: 1/0x1
ocl-icd(ocl_icd_loader.c:232): _open_driver: Considering file '/tmp/pocl-venv/lib/python3.8/site-packages/pyopencl/.libs//pocl.icd'
ocl-icd(ocl_icd_loader.c:206): _load_icd: Loading ICD 'libpocl-3a06e60a.so'
ocl-icd(ocl_icd_loader.c:210): _load_icd: ICD[0] loaded
ocl-icd(ocl_icd_loader.c:264): _open_driver: return: 1/0x1
ocl-icd(ocl_icd_loader.c:276): _open_drivers: return: 1/0x1
ocl-icd(ocl_icd_loader.c:232): _open_driver: Considering file '/tmp/pocl-venv/lib/python3.8/site-packages/pyopencl/.libs/pocl.icd'
ocl-icd(ocl_icd_loader.c:206): _load_icd: Loading ICD 'libpocl-3a06e60a.so'
ocl-icd(ocl_icd_loader.c:210): _load_icd: ICD[1] loaded
ocl-icd(ocl_icd_loader.c:264): _open_driver: return: 2/0x2
ocl-icd(ocl_icd_loader.c:276): _open_drivers: return: 2/0x2
ocl-icd(ocl_icd_loader.c:433): _find_and_check_platforms: Checking ICD 0/2
ocl-icd(ocl_icd_loader.c:281): _get_function_addr: Looking for function clGetExtensionFunctionAddress
ocl-icd(ocl_icd_loader.c:299): _get_function_addr: return: 139776962235520/0x7f205c2e8c80
ocl-icd(ocl_icd_loader.c:281): _get_function_addr: Looking for function clIcdGetPlatformIDsKHR
ocl-icd(ocl_icd_loader.c:284): _get_function_addr: Missing global symbol 'clIcdGetPlatformIDsKHR' in ICD, should be skipped
ocl-icd(ocl_icd_loader.c:299): _get_function_addr: return: 139776962236064/0x7f205c2e8ea0
ocl-icd(ocl_icd_loader.c:281): _get_function_addr: Looking for function clGetPlatformInfo
ocl-icd(ocl_icd_loader.c:284): _get_function_addr: Missing global symbol 'clGetPlatformInfo' in ICD, should be skipped
ocl-icd(ocl_icd_loader.c:299): _get_function_addr: return: 139776962163152/0x7f205c2d71d0
ocl-icd(ocl_icd_loader.c:482): _find_and_check_platforms: Try to load 1 platforms
ocl-icd(ocl_icd_loader.c:304): _allocate_platforms: Requesting allocation for 1 platforms
ocl-icd(ocl_icd_loader.c:314): _allocate_platforms: return: 1/0x1
ocl-icd(ocl_icd_loader.c:489): _find_and_check_platforms: Checking platform 0
ocl-icd(ocl_icd_loader.c:340): _malloc_clGetPlatformInfo: return: cl_khr_icd
ocl-icd(ocl_icd_loader.c:340): _malloc_clGetPlatformInfo: return: POCL
ocl-icd(ocl_icd_loader.c:559): _find_and_check_platforms: Extension suffix: POCL
ocl-icd(ocl_icd_loader.c:340): _malloc_clGetPlatformInfo: return: FULL_PROFILE
ocl-icd(ocl_icd_loader.c:340): _malloc_clGetPlatformInfo: return: OpenCL 1.2 pocl 1.3 Release, LLVM 7.0.1, SLEEF, DISTRO, POCL_DEBUG
ocl-icd(ocl_icd_loader.c:340): _malloc_clGetPlatformInfo: return: Portable Computing Language
ocl-icd(ocl_icd_loader.c:340): _malloc_clGetPlatformInfo: return: The pocl project
ocl-icd(ocl_icd_loader.c:433): _find_and_check_platforms: Checking ICD 1/2
ocl-icd(ocl_icd_loader.c:281): _get_function_addr: Looking for function clGetExtensionFunctionAddress
ocl-icd(ocl_icd_loader.c:299): _get_function_addr: return: 139776962235520/0x7f205c2e8c80
ocl-icd(ocl_icd_loader.c:281): _get_function_addr: Looking for function clIcdGetPlatformIDsKHR
ocl-icd(ocl_icd_loader.c:284): _get_function_addr: Missing global symbol 'clIcdGetPlatformIDsKHR' in ICD, should be skipped
ocl-icd(ocl_icd_loader.c:299): _get_function_addr: return: 139776962236064/0x7f205c2e8ea0
ocl-icd(ocl_icd_loader.c:281): _get_function_addr: Looking for function clGetPlatformInfo
ocl-icd(ocl_icd_loader.c:284): _get_function_addr: Missing global symbol 'clGetPlatformInfo' in ICD, should be skipped
ocl-icd(ocl_icd_loader.c:299): _get_function_addr: return: 139776962163152/0x7f205c2d71d0
ocl-icd(ocl_icd_loader.c:482): _find_and_check_platforms: Try to load 1 platforms
ocl-icd(ocl_icd_loader.c:304): _allocate_platforms: Requesting allocation for 1 platforms
ocl-icd(ocl_icd_loader.c:314): _allocate_platforms: return: 1/0x1
ocl-icd(ocl_icd_loader.c:489): _find_and_check_platforms: Checking platform 0
ocl-icd(ocl_icd_loader.c:340): _malloc_clGetPlatformInfo: return: cl_khr_icd
ocl-icd(ocl_icd_loader.c:340): _malloc_clGetPlatformInfo: return: POCL
ocl-icd(ocl_icd_loader.c:559): _find_and_check_platforms: Extension suffix: POCL
ocl-icd(ocl_icd_loader.c:340): _malloc_clGetPlatformInfo: return: FULL_PROFILE
ocl-icd(ocl_icd_loader.c:340): _malloc_clGetPlatformInfo: return: OpenCL 1.2 pocl 1.3 Release, LLVM 7.0.1, SLEEF, DISTRO, POCL_DEBUG
ocl-icd(ocl_icd_loader.c:340): _malloc_clGetPlatformInfo: return: Portable Computing Language
ocl-icd(ocl_icd_loader.c:340): _malloc_clGetPlatformInfo: return: The pocl project
ocl-icd(ocl_icd_loader.c:387): _sort_platforms: Nb platefroms: 2
ocl-icd(ocl_icd_loader.c:398): _sort_platforms: Platform sorted by GPU, CPU, DEV
ocl-icd(ocl_icd_loader.c:793): __initClIcd: 2 valid vendor(s)!
ocl-icd(ocl_icd_loader.c:1025): clGetPlatformIDs: Entering

Same behaviour in the interactive python interpreter.

Manually setting PYOPENCL_HOME before starting python fails in the same way as leaving it unset. So it looks like the environment variable isn't getting picked up.
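The OCL_ICD_DEBUG output above is long, so a short summary of which vendor directory was read and which ICD files were considered can make two runs easier to compare. In the second run above, pocl.icd is considered twice, once as `.libs//pocl.icd` and once as `.libs/pocl.icd`. The sketch below only parses the two message formats visible in that output; `summarize_icd_debug` is a hypothetical helper, and other ocl-icd versions may word these messages differently.

```python
import posixpath
import re
from collections import Counter

_VENDOR_DIR = re.compile(r"Reading icd list from '([^']*)'")
_ICD_FILE = re.compile(r"Considering file '([^']*)'")


def summarize_icd_debug(lines):
    """Return (vendor_dirs, icd_files, duplicates) from loader debug lines.

    ICD file paths are normalised first, so 'dir//pocl.icd' and
    'dir/pocl.icd' count as the same file.
    """
    vendor_dirs = []
    icd_counts = Counter()
    for line in lines:
        match = _VENDOR_DIR.search(line)
        if match:
            vendor_dirs.append(match.group(1))
            continue
        match = _ICD_FILE.search(line)
        if match:
            icd_counts[posixpath.normpath(match.group(1))] += 1
    duplicates = sorted(path for path, n in icd_counts.items() if n > 1)
    return vendor_dirs, sorted(icd_counts), duplicates
```

Feeding it the captured stderr of each run, split into lines, gives the directory the loader read, the distinct ICD files it considered, and any file it considered more than once.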

jacklovell commented 2 years ago

Definitely using the wheel-provided libOpenCL too, so it should have the patch you mentioned. Grepping that SO does indicate it has PYOPENCL_HOME inside the library.

(pocl-venv) jlovell@jlovell-thinkpad:~$ ldd /tmp/pocl-venv/lib/python3.8/site-packages/pyopencl/_cl.cpython-38-x86_64-linux-gnu.so 
    linux-vdso.so.1 (0x00007ffd960ea000)
    libOpenCL-cf4d6695.so.1.0.0 => /tmp/pocl-venv/lib/python3.8/site-packages/pyopencl/.libs/libOpenCL-cf4d6695.so.1.0.0 (0x00007f1b1e233000)
    libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f1b1e024000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f1b1ded5000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f1b1deba000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f1b1de97000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1b1dca5000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f1b1dc9d000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f1b1e37a000)

inducer commented 2 years ago

That's mysterious. Why does it say "missing global symbol" and then return an address for it? And why does this work on other systems?

jacklovell commented 2 years ago

I've been able to reproduce this using GitHub Actions. Compare https://github.com/cherab/core/runs/5290607772?check_suite_focus=true, where I didn't properly set the OCL_ICD_VENDORS environment variable for the job, with https://github.com/cherab/core/runs/5291207783?check_suite_focus=true, where I managed to do it correctly. So it should be possible for you to reproduce this too for testing.

I'm afraid I don't know enough about the OpenCL loader to speculate on why it's doing this on some systems but not others.

jacklovell commented 1 year ago

Update: the workaround of manually setting OCL_ICD_VENDORS no longer works with pyopencl 2022.2.3.

inducer commented 1 year ago

Could you use some of the same troubleshooting techniques (strace, ldd) to see why that might be happening?

isuruf commented 1 year ago

Fixed in https://github.com/inducer/pyopencl/pull/635

jacklovell commented 1 year ago

Seems to work with the wheels in the #635 build artifacts, thanks!

Took a bit of trial and error, as I hadn't realised that the pocl ICD was added to site-packages/pyopencl/.libs by pocl-binary-distribution and not by pyopencl. I then got confused as to why those files were missing after uninstalling the previous version of pyopencl and deleting the pyopencl directory in site-packages entirely. Reinstalling pocl-binary-distribution along with the patched version of pyopencl fixed things.

I can also confirm that it's no longer necessary with #635 to manually set OCL_ICD_VENDORS for the ICD to be picked up.