Open szymonlopaciuk opened 4 months ago
I would be careful with OpenCL 2.0, it is not supported well across drivers. 3.0 is better supported, but it does not include many 2.0 features. I could test on the Intel driver and PoCL in case...
Sorry for the late response, I noted it already yesterday. I can experiment with 3.0 too if you think relying on 2.0 is a problem, I had not known this. I had a quick look at the spec and CL3.0 also supports the __generic
address space, and it seems that it's all that I/we need.
We could leave the version configurable to the user, though I'm not sure of the benefit of that if our code does not compile on different versions (like currently, under 2.1)...
Do you support the idea of bumping the version in principle? The alternative is working around it, and it will be a bit annoying, though not impossible.
So:
--std-cl=CL2.0
flag.Does pocl choke on the C code or the flag? CPUs have the same memory space, so I expect the feature to be supported.
It chokes on the flag. I will have to check without, I wouldn't be surprised if it still has problems though: I'm seeing in clinfo
that it supports CL1.2 exclusively.
Okay, so to complicate matters further, I have confirmed a bug in Xobjects (or more likely in pyopencl): no matter which platform is selected, the first one is actually chosen for running the kernel. It's very sneaky, because the compiler messages seem "okay" (see screenshot), but the actual kernel is not run where we need it. I'm not sure why at this point, I briefly looked at the code of pyopencl and nothing particularly bad jumps out.
I first observed strange behaviour with gpustat where the gpus would light up even if allegedly running on CPU, and confirmed by using vendor specific macros in the kernel code (__NV_CL_C_VERSION
in this case for nvidia, which is the one producing the message in all the cases). See the screenshot, it's a bit ugly because I just put a printf in one of the tests, but still:
benchmarks on https://github.com/rdemaria/simpletrack works normally for me. I have upgraded last version everything. Interstingly I get:
POCL: Device: cpu-skylake-avx512-AMD Ryzen 9 7950X3D 16-Core Processor
turns npart t/turn[ms] t/turn/part[us]
10 18000 154.68 8.59
10 25000 254.29 10.17
NVIDIA: Device: NVIDIA TITAN V
turns npart t/turn[ms] t/turn/part[us]
10 18000 17.14 0.95
10 25000 17.74 0.71
INTEL: Device: AMD Ryzen 9 7950X3D 16-Core Processor
turns npart t/turn[ms] t/turn/part[us]
10 18000 415.34 23.07
10 25000 543.04 21.72
Description
It seems that OpenCL on CUDA is quite permissive, whereas on AMD machines, which more closely follow OpenCL spec, CL1.2 is taken as the default version even if the actual device supports higher ones.
The current version of Xtrack contains code incompatible with CL1.2. This is because in some instances local scope values are passed to
__global
parameters. CL2.0 introduces a default__generic
parameter, which accepts either, in a manner similar to CUDA. In particular the functionmultipole_compute_dpx_dpy_single_particle
receives arguments from either memory. Requiring CL2.0 is the easiest fix in this case: the alternatives are to explicitly make two versions of the function, or manually copy from the global memory to local.I am preparing a PR for Xtrack that fixes issues encountered on AMD, this is a prerequisite for those changes.
Checklist
Mandatory:
Optional: