Closed — majosm closed this 1 month ago
I gave this a quick spin (with the example in https://github.com/inducer/pyopencl/issues/731#issuecomment-2071157745) and didn't notice any negative performance impact on the second run, as long as the downstream package (i.e., pocl, Nvidia CL) has caching enabled. I tried pocl-cpu, pocl-cuda, and Nvidia CL, all on Linux (porter). Perhaps we could skip binary caching for all CL implementations?
> Perhaps we could skip binary caching for all CL implementations?
That's a broad set to generalize over. 🙂 If they all have source -> executable caches, then sure, that'd probably be better. Nvidia has such a cache, I think. Do you know about AMD and Intel?
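For context, the source -> executable cache discussed here can be sketched as a content-addressed store keyed by a hash of the kernel source and build options. This is a minimal illustrative sketch with hypothetical helper names, not pyopencl's actual cache implementation, which differs in detail:

```python
import hashlib
import os
import tempfile

def cache_key(source: str, build_options: str = "") -> str:
    # Key on both the source text and the build options, since either
    # can change the generated binary.
    h = hashlib.sha256()
    h.update(source.encode())
    h.update(b"\0")
    h.update(build_options.encode())
    return h.hexdigest()

def get_or_build(cache_dir: str, source: str, build, options: str = "") -> bytes:
    """Return a cached binary for *source*, building (and caching) on a miss.

    *build* is a callable (source, options) -> bytes; in a real OpenCL
    cache it would wrap clBuildProgram + clGetProgramInfo.
    """
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, cache_key(source, options) + ".bin")
    try:
        with open(path, "rb") as f:
            return f.read()
    except FileNotFoundError:
        pass

    binary = build(source, options)

    # Write via a temporary file + atomic rename so concurrent readers
    # never observe a partially written cache entry.
    fd, tmp = tempfile.mkstemp(dir=cache_dir)
    with os.fdopen(fd, "wb") as f:
        f.write(binary)
    os.replace(tmp, path)
    return binary
```

The question in the thread is essentially whether a cache like this in pyopencl is redundant when the driver below already maintains an equivalent one.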
@inducer I'm seeing an intermittent failure in the boxtree CI (here's a failing run and a successful run for the same code). Is this something to be concerned about?
> Is this something to be concerned about?
Kind of, yeah. Can you reproduce it locally?
Maybe it has something to do with tests being run in parallel? How good is pocl about locking its cache?
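For reference, the kind of locking being asked about can be sketched with an advisory per-entry file lock, so parallel test workers serialize access to the same cache entry instead of racing. This is a hypothetical sketch using POSIX `fcntl.flock`, not how pocl actually implements its cache locking:

```python
import fcntl
import os
from contextlib import contextmanager

@contextmanager
def locked_cache_entry(cache_dir: str, key: str):
    """Hold an exclusive advisory lock on one cache entry.

    Concurrent processes (e.g. parallel test workers) block here rather
    than reading or writing the same entry simultaneously.
    """
    os.makedirs(cache_dir, exist_ok=True)
    lock_path = os.path.join(cache_dir, key + ".lock")
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            # Yield the path of the entry this lock protects.
            yield os.path.join(cache_dir, key + ".bin")
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

If a cache lacks this kind of protection, two parallel test processes can build and write the same entry at once, which would be consistent with the intermittent CI failure above.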
> Perhaps we could skip binary caching for all CL implementations?

> That's a broad set to generalize over. 🙂 If they all have source -> executable caches, then sure, that'd probably be better. Nvidia has such a cache, I think. Do you know about AMD and Intel?
It turns out that AMD ROCm does not appear to cache built kernels :-( (tested with ROCm 5.7.1 and 6.0.3 on tioga).
This has been merged as part of #749.
Fixes #731.