Closed akukanov closed 1 month ago
I ran a test similar to the one described here https://github.com/oneapi-src/oneDPL/pull/1154/files#r1317837347 using this branch. The test is a check that we can include oneDPL headers without crashing when no device is available.
On origin/main
, when running the test, it passes when a sycl device is available, but crashes when ONEAPI_DEVICE_SELECTOR='!*:*'
is specified with PI_ERROR_DEVICE_NOT_FOUND
.
On this branch, the test passes with no error with or without ONEAPI_DEVICE_SELECTOR='!*:*'
. It seems this branch resolves the issue successfully, if we are satisfied with the approach and if it succeeds in preserving the layout.
In order to address #1631, the approach proposed in the patch is not enough and the predefined policies should never initialize SYCL queues before the first use. I think it might still be doable in backward-compatible manner, but it makes sense to create a new PR for that; see #1652.
This patch proposes a different approach to address the issue #1060.
Previously we tried to re-implement the
device_policy
class in a way that would avoid eager initialization of a SYCL queue in the constructor, and instead would do it on the first use. That cannot be done however without having layout changes for the class, which we prefer to avoid for sake of preserving backward compatibility.However, the issue #1060 does not really require lazy initialization; it asks for absence of exceptions or any other runtime errors in case a global SYCL queue object cannot be initialized. We use SYCL's default selector to create a queue used by the
dpcpp_default
predefined policy, which is supposed to always return a valid device because a SYCL implementation must provide at least one valid device. So our current implementation is formally correct, while the problem is observed in a strange environment where SYCL cannot find/initialize any device, even a CPU device; that seems to be a territory of undefined behavior.Therefore a workaround can be implemented that detects during the construction of
dpcpp_default
that there are no SYCL devices and does not even try to initialize the queue, instead leaving the policy in an unusable state. According to the SYCL specification, the set of available platforms and devices does not change during the program lifetime, therefore our device policies, includingdpcpp_default
, cannot be used anyway (again, it's practically a UB territory) - so just silently ignoring the issue at global policy construction and letting it fire on the actual use seems OK.The patch here attempts to implement this approach without breaking the class layout. For that, use of
sycl::queue
in the policies is replaced with a specially crafter__queue_holder
, which has the same size and alignment assycl::queue
and uses "placement new" to construct a real queue in the space taken by the holder. In most cases, the queue is constructed unconditionally, but a special constructor for predefined policy instances checks if there is at least one device available. As an indication that the construction was omitted, the firstsizeof(void*)
bytes of the holder are nullified; the rationale is that a SYCL queue is likely implemented as ashared_ptr
(that's certainly the case for DPC++), which in turn typically holds several pointers to the actual object, a service block or a reference counter, etc. It is highly unlikely therefore that the first bytes in a properly constructed queue object will be equivalent to a null pointer. That validity check, however, is only used in the destructor; all other methods assume that the queue was properly constructed, which is asserted in the__queue_ref
method that returns a reference to the queue kept in the holder.