oneapi-src / oneDPL

oneAPI DPC++ Library (oneDPL) https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/dpc-library.html
Apache License 2.0

Resolve global policy issues in a backward-compatible manner, attempt 2 #1652

Closed akukanov closed 2 days ago

akukanov commented 2 weeks ago

This patch is a follow-up to #1618 (with the commit history from there) with a somewhat modified approach. It is necessary to address #1631, one more issue caused by eager initialization of a SYCL queue in oneDPL headers. It is still intended to address #1060 as well.

The core idea is similar: to keep layout compatibility yet have lazy behavior, store SYCL queues inside a special __queue_holder class with the same size and alignment, and use the first sizeof(uintptr_t) bytes as a flag to detect whether it holds a valid queue or (if these bytes are all zeros) something else. As a reminder,

the rationale is that a SYCL queue is likely implemented as a shared_ptr (that's certainly the case for DPC++), which in turn typically holds several pointers to the actual object, a service block or a reference counter, etc. It is highly unlikely therefore that the first bytes in a properly constructed queue object will be equivalent to a null pointer.
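To make the flag idea concrete, here is a minimal sketch, not the oneDPL code. It assumes a hypothetical `fake_queue` standing in for `sycl::queue` (a handle wrapping a `shared_ptr`), and the names `queue_storage`, `holds_queue`, `make_empty` and `emplace_queue` are invented for illustration. It reads the flag with `memcpy` rather than through a union member.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <memory>
#include <new>

// Hypothetical stand-in for sycl::queue: a handle that wraps a shared_ptr.
struct fake_queue {
    std::shared_ptr<int> impl = std::make_shared<int>(0);
};

// Raw storage with the same size and alignment as the queue type.
struct queue_storage {
    alignas(fake_queue) unsigned char bytes[sizeof(fake_queue)];
};

static_assert(sizeof(queue_storage) == sizeof(fake_queue), "size must match");
static_assert(alignof(queue_storage) == alignof(fake_queue), "alignment must match");
static_assert(sizeof(fake_queue) >= sizeof(std::uintptr_t), "need room for the flag");

// Reads the first sizeof(uintptr_t) bytes; nonzero is taken to mean
// "a constructed queue lives in this storage".
inline bool holds_queue(const queue_storage& s) {
    std::uintptr_t flag = 0;
    std::memcpy(&flag, s.bytes, sizeof(flag));
    return flag != 0;
}

// Marks the storage as "no queue here" by zeroing it.
inline void make_empty(queue_storage& s) {
    std::memset(s.bytes, 0, sizeof(s.bytes));
}

// Constructs a queue in place and checks that the heuristic actually holds
// for this standard library's shared_ptr layout.
inline fake_queue* emplace_queue(queue_storage& s) {
    fake_queue* q = new (s.bytes) fake_queue();
    assert(holds_queue(s) && "heuristic: first bytes of a live queue are nonzero");
    return q;
}
```

The assertion in `emplace_queue` is the whole bet: it holds for the common `shared_ptr` layouts, where a non-null pointer comes first, but nothing in the standard guarantees it.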

Special device policy constructors are used only for predefined policies and, unlike #1618, always nullify the "flag" and additionally store a pointer to a factory function that generates a proper sycl::queue object as a copy of an internal "magic static" instance initialized on the first call to the factory. Each time a device policy needs to return its queue, the "flag" is checked and, depending on its state, either a copy of the stored queue is returned or the factory is called.

It is worth noting that the workaround only applies to the predefined policies. Any explicitly created policies as well as copies of policies created by the implementation (if it chooses so) will contain a valid queue. Propagating factories to the copies of predefined policies is deliberately not done; it is impossible to always create queues on first use without breaking class layout, so having consistent behavior for all policies except predefined ones seems the best choice.
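The behavior described above (zeroed flag plus factory pointer for predefined policies, a real queue for everything else, no factory propagation on copy) can be sketched as follows. This is an illustration under assumptions, not the patch itself: `fake_queue` is a hypothetical stand-in for `sycl::queue`, and `lazy_queue_holder`, `default_queue_factory` and `factory_calls` are invented names. It requires C++17 for `std::launder`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <memory>
#include <new>

// Hypothetical stand-in for sycl::queue.
struct fake_queue {
    std::shared_ptr<int> impl = std::make_shared<int>(0);
};

// Counts how many times the default queue has been created (for illustration).
inline int& factory_calls() { static int n = 0; return n; }

// Factory: returns a copy of a function-local "magic static" that is
// initialized on the first call only.
inline fake_queue default_queue_factory() {
    static fake_queue q = (++factory_calls(), fake_queue{});
    return q;
}

class lazy_queue_holder {
    using factory_t = fake_queue (*)();
    static_assert(sizeof(fake_queue) >= sizeof(std::uintptr_t) + sizeof(factory_t),
                  "storage must fit the flag plus a factory pointer");

    alignas(fake_queue) unsigned char storage_[sizeof(fake_queue)];

    // Nonzero leading bytes are taken to mean "a queue is constructed here".
    bool has_queue() const {
        std::uintptr_t flag = 0;
        std::memcpy(&flag, storage_, sizeof(flag));
        return flag != 0;
    }

public:
    // Predefined-policy path: zero the flag, stash the factory pointer after it.
    explicit lazy_queue_holder(factory_t f) {
        std::memset(storage_, 0, sizeof(storage_));
        std::memcpy(storage_ + sizeof(std::uintptr_t), &f, sizeof(f));
    }

    // Explicit path: construct a real queue in place.
    explicit lazy_queue_holder(const fake_queue& q) {
        new (storage_) fake_queue(q);
        assert(has_queue());  // the heuristic must hold for a live queue
    }

    // Copies always hold a valid queue; the factory is not propagated.
    lazy_queue_holder(const lazy_queue_holder& other) {
        new (storage_) fake_queue(other.get());
    }
    lazy_queue_holder& operator=(const lazy_queue_holder&) = delete;

    ~lazy_queue_holder() {
        if (has_queue())
            std::launder(reinterpret_cast<fake_queue*>(storage_))->~fake_queue();
    }

    // Returns a copy of the stored queue, or asks the factory if none is stored.
    fake_queue get() const {
        if (has_queue())
            return *std::launder(reinterpret_cast<const fake_queue*>(storage_));
        factory_t f = nullptr;
        std::memcpy(&f, storage_ + sizeof(std::uintptr_t), sizeof(f));
        return f();
    }

    bool is_lazy() const { return !has_queue(); }
};
```

In this sketch a holder built from the factory never creates a queue until `get()` is called and stays lazy afterwards, while a copy of it materializes the queue immediately, matching the "consistent behavior for all policies except predefined ones" choice.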

The implementation still introduces undefined behavior, this time by possibly reading an inactive union member (the "flag", in the case where a valid queue was constructed).

akukanov commented 3 days ago

what to do with clang-format failures

My viewpoint: ignore the remaining failures.