Support for physical addressing

kpet commented 2 years ago

Motivation

Full support for OpenCL C requires physical addressing. While clspv has done a great job of enabling many OpenCL C programs without the use of physical addressing, there will always be programs that cannot be compiled without physical addressing.
clspv's pointer bitcast rewriting passes are rather complex and tend to require changes when large new code bases are compiled.
Some OpenCL CTS tests would be easy to pass with physical addressing (e.g. support for NULL kernel arguments).
Physical addressing opens the door to supporting OpenCL SVM or USM which in addition to being desirable features in their own right are also very useful to provide a bridge to enable other programming languages to target Vulkan.

Here's a rough outline of what I prototyped:

Use the SPV_KHR_physical_storage_buffer extension to enable the use of the PhysicalStorageBufferEXT storage class where StorageBuffer is currently used.
Disable pointer transformation passes (pointer bitcast replacement and simplification passes).
Introduce new runtime interfaces in the clspv non-semantic instruction set to allow passing pointers obtained by the runtime using VK_KHR_buffer_device_address via either push constants or uniform buffers. The interface uses a structure of 64-bit integers (or 2-element vectors of 32-bit integers), one for each pointer argument, that is part of the global push constant structure or stored in a dedicated uniform buffer. Pointer arguments to kernel functions are rewritten to load the required integers from this structure and convert them to pointers. A new pass is introduced to declare the interface structure and perform the necessary rewriting.
Add a new command line option to enable the use of physical addressing where possible (disabled by default)
Add a new command line option to select the 64-bit addressing model (disabled by default, implied by or required with physical addressing)

[ ] Support 64-bit addressing model (hard requirement of the PhysicalStorageBuffer64EXT addressing model)
- [x] Add libclc target (https://reviews.llvm.org/D116668)
- [x] Build libclc as part of clspv's build system
- [x] Add support for 64-bit addressing model https://github.com/google/clspv/pull/936
[x] Update clspv non-semantic instruction set with new runtime interface instructions
- [x] Specification (https://github.com/KhronosGroup/SPIRV-Registry/pull/153)
- [x] Headers (https://github.com/KhronosGroup/SPIRV-Headers/pull/284)
- [x] Use in clspv and other dependent projects (e.g. SPIR-V Tools)
[x] Introduce utilities to manage global push constants https://github.com/google/clspv/pull/882
[x] Add support for physical addressing (using 64-bit integers) https://github.com/google/clspv/pull/954
[ ] Add support for physical addressing (using 2-element vectors of 32-bit integers) https://github.com/google/clspv/issues/956

alan-baker commented 2 years ago

Seems reasonable overall. A few points to consider:

You can only disable pointer transforms for global (and sometimes constant) address space.
More detail on the reflection design would be appreciated. The push constant case seems clearer for reflection than the UBO case.
Nulls are tricky. You need to codegen as a conversion from integer 0 (or (0,0)) and then comparison.
I assume this is all based on a new option.
To be clear, this feature will wholesale replace uses of StorageBuffer with PhysicalStorageBuffer?

kpet commented 2 years ago

Thanks for the feedback!

Yes, my prototype has them disabled entirely but this only worked because I was looking at a very narrow set of workloads.
Vulkan only guarantees 128 bytes worth of push constants. At 8 byte per pointer, that's 4 kernels with 4 pointer arguments each, not even taking into account the push constant space used by PODs or other global push constants. I think it's safe to assume that cases where we can't entirely rely on push constants to pass pointers will be quite common. I'm proposing we use a UBO when we run out of push constant space (either as a complement to using push constants or instead of when there isn't enough space).
Thanks for the tip! Looking at my prototype I'm doing all comparisons after conversion to integers as pointer comparison are not allowed with operands pointing into the PhysicalStorageBufferEXT storage class.
Yup, so "obvious" I didn't say it :). I'll add a mention in the proposal.
Unless we find it's not possible, yes, that's currently the intent.

Cazadorro commented 2 years ago

FYI Nvidia originally had a limit on CUDA parameters of 256 bytes. This limit was increased to 4k in cuda 2.x by moving to constant memory AKA uniform memory/buffers.

I mention this to show there's industry precedent for the exact solution proposed for the push constant size limitations.