google / clspv

Clspv is a compiler for OpenCL C to Vulkan compute shaders
Apache License 2.0
602 stars 87 forks source link

Support for physical addressing #826

Open kpet opened 2 years ago

kpet commented 2 years ago

Motivation

Proposed design outline

Here's a rough outline of what I prototyped:

Proposed staging

alan-baker commented 2 years ago

Seems reasonable overall. A few points to consider:

  1. You can only disable pointer transforms for global (and sometimes constant) address space.
  2. More detail on the reflection design would be appreciated. The push constant case seems clearer for reflection than the UBO case.
  3. Nulls are tricky. You need to codegen as a conversion from integer 0 (or (0,0)) and then comparison.
  4. I assume this is all based on a new option.
  5. To be clear, this feature will wholesale replace uses of StorageBuffer with PhysicalStorageBuffer?
kpet commented 2 years ago

Thanks for the feedback!

  1. Yes, my prototype has them disabled entirely but this only worked because I was looking at a very narrow set of workloads.
  2. Vulkan only guarantees 128 bytes worth of push constants. At 8 byte per pointer, that's 4 kernels with 4 pointer arguments each, not even taking into account the push constant space used by PODs or other global push constants. I think it's safe to assume that cases where we can't entirely rely on push constants to pass pointers will be quite common. I'm proposing we use a UBO when we run out of push constant space (either as a complement to using push constants or instead of when there isn't enough space).
  3. Thanks for the tip! Looking at my prototype I'm doing all comparisons after conversion to integers as pointer comparison are not allowed with operands pointing into the PhysicalStorageBufferEXT storage class.
  4. Yup, so "obvious" I didn't say it :). I'll add a mention in the proposal.
  5. Unless we find it's not possible, yes, that's currently the intent.
Cazadorro commented 2 years ago

FYI Nvidia originally had a limit on CUDA parameters of 256 bytes. This limit was increased to 4k in cuda 2.x by moving to constant memory AKA uniform memory/buffers.

I mention this to show there's industry precedent for the exact solution proposed for the push constant size limitations.