intel / llvm

Intel staging area for llvm.org contribution. Home for Intel LLVM-based projects.
Other
1.22k stars 732 forks source link

[SYCL] __grid_constant__ #7998

Closed zjin-lcf closed 1 week ago

zjin-lcf commented 1 year ago

Is this an optimization feature for SYCL ?

If the address of a __global__ function parameter is taken, the compiler will ordinarily make a copy of the kernel parameter in thread local memory and use the address of the copy, to partially support C++ semantics, which allow each thread to modify its own local copy of function parameters. Annotating a __global__ function parameter with __grid_constant__ ensures that the compiler will not create a copy of the kernel parameter in thread local memory, but will instead use the generic address of the parameter itself. Avoiding the local copy may result in improved performance.

#include <cuda.h>

struct S {
  char4 a;
  int x;
};

__device__ void unknown_function(S const&) {}

__global__ void kernel(const __grid_constant__ S s) {
   // s.x += threadIdx.x;  // Undefined Behavior: tried to modify read-only memory

   // Compiler will _not_ create a per-thread thread local copy of "s":
   unknown_function(s);
}

int main() {
  S car;
  kernel<<<1,1>>>(car);
  return 0;
}

Reference https://docs.nvidia.com/cuda/cuda-c-programming-guide/#grid-constant

jchlanda commented 1 year ago

The upstream seems to be somewhat sceptical about __grid_constant and the applicability of it: https://discourse.llvm.org/t/rfc-hip-grid-constant-support/67759 I've pinged that thread, it feels like it really belongs in the upstream, once it's done there, we could add support in SYCL quite easily.

zjin-lcf commented 10 months ago

@jchlanda

https://github.com/NVIDIA/cuda-samples/blob/master/Samples/6_Performance/LargeKernelParameter/LargeKernelParameter.cu

zjin-lcf commented 10 months ago

There seems no performance improvement whether "__grid_constant__" is added or not.

frasercrmck commented 2 months ago

As of https://github.com/intel/llvm/pull/14332 (just merged), we implicitly add the grid_constant annotation to all by-value structure-type kernel parameters.

This will eventually have the same effect as manually decorating these parameters with __grid_constant__ in CUDA but we don't give the user explicit control over it.

I say "eventually" as the way we currently lower SYCL functors in the front-end means this information is ultimately lost. We have some in-progress work to address this: see https://github.com/intel/llvm/pull/14434.