intel / llvm

Intel staging area for llvm.org contribution. Home for Intel LLVM-based projects.
Other
1.25k stars 738 forks source link

sycl_ext_intel_grf_size large grf not working for sycl_explicit_simd kernel #12704

Open elliottbinder opened 9 months ago

elliottbinder commented 9 months ago

Describe the bug Setting the kernel_properties to include grf_size<256> for an ESIMD kernel does not change the register file size.

To Reproduce

#include <CL/sycl.hpp>
#include <sycl/ext/intel/esimd.hpp>
#include <sycl/ext/intel/experimental/grf_size_properties.hpp>
int main() {
    printf("SYCL_EXT_INTEL_GRF_SIZE: %d\n", SYCL_EXT_INTEL_GRF_SIZE);
    sycl::queue q(sycl::gpu_selector_v);
    sycl::ext::oneapi::experimental::properties kernel_properties{sycl::ext::intel::experimental::grf_size<256>};
    q.submit([&](sycl::handler &h) {
      h.parallel_for(
        sycl::nd_range{
          sycl::range<3>(1, 1, 1),
          sycl::range<3>(1, 1, 1)
        },
        kernel_properties,
        [=](sycl::nd_item<3> item)
          [[intel::sycl_explicit_simd]]
        {
        }
      );
    }).wait();
    return 0;
}

Save file as no_large_grf_reproducer.cpp

Export environment variables to keep assembly code

export IGC_ShaderDumpEnable=1
export IGC_DumpToCustomDir=assembly

Compile with

icpx -std=c++20 -fsycl -fsycl-targets=intel_gpu_acm_g10 -Xsycl-target-frontend -O3 -o no_large_grf_reproducer src/no_large_grf_reproducer.cpp -lsycl

Look at the assembly code (.asm file in the assembly directory) and note the .thread_config numGRF value. This should be 256. This value is correctly set when not using [[intel::sycl_explicit_simd]]. It appears to also not work correctly when targeting intel_gpu_pvc.

Environment (please complete the following information):

KornevNikita commented 8 months ago

@sarnex could you take a look?

sarnex commented 8 months ago

Yes sure

sarnex commented 8 months ago

Unfortunately this is a known limitation with the IGC vector compiler (used by ESIMD) and AOT mode. Per-kernel specification is not currently supported. There is already an open internal ticket for IGC tracking this limitation. If you have to use AOT mode, per-kernel GRF specification won't work, so I recommend using the IGC option to set auto GRF mode for all kernels, which should pick large GRF for kernels where it will have a benefit and leave others with small GRF mode. The IGC option is -ze-intel-enable-auto-large-GRF-mode. You can also force large mode for all kernels with -ze-opt-large-register-file. These will go inside your -Xsycl-backend string.