intel / pti-gpu

Profiling Tools Interfaces for GPU (PTI for GPU) is a set of Getting Started Documentation and Tools Library to start performance analysis on Intel(R) Processor Graphics easily
MIT License

Global register allocation failed & Not enough free registers while scratch-mapped registers #3

Closed zjin-lcf closed 3 years ago

zjin-lcf commented 3 years ago

Running a program (https://github.com/zjin-lcf/oneAPI-DirectProgramming/tree/master/sort-dpct) displays the following message (including information from cliloader - intel opencl intercept). There are three kernels in the program, and only one kernel's assembly is displayed (not shown here). Thank you for your solution.

./gpu_perfmon_read ~/oneAPI-Benchmarks/sort-dpct/main 3 10

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
CLIntercept (64-bit) is loading...
CLintercept file location: /opt/intel/oneapi/compiler/latest/linux/lib/libOpenCL.so.1
CLIntercept URL: https://github.com/intel/opencl-intercept-layer
CLIntercept git description: v2.2.2-18-g204c386
CLIntercept git refspec: refs/heads/master
CLInterecpt git hash: 204c386f6c9ccafeab839d5738c9fcde0ad05744
CLIntercept optional features:
    cliloader(supported)
    cliprof(supported)
    kernel overrides(supported)
    ITT tracing(NOT supported)
    MDAPI(supported)
CLIntercept environment variable prefix: CLI_
CLIntercept config file: clintercept.conf
Read OpenCL file name from user parameters: /opt/intel/oneapi/compiler/latest/linux/lib/libOpenCL.so.1.2.real
Trying to load dispatch from: /opt/intel/oneapi/compiler/latest/linux/lib/libOpenCL.so.1.2.real
Couldn't get exported function pointer to: clCreateBufferWithProperties
Couldn't get exported function pointer to: clCreateImageWithProperties
Couldn't get exported function pointer to: clSetContextDestructorCallback
... success!
Timer Started!
... loading complete.
Initializing host memory.
Running benchmark with input array length 16777216
GTPIN WARNING (PID 21552): _ZTSZZ4mainENKUlRN2cl4sycl7handlerEE122_20clES2_EUlNS0_7nd_itemILi3EEEE131_13: Not enough free registers while scratch-mapped registers (SREGs) are disabled
GTPIN WARNING (PID 21552): _ZTSZZ4mainENKUlRN2cl4sycl7handlerEE122_20clES2_EUlNS0_7nd_itemILi3EEEE131_13: Global register allocation failed
GTPIN WARNING (PID 21552): _ZTSZZ4mainENKUlRN2cl4sycl7handlerEE152_20clES2_EUlNS0_7nd_itemILi3EEEE167_13: Not enough free registers while scratch-mapped registers (SREGs) are disabled
GTPIN WARNING (PID 21552): _ZTSZZ4mainENKUlRN2cl4sycl7handlerEE152_20clES2_EUlNS0_7nd_itemILi3EEEE167_13: Global register allocation failed

anton-v-gorshkov commented 3 years ago

By default, GT-Pin tries to use free registers during kernel profiling to store its intermediate measurements, which reduces overhead and improves the accuracy of the results. For some kernels this may be impossible because of high register pressure (the kernel may use all the registers on its own). As a workaround, one may allow GT-Pin to use the spill/fill mechanism to store data in device memory, but this may lead to noticeable overhead and less accurate data. To try this, set the "allow_sregs" option to "1" here: https://github.com/intel/pti-gpu/blob/master/samples/gpu_perfmon_read/gpu_perfmon_collector.h#L134
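
For readers who have not opened the linked header, the change amounts to flipping one option value before the sample is rebuilt. The sketch below is not the code from gpu_perfmon_collector.h: the helper name `MakeGtpinArgs` and the "--name value" argument layout are assumptions made for illustration, and only the option name "allow_sregs" and its value "1" come from the reply above. The actual line in the sample may look different.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of pti-gpu or GT-Pin): builds the pair of
// strings that would carry the "allow_sregs" option to the profiler.
// Assumption: the option is passed as a name followed by "0" or "1".
std::vector<std::string> MakeGtpinArgs(bool allow_sregs) {
  std::vector<std::string> args;
  args.push_back("--allow_sregs");
  // "1" lets the profiler spill to scratch-mapped registers (device memory);
  // "0" keeps the default behavior of using only free registers.
  args.push_back(allow_sregs ? "1" : "0");
  return args;
}
```

With `MakeGtpinArgs(true)` the second element is "1", which corresponds to the workaround described above; the sample has to be rebuilt after the change. Expect higher profiling overhead for the kernels that previously triggered the warnings.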