Closed: zjin-lcf closed this issue 3 years ago
By default, GT-Pin tries to use free registers during kernel profiling to store its intermediate measurements, which reduces overhead and improves the accuracy of the results. For some kernels this may be impossible because of high register pressure (the kernel may use all the registers itself). As a workaround, you can allow GT-Pin to use the spill/fill mechanism to store data in device memory, but this may lead to visible overhead and less accurate data. To try this, set the "allow_sregs" option to "1" here: https://github.com/intel/pti-gpu/blob/master/samples/gpu_perfmon_read/gpu_perfmon_collector.h#L134
Running a program (https://github.com/zjin-lcf/oneAPI-DirectProgramming/tree/master/sort-dpct) displays the following messages (including output from cliloader, the Intel OpenCL intercept layer). There are three kernels in the program, but the assembly of only one kernel is displayed (not shown here). Thank you for your solution.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
CLIntercept (64-bit) is loading...
CLintercept file location: /opt/intel/oneapi/compiler/latest/linux/lib/libOpenCL.so.1
CLIntercept URL: https://github.com/intel/opencl-intercept-layer
CLIntercept git description: v2.2.2-18-g204c386
CLIntercept git refspec: refs/heads/master
CLInterecpt git hash: 204c386f6c9ccafeab839d5738c9fcde0ad05744
CLIntercept optional features:
    cliloader(supported)
    cliprof(supported)
    kernel overrides(supported)
    ITT tracing(NOT supported)
    MDAPI(supported)
CLIntercept environment variable prefix: CLI_
CLIntercept config file: clintercept.conf
Read OpenCL file name from user parameters: /opt/intel/oneapi/compiler/latest/linux/lib/libOpenCL.so.1.2.real
Trying to load dispatch from: /opt/intel/oneapi/compiler/latest/linux/lib/libOpenCL.so.1.2.real
Couldn't get exported function pointer to: clCreateBufferWithProperties
Couldn't get exported function pointer to: clCreateImageWithProperties
Couldn't get exported function pointer to: clSetContextDestructorCallback
... success!
Timer Started!
... loading complete.
Initializing host memory.
Running benchmark with input array length 16777216
GTPIN WARNING (PID 21552): _ZTSZZ4mainENKUlRN2cl4sycl7handlerEE122_20clES2_EUlNS0_7nd_itemILi3EEEE131_13: Not enough free registers while scratch-mapped registers (SREGs) are disabled
GTPIN WARNING (PID 21552): _ZTSZZ4mainENKUlRN2cl4sycl7handlerEE122_20clES2_EUlNS0_7nd_itemILi3EEEE131_13: Global register allocation failed
GTPIN WARNING (PID 21552): _ZTSZZ4mainENKUlRN2cl4sycl7handlerEE152_20clES2_EUlNS0_7nd_itemILi3EEEE167_13: Not enough free registers while scratch-mapped registers (SREGs) are disabled
GTPIN WARNING (PID 21552): _ZTSZZ4mainENKUlRN2cl4sycl7handlerEE152_20clES2_EUlNS0_7nd_itemILi3EEEE167_13: Global register allocation failed
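
The GTPIN warnings in the log name two distinct kernels, which may be why only one of the three kernels shows assembly; that reading is an inference from the log, not something GT-Pin states. To check which kernels are affected in a longer log, a small script can group the warnings by kernel name. This is a minimal sketch, not part of GT-Pin or pti-gpu: the function name `failed_kernels` and the sample text are hypothetical, and the pattern only assumes the `GTPIN WARNING (PID n): <kernel>: <message>` layout seen in the log.

```python
import re
from collections import defaultdict

# Matches "GTPIN WARNING (PID n): <kernel>: <message>". The message ends at
# the next warning or at the end of the line, so the pattern works whether
# the log has one message per line or several run together on one line.
WARNING_RE = re.compile(
    r"GTPIN WARNING \(PID (\d+)\): (\S+): (.+?)(?=\s*GTPIN WARNING|\s*$)",
    re.MULTILINE,
)

def failed_kernels(log_text):
    """Return {kernel_name: [messages]} for kernels whose warnings include
    'Global register allocation failed'."""
    warnings = defaultdict(list)
    for _pid, kernel, message in WARNING_RE.findall(log_text):
        warnings[kernel].append(message)
    return {
        kernel: messages
        for kernel, messages in warnings.items()
        if any("Global register allocation failed" in m for m in messages)
    }

# Sample input: the four warning lines copied from the log.
K1 = "_ZTSZZ4mainENKUlRN2cl4sycl7handlerEE122_20clES2_EUlNS0_7nd_itemILi3EEEE131_13"
K2 = "_ZTSZZ4mainENKUlRN2cl4sycl7handlerEE152_20clES2_EUlNS0_7nd_itemILi3EEEE167_13"
SAMPLE_LOG = "\n".join(
    f"GTPIN WARNING (PID 21552): {kernel}: {message}"
    for kernel in (K1, K2)
    for message in (
        "Not enough free registers while scratch-mapped registers (SREGs) are disabled",
        "Global register allocation failed",
    )
)

for kernel, messages in failed_kernels(SAMPLE_LOG).items():
    print(kernel)
    for message in messages:
        print("   ", message)
```

Running the program again after setting "allow_sregs" to "1" and passing the new log through the same function would show whether the warnings for these kernels are gone.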