intel / pti-gpu

Profiling Tools Interfaces for GPU (PTI for GPU) is a set of Getting Started Documentation and Tools Library to start performance analysis on Intel(R) Processor Graphics easily

[PTI-SDK] Buffer event timestamp conversion #46

Open Thyre opened 1 year ago

Thyre commented 1 year ago

Heya,

I've noticed that the repository recently added an initial draft of an SDK which profiling / tracing tools can use to more easily add support for Intel GPUs.

I installed the current version on my system (Ubuntu 22.04, Intel Core i7-1260P), which worked mostly fine, though I ran into some issues with xtpi because oneAPI is installed as a module on my system and wasn't found by CMake.

Skimming through the headers and available methods, the interface looks fine, though I would need to integrate it into a tool to check whether it fits my requirements. However, I noticed one thing already: right now, I don't see a way to convert timestamps reported by PTI-SDK.


Timestamp conversion

As far as I can see, PTI-SDK uses nanosecond-resolution timers to collect its events. That's perfect, since some operations complete in a very short amount of time. However, UNIX systems may offer not just a single timer but several to choose from. A tool may expose that choice to the user, which only changes the timer used by the tool and application themselves, while PTI-SDK keeps delivering timestamps from its own clock.
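To illustrate (just a generic POSIX sketch, not PTI-SDK code), two clocks on the same host report unrelated values for the same instant:

#include <cstdint>
#include <cstdio>
#include <time.h>

// Generic illustration: the same instant read from two POSIX clocks yields
// unrelated values, so raw nanosecond timestamps are only comparable when
// both sides agree on (or convert to) one clock domain.
static uint64_t ReadClockNs(clockid_t id) {
  timespec ts{};
  clock_gettime(id, &ts);
  return static_cast<uint64_t>(ts.tv_sec) * 1000000000ull + ts.tv_nsec;
}

int main() {
  std::printf("CLOCK_MONOTONIC: %llu ns\n",
              static_cast<unsigned long long>(ReadClockNs(CLOCK_MONOTONIC)));
  std::printf("CLOCK_REALTIME:  %llu ns\n",
              static_cast<unsigned long long>(ReadClockNs(CLOCK_REALTIME)));
}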

For purely calculating how long an individual action takes, this is fine. However, more detailed analysis of a program execution might rely on comparing timestamps between host and device activities, and here the current implementation of PTI-SDK falls short. This is just one example; there are further reasons for timestamp conversion, for instance related to output formats.

Other interfaces show similar issues. OpenMP, for example, does have a translate_time function in its specification. However, the implementation in ROCm 5.7.1 translates those timestamps to seconds, making them useless for meaningful analysis. CUDA also had no native way to translate timestamps when using CUPTI until CUDA 11.6, where a direct callback was introduced and tools could register their own timestamp function via cuptiActivityRegisterTimestampCallback. For those interfaces, timestamp conversion had to be done manually, by acquiring timestamps from both clocks at least twice during program execution and calculating a conversion rate.
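The manual approach roughly boils down to the following sketch (ClockSample and ToHostTime are made-up names, not part of any of these APIs): sample both clocks at two points in time and linearly map one domain into the other.

#include <cstdint>

// Hedged sketch of manual two-point timestamp conversion; the two sample
// pairs are assumed to have been taken back-to-back, e.g. once at tool
// startup and once at shutdown.
struct ClockSample {
  uint64_t device_ts;  // timestamp from the profiling interface's clock (ns)
  uint64_t host_ts;    // timestamp from the tool's own clock (ns)
};

// Map a device-domain timestamp into the host domain using the conversion
// rate derived from two samples.
uint64_t ToHostTime(uint64_t device_ts, ClockSample first, ClockSample last) {
  const double rate = static_cast<double>(last.host_ts - first.host_ts) /
                      static_cast<double>(last.device_ts - first.device_ts);
  return first.host_ts +
         static_cast<uint64_t>((device_ts - first.device_ts) * rate);
}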

For PTI-SDK, there are additional hindrances to this approach, though. Since we (seemingly) do not get events outside of buffer requests and buffer completions at this point, and PTI-SDK itself has no function to query the current timestamp (like cuptiGetTimestamp in CUPTI or get_device_time in OMPT), there's no real way to convert timestamps at all. I'm not familiar enough with Level Zero to know whether timestamps can be acquired that way, but a direct way through PTI-SDK would be preferred.
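(For reference, Level Zero appears to expose zeDeviceGetGlobalTimestamps, which returns a correlated host/device timestamp pair; a minimal sketch, assuming an initialized driver and a valid device handle:)

#include <level_zero/ze_api.h>
#include <cstdint>
#include <cstdio>

// Minimal sketch: query correlated host and device timestamps from Level
// Zero (assumes the driver is initialized and `device` is a valid handle).
void PrintGlobalTimestamps(ze_device_handle_t device) {
  uint64_t host_ts = 0;
  uint64_t device_ts = 0;
  if (zeDeviceGetGlobalTimestamps(device, &host_ts, &device_ts) ==
      ZE_RESULT_SUCCESS) {
    std::printf("host: %llu, device: %llu\n",
                static_cast<unsigned long long>(host_ts),
                static_cast<unsigned long long>(device_ts));
  }
}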


Proposal

There are two ways to solve this issue. Either add a function to get the current timestamp used inside PTI-SDK, for example via

uint64_t PTI_EXPORT  
pti[prefix]GetTimestamp()

or add the option to use tool-defined timestamps via a callback function, as CUPTI already does (see here).
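For comparison, a rough sketch of how the callback variant looks with CUPTI (the tool clock chosen here, CLOCK_MONOTONIC_RAW, is just an example):

#include <cupti.h>
#include <cstdint>
#include <time.h>

// Rough sketch of the CUPTI mechanism referenced above: the tool registers
// its own timestamp source, and CUPTI stamps activity records with it.
static uint64_t ToolTimestamp() {
  timespec ts{};
  clock_gettime(CLOCK_MONOTONIC_RAW, &ts);  // example choice of tool clock
  return static_cast<uint64_t>(ts.tv_sec) * 1000000000ull + ts.tv_nsec;
}

void RegisterToolClock() {
  // Available since CUDA 11.6, as mentioned above.
  cuptiActivityRegisterTimestampCallback(ToolTimestamp);
}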

mschilling0 commented 1 year ago

Hi @Thyre, thank you for your issue and your high-quality, informative proposal. I am glad you have already had a chance to look into PTI-SDK.

I brought it to the attention of our team.

> installed the current version on my system (Ubuntu 22.04, Intel Core i7-1260P), which worked mostly fine, though I ran into some issues with xtpi because oneAPI is installed as a module on my system and wasn't found by CMake.

I am also using Ubuntu 22.04, so I am looking into your CMake issue. I will follow up with a comment on how I got it to work or if a fix is required.

Thyre commented 1 year ago

Thanks for forwarding it :smile: My temporary workaround was to set CMPLR_ROOT manually via

$ export CMPLR_ROOT=/opt/software/software/Intel/2023.2.0/compiler/latest/

I guess this variable is not set in my Lmod setup, but seems to get set when the setvars.sh script of oneAPI is sourced.

With that, I was able to compile and run most of the samples. Some failed, which was somewhat expected since the iGPU seems to not support f64 for SYCL.
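(As an aside, a minimal SYCL sketch to check this up front via the fp64 device aspect:)

#include <sycl/sycl.hpp>
#include <iostream>

// Minimal sketch: report whether the default SYCL device supports double
// precision before running the fp64 samples on it.
int main() {
  sycl::queue q{sycl::default_selector_v};
  const auto dev = q.get_device();
  std::cout << dev.get_info<sycl::info::device::name>() << ": fp64 "
            << (dev.has(sycl::aspect::fp64) ? "supported" : "not supported")
            << "\n";
}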

mschilling0 commented 1 year ago

> Thanks for forwarding it 😄 My temporary workaround was to set CMPLR_ROOT manually via
>
> $ export CMPLR_ROOT=/opt/software/software/Intel/2023.2.0/compiler/latest/
>
> I guess this variable is not set in my Lmod setup, but seems to get set when the setvars.sh script of oneAPI is sourced.
>
> With that, I was able to compile and run most of the samples. Some failed, which was somewhat expected since the iGPU seems to not support f64 for SYCL.

Yeah, it should be set with the oneAPI variables. I was not able to recreate your issue, even with modulefiles. I wonder if it is related to this:

> If you installed Intel compilers as part of the oneAPI 2023.2 release of the Intel® oneAPI Base Toolkit, the Intel® oneAPI HPC Toolkit, the Intel® oneAPI IoT Toolkit, or from the oneAPI Standalone Component page, please install the appropriate patch for your environment.
>
> Two patches are now available, one for each of the Intel C++ and Fortran compilers, that were published as part of oneAPI 2023.2:
>
>   • Intel® oneAPI DPC++/C++ Compiler and Intel® C++ Compiler Classic
>   • Intel® Fortran Compiler Classic and Intel® Fortran Compiler
>
> The patch version is 2023.2.1.
>
> These patches apply only to Linux and Windows.
>
> These patches resolve the issue of missing Environment Modules utility modulefiles and other issues.

Pure speculation, but if you have not updated your version of the compiler I would suggest it.

Anyway, I was able to build with CMPLR_ROOT unset. Feel free to test whether this patch works for you: https://github.com/intel/pti-gpu/commit/4b0c7eecd0a4f7dab3c31991d94a0c92e0254045.

Thyre commented 11 months ago

> Anyway, I was able to build with CMPLR_ROOT unset. Feel free to test whether this patch works for you: 4b0c7ee.

Sorry for the late response. Your patch seems to fix my CMake issues, thanks :smile: