Open Thyre opened 1 year ago
Hi @Thyre, thank you for your issue and your high quality and informative proposal. I am glad you have already had a chance to look into PTI-SDK.
I brought it to the attention of our team.
I installed the current version on my system (Ubuntu 22.04, Intel Core i7-1260P), which worked mostly fine, though I ran into some issues with `xtpi` because oneAPI is installed as a module on my system and wasn't found by CMake.
I am also using Ubuntu 22.04, so I am looking into your CMake issue. I will follow up with a comment on how I got it to work or if a fix is required.
Thanks for forwarding it :smile:

My temporary workaround was to set `CMPLR_ROOT` manually via

```
$ export CMPLR_ROOT=/opt/software/software/Intel/2023.2.0/compiler/latest/
```

I guess this variable is not set in my Lmod setup, but it seems to get set when the `setvars.sh` script of oneAPI is sourced.

With that, I was able to compile and run most of the samples. Some failed, which was somewhat expected since the iGPU seems to not support f64 for SYCL.
Yeah, it should be set with the oneAPI variables. I was not able to re-create your issue, even with modulefiles. I wonder if it is related to this:
If you installed Intel compilers as part of the oneAPI 2023.2 release of the Intel® oneAPI Base Toolkit, the Intel® oneAPI HPC Toolkit, the Intel® oneAPI IoT Toolkit, or from the oneAPI Standalone Component page, please install the appropriate patch for your environment.
Two patches are now available, one for each of the Intel C++ and Fortran compilers, that were published as part of oneAPI 2023.2:
- Intel® oneAPI DPC++/C++ Compiler and Intel® C++ Compiler Classic
- Intel® Fortran Compiler Classic and Intel® Fortran Compiler
The patch version is 2023.2.1.
These patches apply only to Linux and Windows.
These patches resolve the issue of missing Environment Modules utility modulefiles and other issues.
Pure speculation, but if you have not updated your version of the compiler I would suggest it.
Anyway, I was able to build after `unset CMPLR_ROOT`. Feel free to test whether this patch works for you: https://github.com/intel/pti-gpu/commit/4b0c7eecd0a4f7dab3c31991d94a0c92e0254045.
Sorry for the late response. Your patch seems to fix my CMake issues, thanks :smile:
Heya,
I've noticed that the repository recently added an initial draft of an SDK that profiling/tracing tools can use to more easily add support for Intel GPUs to their applications.
I installed the current version on my system (Ubuntu 22.04, Intel Core i7-1260P), which worked mostly fine, though I ran into some issues with `xtpi` because oneAPI is installed as a module on my system and wasn't found by CMake.

Skimming through the headers and available methods, the interface looks fine, though I would need to implement it in a tool to check whether it fits my requirements. However, I noticed one thing already: right now, I don't see a way to convert timestamps given by the PTI-SDK.
Timestamp conversion
As far as I can see, PTI-SDK uses nanosecond-resolution timers to collect its events. That's perfect, since some operations take only a very short time to complete. However, UNIX systems may offer not just a single timer but several to choose from. This choice might be exposed to the user and only changes the timers used by the application itself, while PTI-SDK still delivers the same timestamps.
For pure calculations of the compute time of an action, this is fine. However, more detailed analysis of program executions might rely on comparing timestamps between host and device activities, and here the current implementation of PTI-SDK falls short. This is just one example; there are more reasons for timestamp conversion, for instance related to output formats.
Other interfaces show similar issues. OpenMP, for example, does have a `translate_time` function in its specification. However, the implementation in ROCm 5.7.1 translates those timestamps to seconds, making them useless for meaningful analysis. CUDA also didn't have a native way to translate timestamps when using CUPTI until CUDA 11.6, where a direct callback was introduced and tools could register their timestamp function via `cuptiActivityRegisterTimestampCallback`. For those interfaces, timestamp conversion had to be done manually, by acquiring timestamps at least twice during program execution and calculating a conversion rate.

For PTI-SDK, there are additional hindrances to this approach, though. Since we (seemingly) do not get events outside of buffer requests and buffer completions at this point, and PTI-SDK itself also does not have a function to get the timestamp, like `cuptiGetTimestamp` or `get_device_time` from OMPT, there's no real way to convert timestamps at all. I'm not familiar enough with Level Zero to know whether there's a way to acquire timestamps that way, but having a direct way through PTI-SDK would be preferred.

Proposal
There are two ways to solve this issue: either add a function to get the current timestamp used inside PTI-SDK, or add the option to use tool-defined timestamps via a callback function, like CUPTI already does with `cuptiActivityRegisterTimestampCallback`.