Closed Thyre closed 11 months ago
Hello @Thyre --- good catch on the missing bytes_copied.
On Level0 part (mem allocs/deletes) can you expand a bit on what you mean on the missing portion ?
Sure! Looking at pti_view.h
, we can see that there's no record type for allocations / deletions. This is also the case for OMPT for example, where no explicit callback / record for memory handling exists. However, the ompt_callback_target_data_op
is reused for this, having optype
values for allocations and deletions.
Since pti_view_kind
also has a value for Level Zero with PTI_VIEW_LEVEL_ZERO_CALLS
, my guess would be that allocations / deletions might get included there, since functions like zeMemAllocShared
represent memory handling.
However, PTI_VIEW_LEVEL_ZERO_CALLS
is not implemented yet, resulting in PTI_ERROR_NOT_IMPLEMENTED
if one tries to register this group.
https://github.com/intel/pti-gpu/blob/90b9230c8bd9c00211934ec5e36183edc3aa8c1d/sdk/src/view_handler.h#L244
So my plan would be to wait until a first proof of concept of that part is implemented as well :smile:
If it's still missing then, or the implementation is not easy to work with, I can give additional feedback. I also asked colleagues to maybe have a look at the PTI-SDK PoC.
Hi @Thyre , number of bytes copied added https://github.com/intel/pti-gpu/blob/f27829883defd69c18e7dc2e4cdd727a2483c40c/sdk/include/pti_view.h#L188
Although PTI_VIEW_LEVEL_ZERO_CALLS is still to be done. It is on our list. thank you!
This looks good, thanks :smile:
Since this issue focuses on the _bytes
field only, I would close the issue if you're fine with it.
For PTI_VIEW_LEVEL_ZERO_CALLS
, a separate issue can be opened if needed.
I'm also trying to build an extremely early prototype during spare cycles, so that I can give additional feedback what we might need. However, I expect that it will take quite some time to get a first prototype.
Thank you! yes - it would be good to close this issue. And open on PTI_VIEW_LEVEL_ZERO_CALLS
.
last month was a lot about infrastructure work and publishing new tool - unitrace
, we will invest more into the SDK features going forward.
Description
Recently, a proof of concept version for the PTI-SDK was added to this repository. I've had a look at it and tried to implement some basic handling of the interface to check if it is usable for us right now.
Right now, the field
pti_view_record_memory_copy
is used to describe copy actions between different types of memory (memory to device, shared memory to device etc.). The struct itself has many descriptive fields. However, I'm noticing that one crucial one is missing: The amount of memory we're actually copying!Lets look at the struct:
There's certainly some stuff not needed here (like PCI address and so on, which, if needed, could be returned by a separate function), but this is noted in the TODO already. The amount of memory is certainly crucial for some applications to potentially show bottlenecks. Just showing the time passed is not sufficient and having a separate record / callback for that would also be inconvenient.
Just as an addition, memory allocations and deletions are also not shown, but I guess this is because the Level0 part of the PTI-SDK is still missing.