Open jfedorov opened 4 months ago
Let me summarize my understanding:
In the driver we would need to allow profiling in such conditions and apply profiling capture. For fs1 this would mean we utilize one more post sync operation; for pre-fs1 this would mean we need to add additional profiling calls around the next call.
We gain with this API that we do not need to introduce any new APIs.
Yes. You summarized correctly.
Just to make sure:
This API is to be used for any CommandList, not just for Immediate ones.
For example, this API could also be used to "instrument" command lists passed to zeCommandQueueExecuteCommandLists.
Summary
Introduce a Level-Zero Core or Tools API that enables setting up a timestamp-enabled event (in addition to any event already set) for a GPU task that is being submitted into the command list.
Details
Motivation
Level-Zero and the Loader have already added APIs that make Start/Stop possible. But the current Start/Stop implementation in the tracing tool (e.g. PTI) incurs significant overhead, as there is no easy way to add an Event, or to change an existing one, so that it has the Timestamp property.
To meet requirement (2), let's add an API that adds a profiling event (an event with Timestamp enabled) on the fly, prior to zeCommandListAppendLaunchKernel (zeCommandListAppendMemoryCopy, etc.).
Proposed API
New Functions
From the brainstorm this API could be something like:
The usage flow would be like this:
Call zeCommandListProfileNextAppend(cmdList, event) prior to the zeCommandListAppend...(cmdList, ...) call that submits the task to be profiled. The caller of these two APIs should take additional precautions when two threads might submit to the same command list at around the same moment. Handling the situation where the event from zeCommandListProfileNextAppend is erroneously associated with a zeCommandListAppend... call from another thread is the responsibility of the API user.
The event passed into zeCommandListProfileNextAppend is to be created by the user and should come from an event pool created with ZE_EVENT_POOL_FLAG_KERNEL_TIMESTAMP. The timing data from the event becomes available once the task completes on the device and should be retrieved with zeEventQueryKernelTimestamp(event, &timestamp);
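To make the intended pairing concrete, here is a toy model of the flow described above. It does not use Level-Zero: every name prefixed with toy_ is a hypothetical stand-in invented for illustration. It also assumes the armed event is one-shot, meaning it is consumed by the very next append and not by later ones, which the proposal implies but does not state.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_TASKS 8

/* Hypothetical stand-in for an event; NOT a Level-Zero type. */
typedef struct { int id; } toy_event_t;

/* Hypothetical stand-in for a command list; NOT a Level-Zero type. */
typedef struct {
    toy_event_t *pending_profile_event;      /* armed for the next append, or NULL */
    toy_event_t *task_events[TOY_MAX_TASKS]; /* profiling event attached to each task */
    size_t task_count;
} toy_cmdlist_t;

/* Models the proposed zeCommandListProfileNextAppend(cmdList, event):
 * remember the event so the next append can pick it up. */
void toy_profile_next_append(toy_cmdlist_t *cl, toy_event_t *ev) {
    assert(cl != NULL && ev != NULL);
    cl->pending_profile_event = ev;
}

/* Models any zeCommandListAppend... call: attach the armed event (if any)
 * to the new task and clear it. Clearing is the one-shot assumption.
 * Returns the task index, or -1 if the toy list is full. */
int toy_append_task(toy_cmdlist_t *cl) {
    assert(cl != NULL);
    if (cl->task_count >= TOY_MAX_TASKS) {
        return -1;
    }
    cl->task_events[cl->task_count] = cl->pending_profile_event;
    cl->pending_profile_event = NULL;
    return (int)cl->task_count++;
}
```

In this model, if a second thread calls toy_append_task between another thread's toy_profile_next_append and its own append, the second thread's task receives the event. This is the mis-association the proposal leaves to the API user, for example by holding a per-command-list lock across both calls.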