Open jssonx opened 1 month ago
@jmellorcrummey FYI
Hi,
Data Overlap: When collecting data for a kernel after its execution, is there a possibility that the data from zetMetricStreamerReadData includes stall samples from the previous kernel? My goal is to obtain non-overlapping stall samples for each kernel to enable fine-grained performance analysis.
From the API specification's point of view, the only way to guarantee this today is to close and reopen the streamer. However, this behaviour could be platform-specific. For example, on Aurora, if the previous kernel's execution has completed (ensured with a HostSynchronize call) and all of the stall data has been read out before the next kernel executes, then there should be no overlap in the stall data.
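The "read everything out before the next kernel" pattern could be sketched roughly as follows. This is only an illustration under stated assumptions, not the profiler's actual code: `drain_streamer` and the `read_chunk` callable are hypothetical names, and `read_chunk` stands in for a thin wrapper around zetMetricStreamerReadData that returns the number of bytes it wrote. The sketch also assumes that an empty streamer buffer is reported as zero bytes, which may not hold on every platform.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: keep reading until the streamer reports no more data.
// `read_chunk(buf, capacity)` is an assumed wrapper (not a Level Zero API) that
// fills `buf` with at most `capacity` bytes and returns the byte count written.
// In a real profiler it would call zetMetricStreamerReadData internally.
template <typename ReadChunk>
std::vector<std::uint8_t> drain_streamer(ReadChunk read_chunk,
                                         std::size_t chunk_bytes = 4096) {
    std::vector<std::uint8_t> all;                 // everything read so far
    std::vector<std::uint8_t> chunk(chunk_bytes);  // scratch buffer for one read
    for (;;) {
        // Clamp defensively in case the wrapper reports more than it was given.
        std::size_t n = std::min(read_chunk(chunk.data(), chunk.size()),
                                 chunk.size());
        if (n == 0) {
            break;  // assumption: zero bytes means the buffer is empty
        }
        all.insert(all.end(), chunk.begin(),
                   chunk.begin() + static_cast<std::ptrdiff_t>(n));
    }
    return all;
}
```

Called after zeEventHostSynchronize returns for kernel N and before kernel N+1 is launched, a helper like this would attribute everything it returns to kernel N. Whether samples can still be in flight at that point is the platform-specific part mentioned above.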
API Enhancement: If my understanding is correct, would it be possible to provide a Level Zero API for flushing the metrics streamer, such as zetMetricStreamerFlushData? This could potentially be more efficient than the current zeroFlushStreamerBuffer implementation.
Yes. We are discussing the usefulness of such an API internally, and having a use case like the one you describe would help us finalize it.
Clarification: If my understanding is incorrect, could you please confirm that each call to zetMetricStreamerReadData always returns non-overlapping data? This would allow me to remove the zeroFlushStreamerBuffer function, potentially improving performance.
I think I have clarified this above. Please share if there are further clarifications.
Environment
Context
I'm developing a profiler for SYCL offload programs. My approach involves serializing kernel launches using zeEventHostSynchronize to ensure only one kernel is offloaded to the Intel GPU device at a time. For each kernel, I use a profiling thread to read stall sampling data using zetMetricStreamerReadData.
Current Implementation
Currently, after each kernel execution, I collect and process the data. To ensure non-overlapping stall samples between kernels, I've implemented a manual buffer flushing function, zeroFlushStreamerBuffer(streamer, desc). This function closes the current streamer and opens a new one.
Current Implementation Details
To provide more context, here's the main profiling loop where zeroFlushStreamerBuffer is used:
This code demonstrates how we currently handle metric collection for each kernel execution, including the use of zeroFlushStreamerBuffer to attempt non-overlapping data collection between kernels.
Questions
Data Overlap: When collecting data for a kernel after its execution, is there a possibility that the data from zetMetricStreamerReadData includes stall samples from the previous kernel? My goal is to obtain non-overlapping stall samples for each kernel to enable fine-grained performance analysis.
API Enhancement: If my understanding is correct, would it be possible to provide a Level Zero API for flushing the metrics streamer, such as zetMetricStreamerFlushData? This could potentially be more efficient than the current zeroFlushStreamerBuffer implementation.
Clarification: If my understanding is incorrect, could you please confirm that each call to zetMetricStreamerReadData always returns non-overlapping data? This would allow me to remove the zeroFlushStreamerBuffer function, potentially improving performance.
Request
I would greatly appreciate clarification on the behavior of zetMetricStreamerReadData in this context and any guidance on the best practices for ensuring non-overlapping metric collection between kernel executions.