Closed ytzhang1 closed 4 months ago
@ytzhang1, could you please share your sycl-graph-app code with me? I need the source to build it locally and test it with unitrace. I am working on enabling SYCL graph support in unitrace.
Hi @Sarbojit2019, the following is the test code I have been using. Thanks!
#include <iostream>
#include <sycl/sycl.hpp>

void run1(sycl::queue &q, float* dst, float* src, float* tmp1, float* tmp2, int count)
{
    sycl::event ek1 = q.submit([&](sycl::handler &h) {
        h.parallel_for(count, [=](sycl::item<1> item) {
            int idx = item.get_id(0);
            tmp1[idx] = src[idx] * 2;
        });
    });
    sycl::event ek2 = q.submit([&](sycl::handler &h) {
        h.parallel_for(count, [=](sycl::item<1> item) {
            int idx = item.get_id(0);
            tmp2[idx] = tmp1[idx] * 3;
        });
    });
    sycl::event ek3 = q.submit([&](sycl::handler &h) {
        h.parallel_for(count, [=](sycl::item<1> item) {
            int idx = item.get_id(0);
            dst[idx] = tmp2[idx] + 11;
        });
    });
}

int test0()
{
    sycl::queue q{sycl::gpu_selector_v, {sycl::property::queue::in_order(),
        sycl::ext::intel::property::queue::no_immediate_command_list()}};
    int count = 1024 * 1024;
    float *inp = sycl::malloc_device<float>(count, q);
    float *outp = sycl::malloc_device<float>(count, q);
    float *tmp1 = sycl::malloc_device<float>(count, q);
    float *tmp2 = sycl::malloc_device<float>(count, q);
    float *inp_h = new float[count];
    float *outp_h = new float[count];
    for (size_t i = 0; i < count; ++i) {
        inp_h[i] = i/4;
        outp_h[i] = -1;
    }
    q.memcpy(inp, inp_h, count * sizeof(float)).wait();

    // record graph
    sycl::ext::oneapi::experimental::command_graph g {q.get_context(), q.get_device()};
    g.begin_recording(q);
    run1(q, outp, inp, tmp1, tmp2, count);
    g.end_recording();
    auto execGraph = g.finalize();
    q.ext_oneapi_graph(execGraph).wait();
    q.memcpy(outp_h, outp, count * sizeof(float)).wait();
    std::cout << "test finished." << std::endl;
    return 0;
}

int main(int argc, char** argv)
{
    test0();
}
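The test above copies the result back to outp_h but never inspects it, so a run that records the graph without executing the kernels would still print "test finished." A minimal host-side check, sketched here in plain C++ under the assumption that the three kernels compute tmp1 = src * 2, tmp2 = tmp1 * 3 and dst = tmp2 + 11 as in run1(), might look like the following. The helper names expected_output, input_value and count_mismatches are hypothetical and not part of the original test.

```cpp
#include <cstddef>

// Host-side reference for the three chained kernels in run1():
// tmp1 = src * 2, tmp2 = tmp1 * 3, dst = tmp2 + 11.
inline float expected_output(float src) {
    return src * 2.0f * 3.0f + 11.0f;
}

// Mirrors the test's initialization `inp_h[i] = i/4`:
// integer division first, then conversion to float.
inline float input_value(std::size_t i) {
    return static_cast<float>(i / 4);
}

// Hypothetical helper: counts elements of the copied-back buffer
// that differ from the host reference.
inline std::size_t count_mismatches(const float* outp_h, std::size_t count) {
    std::size_t bad = 0;
    for (std::size_t i = 0; i < count; ++i) {
        if (outp_h[i] != expected_output(input_value(i))) {
            ++bad;
        }
    }
    return bad;
}
```

In test0(), count_mismatches(outp_h, count) could be called after the final memcpy and its result printed. Exact comparison is reasonable for this input because every intermediate value is an integer below 2^24, which a float represents exactly; other inputs would likely need a tolerance.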
@ytzhang1 @Sarbojit2019 This is what I got with the latest unitrace cloned from the repo (@Sarbojit2019, this does not include what you have been working on; it is just the unmodified mainline in the repo):
All the kernels are there.
@ytzhang1 Please give it a try.
Hi @zma2, thanks for pointing that out! I verified that the latest pti-gpu shows the kernel executions. Thanks! I'll close this issue.
SYCL graph is an experimental feature released in oneAPI 2024.0, and it seems that neither unitrace nor onetrace supports tracing SYCL graph kernels. In the following image, the device activities show only three zeCommandListAppendBarrier entries.
I used the following command to trace the binary:
unitrace --demangle --chrome-device-activities --chrom-kernel-activities ./sycl-graph-app
It would be great if unitrace could trace SYCL graphs at either graph granularity or kernel granularity.