oneapi-src / level-zero

oneAPI Level Zero Specification Headers and Loader
https://spec.oneapi.com/versions/latest/elements/l0/source/index.html
MIT License

[Question] Documentation missing: migrating OpenCL profiling code to Level Zero #66

Open jjfumero opened 3 years ago

jjfumero commented 3 years ago

Hi all, I wonder whether the Level Zero API has a call equivalent to clGetEventProfilingInfo from OpenCL: https://www.khronos.org/registry/OpenCL/sdk/2.1/docs/man/xhtml/clGetEventProfilingInfo.html

What I would like to do is measure the data transfer time (zeCommandListAppendMemoryCopy, Host -> Device and Device -> Host). Is there any functionality in Level Zero to do so?

I saw that the kernel timers use a different function, zeCommandListAppendQueryKernelTimestamps, but I can't find the equivalent for data transfers.

Any pointers or examples would be appreciated.

Thank you, Juan

jandres742 commented 3 years ago

hi @jjfumero

> but I can't find the equivalent for data transfers.

are you saying that clGetEventProfilingInfo is the one OpenCL uses for data transfers?

Also, could you elaborate on what you think is missing from zeCommandListAppendQueryKernelTimestamps for it to be usable for data transfers?

jjfumero commented 3 years ago

Sorry, I did not explain well.

I meant to do something similar to this:

/* The event must come from a queue created with CL_QUEUE_PROFILING_ENABLE. */
cl_ulong time_start, time_end;
clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_START, sizeof(time_start), &time_start, NULL);
clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_END, sizeof(time_end), &time_end, NULL);
/* Both values are device timestamps in nanoseconds. */
cl_ulong elapsedtime = time_end - time_start;

What I meant about the relation to kernel time is that, in OpenCL, the same function (clGetEventProfilingInfo) returns the elapsed time for both data transfers and kernels, because it is based on an event. In Level Zero, I see there is a separate function for kernels, zeCommandListAppendQueryKernelTimestamps, and I wonder how to obtain the elapsed time of a data transfer.

jandres742 commented 3 years ago

@jjfumero have you taken a look at: https://github.com/intel/compute-runtime/blob/6c66c4ab10aed5e42a14d0a8c2d57eb416d0792b/level_zero/core/test/black_box_tests/zello_timestamp.cpp#L226

that shows the example with an appendKernel, but it would be the same for an appendCopy
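For readers following along, the arithmetic that usually follows such a timestamp query can be sketched as below. This is a minimal sketch under stated assumptions, not code from the linked sample: `elapsed_ticks` and `elapsed_ns` are hypothetical helpers, the start and end values are assumed to come from the `kernelStart` and `kernelEnd` fields of a `ze_kernel_timestamp_result_t` (for example as filled in by `zeEventQueryKernelTimestamp`), and `valid_bits` is assumed to be `kernelTimestampValidBits` from `ze_device_properties_t`. The tick length is passed in as nanoseconds per tick because the unit of `timerResolution` depends on which version of the properties structure is queried.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a Level Zero API): number of device ticks between
 * two timestamps read from a counter that is only `valid_bits` wide.
 * Masking the difference keeps the result correct if the counter wrapped
 * around once between `start` and `end`. */
static uint64_t elapsed_ticks(uint64_t start, uint64_t end, uint32_t valid_bits)
{
    assert(valid_bits >= 1 && valid_bits <= 64);
    uint64_t mask = (valid_bits == 64) ? UINT64_MAX
                                       : ((UINT64_C(1) << valid_bits) - 1);
    return (end - start) & mask;
}

/* Hypothetical helper: convert the tick delta to nanoseconds.
 * `ns_per_tick` is an assumption supplied by the caller, derived from the
 * device's timer resolution. */
static double elapsed_ns(uint64_t start, uint64_t end,
                         uint32_t valid_bits, double ns_per_tick)
{
    return (double)elapsed_ticks(start, end, valid_bits) * ns_per_tick;
}
```

With these helpers, a copy and a kernel would be handled identically: read the start and end ticks for the event attached to the appended command, then call `elapsed_ns(start, end, valid_bits, ns_per_tick)`.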

jjfumero commented 3 years ago

Thank you @jandres742, this is very helpful.

It looks like this is another way:

https://github.com/intel/compute-runtime/blob/6c66c4ab10aed5e42a14d0a8c2d57eb416d0792b/level_zero/core/test/black_box_tests/zello_timestamp.cpp#L113-L123

jjfumero commented 3 years ago

I confirm this strategy works for me.

Another question: is there any function equivalent to querying CL_PROFILING_COMMAND_QUEUED and CL_PROFILING_COMMAND_SUBMIT from OpenCL?

I guess commands in a Level Zero command list are not executed until the list is closed and executed on a command queue. How can I query the time that elapses between a command being appended to the list and the command starting to execute?

Any pointers or references would be appreciated. Juan

eero-t commented 2 years ago

@jjfumero You could probably open a new issue about documentation missing for migrating specific OpenCL functionality to Level Zero. Or maybe you could just retitle this as "Documentation missing: migrating OpenCL profiling code to Level Zero"... :-)