wdamon-intel commented 9 months ago

Summary

Introduce an experimental extension to clone a command list.

Details

Motivation

In order to efficiently support certain use-cases, such as compute graphs, it is desirable to have the ability to clone a command list that has been closed.

Interoperability with Other Extensions

This extension is complementary to, and may be used in conjunction with, ZE_experimental_mutable_command_list.

Proposed API

New Flags

ZE_COMMAND_LIST_FLAG_EXP_CLONEABLE

A new flag is introduced to ze_command_list_flags_t to inform the implementation that the application intends to create a clone of a command list.

New Functions

zeCommandListCreateCloneExp

Creates a command list as the clone of another command list.

ze_result_t zeCommandListCreateCloneExp(
    ze_command_list_handle_t hCommandList,
    ze_command_list_handle_t* phClonedCommandList
);

Parameter	Description
hCommandList	[in] handle to source command list (the command list to clone)
phClonedCommandList	[out] handle of the cloned command list

Notes

- immediate command lists may not be cloned
- the source command list referenced by the hCommandList parameter must be
  - created with the ZE_COMMAND_LIST_FLAG_EXP_CLONEABLE flag, and
  - closed prior to cloning
- the source command list may be cloned while it is running on the device
- the cloned command list inherits all properties of the source command list
- the cloned command list must be destroyed prior to the source command list
- the application must only use the command list for the device, or its sub-devices, which was provided during creation
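
The preconditions in the Notes can be restated as a small check. The following is a minimal sketch of a toy model, not driver code: `ModelCommandList`, `CloneStatus`, `checkClonePreconditions`, and `modelClone` are hypothetical names introduced here for illustration, and the assumption that a clone starts out closed simply follows from "inherits all properties" applied to this model.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only; these are not Level Zero types.
enum class CloneStatus { Ok, IsImmediate, NotCloneable, NotClosed };

struct ModelCommandList {
    bool immediate = false;      // immediate command lists may not be cloned
    bool cloneableFlag = false;  // created with ZE_COMMAND_LIST_FLAG_EXP_CLONEABLE
    bool closed = false;         // zeCommandListClose has been called
    bool running = false;        // currently executing on the device
};

// Applies the rules from the Notes. `running` is deliberately not checked:
// a source command list may be cloned while it is running.
inline CloneStatus checkClonePreconditions(const ModelCommandList& src) {
    if (src.immediate) return CloneStatus::IsImmediate;
    if (!src.cloneableFlag) return CloneStatus::NotCloneable;
    if (!src.closed) return CloneStatus::NotClosed;
    return CloneStatus::Ok;
}

// Models "the cloned command list inherits all properties of the source".
// Assumption of this model: execution state is not a property, so a fresh
// clone is not running even if the source is.
inline ModelCommandList modelClone(const ModelCommandList& src) {
    ModelCommandList clone = src;
    clone.running = false;
    return clone;
}
```

In this model a clone of a cloneable list is itself cloneable, which matches the clarification requested later in the thread.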

Example Usage

Creating a Cloneable Command List

// Create a command list that may be cloned
ze_command_list_desc_t commandListDesc = {
    ZE_STRUCTURE_TYPE_COMMAND_LIST_DESC,
    nullptr,
    0,
    ZE_COMMAND_LIST_FLAG_EXP_CLONEABLE
};
ze_command_list_handle_t hCommandList = nullptr;
zeCommandListCreate(hContext, hDevice, &commandListDesc, &hCommandList);

// { ...[construct command list]... }

// Close the command list
zeCommandListClose(hCommandList);

// Execute the command list
zeCommandQueueExecuteCommandLists(hCommandQueue, 1, &hCommandList, nullptr);

// Clone the command list, no synchronization required
ze_command_list_handle_t hClonedCommandList = nullptr;
zeCommandListCreateCloneExp(hCommandList, &hClonedCommandList);

// ...

MichalMrozek commented 9 months ago

Can we add to the Notes that a command list may be cloned while it is currently running? LGTM.

zzdanowicz commented 9 months ago

Question: is a clone also cloneable? I assume yes, but do we need to state this explicitly? Or maybe say that the cloned command list inherits all properties of the cloneable source.

wdamon-intel commented 9 months ago

Can we add to the Notes that a command list may be cloned while it is currently running?

Added to the Notes

wdamon-intel commented 9 months ago

Question: is a clone also cloneable? I assume yes, but do we need to state this explicitly? Or maybe say that the cloned command list inherits all properties of the cloneable source.

Updated the Notes section to state explicitly that the cloned command list inherits all the properties of the source command list.

guoyejun commented 9 months ago

Will all (or most) of the kernels of a compute graph be in one command list? If the answer is yes, for the case that we only need to change one arg of one kernel, what is the overhead of the clone of command list (containing many kernels)?

zzdanowicz commented 9 months ago

@guoyejun - it depends on the size of the command list: the number of kernels in it, the types of kernels (number of arguments per kernel, whether kernels have special properties), and the internal heaps and command buffers consumed. It also depends on how the application wants to perform its computations. Sure, you can use a single command list and mutate arguments, but the application must wait for execution to complete before updating or executing the command list again. If you would like to execute the same algorithm with different inputs in parallel, then cloning might be a viable solution, done once upfront.

guoyejun commented 9 months ago

Let's focus on the clone method, given the forced synchronization in the mutate method.

We expect many kernels in a command list (hundreds today, trending toward thousands).

Could you share which parts will be cloned? For example, are the GPU commands cloned, are internal heaps newly allocated, and so on? I want to get a basic impression of whether the overhead is small or large.

One possible use case is that only the input buffer (USM) changes for the first kernel in the command list. In the extreme case where the pointer changes at every iteration, will we easily run into device memory OOM because of the cloned command lists? I understand it depends on many factors, but is there a very rough estimate?

zzdanowicz commented 9 months ago

A clone should have its own heap and command buffer allocations, so the cost of cloning should be proportional to the cost of creating and recording the command list. Sure, the driver would not be required to encode all the commands and indirect data again, only to copy them, but the expectation is still that cloning time is proportional to creation and recording time.

MichalMrozek commented 9 months ago

It would also depend on how much is requested to be mutable. For immutable data, we can reuse the parent command list's read-only structures and heaps. That's why we have the clause that the parent command list must still be present for its clones to work.

guoyejun commented 9 months ago

It's nice that we can reuse the immutable parts.

Just curious how that reuse works. As I understand it, the command buffer is a single batch buffer, so how can the cloned batch buffer reuse something from the original batch buffer?

MichalMrozek commented 9 months ago

The command buffer is only one of the resources; there are others, such as heaps that store constants. If a kernel is immutable, the clone may use the same heap as the parent command list, so there is no need to allocate a new one.
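
To make the cost discussion above concrete, here is a minimal sketch of a toy model, not the driver's actual layout. It assumes, as described in the comments above, that each clone gets its own command buffer and its own copy of anything mutable, while read-only heaps are shared with the parent. `ModelHeap`, `ModelList`, `cloneSharingImmutable`, and `extraBytesPerClone` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical model types; not Level Zero or driver structures.
struct ModelHeap {
    std::vector<unsigned char> bytes;
};

struct ModelList {
    std::vector<unsigned char> commandBuffer;        // always private to each list
    std::shared_ptr<const ModelHeap> immutableHeap;  // read-only data, shareable with clones
    std::unique_ptr<ModelHeap> mutableHeap;          // data the application asked to mutate
};

// Copies the command buffer and the mutable heap, shares the immutable heap.
inline ModelList cloneSharingImmutable(const ModelList& parent) {
    ModelList clone;
    clone.commandBuffer = parent.commandBuffer;
    clone.immutableHeap = parent.immutableHeap;
    if (parent.mutableHeap) {
        clone.mutableHeap = std::make_unique<ModelHeap>(*parent.mutableHeap);
    }
    return clone;
}

// Very rough per-clone memory estimate in this model: only the copied parts count.
inline std::size_t extraBytesPerClone(const ModelList& parent) {
    std::size_t mutableBytes = parent.mutableHeap ? parent.mutableHeap->bytes.size() : 0;
    return parent.commandBuffer.size() + mutableBytes;
}
```

The model uses shared ownership so that it stays memory-safe on its own. The proposal instead keeps ownership with the parent, which is why the Notes require a clone to be destroyed before its source command list.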

oneapi-src / level-zero-spec

Experimental extension to clone a command list #259
