oneapi-src / level-zero

oneAPI Level Zero Specification Headers and Loader
https://spec.oneapi.com/versions/latest/elements/l0/source/index.html
MIT License
208 stars 90 forks source link

Parameter validation layer erroneously checks pCommandQueueGroupProperties #120

Open etomzak opened 1 year ago

etomzak commented 1 year ago

I'm seeing some unexpected behavior with the parameter validation layer enabled.

Expected behavior:

Calling zeDeviceGetCommandQueueGroupProperties() populates the memory pointed to by the pCommandQueueGroupProperties parameter.

Actual behavior:

Calling zeDeviceGetCommandQueueGroupProperties() returns ZE_RESULT_ERROR_INVALID_ARGUMENT.

Comments

The issue is that the validation layer expects the ze_command_queue_group_properties_t::stype field to be set by the application. However, it is the responsibility of the zeDeviceGetCommandQueueGroupProperties() function to initialize the data. The work-around is for the application to partially initialize the data by setting all ze_command_queue_group_properties_t::stype fields before calling zeDeviceGetCommandQueueGroupProperties().

I think this is the offending line, and I suspect the solution is just to remove it.

jandres742 commented 1 year ago

thanks @etomzak . It is true that zeDeviceGetCommandQueueGroupProperties populates the data, but only for [out] args. [In] arguments, as stype, are only inputs, and must be set by the application.

ze_structure_type_t stype [in] type of this structure

ze_structure_type_t stype [in] type of this structure

void *pNext [in,out][optional] must be null or a pointer to an extension-specific structure (i.e. contains stype and pNext).

ze_command_queue_group_property_flags_t flags [out] 0 (none) or a valid combination of ze_command_queue_group_property_flag_t

size_t maxMemoryFillPatternSize [out] maximum pattern_size supported by command queue group. See zeCommandListAppendMemoryFill for more details.

uint32_t numQueues [out] the number of physical engines within the group.

etomzak commented 1 year ago

That is a ... surprising API design. I'm not convinced it makes sense. Intuitively, even if ze_command_queue_group_properties_t::stype is labeled as [in], the data flow direction in an API query is from the runtime to the application, so one would expect the runtime to put the information "in" to the struct so that the application can then read it out.

More broadly, the concept of [in]/[out] is only relevant to descriptor structs (structs that are used to bundle function arguments together). It's strange to even have them on the fields of something like ze_command_queue_group_properties_t.

I looked up Vulkan to see what they do in this situation. A couple of examples are VkDescriptorSetLayoutBinding and VkQueueFamilyProperties. Interestingly, these structs don't have an sType field at all, assumedly because Vulkan only uses sType for *Info-style structs (parameter bundling).

jandres742 commented 1 year ago

thanks @etomzak . The use of stype as in as you mention, where in means that users set that value, follows the convention of other APIs, like Vulkan, please see here https://registry.khronos.org/vulkan/site/guide/latest/pnext_and_stype.html#:~:text=The%20void*%20pNext%20is%20used,extensions%20that%20expose%20new%20structures.

In L0, stype is use for properties and other structs.

Having stype as in is needed so the user can tell the target driver which structure the user is passing. W/o this , the driver would have to guess which struct the application is trying to use, which might be not efficient or easy to implement.