oneapi-src / level-zero

oneAPI Level Zero Specification Headers and Loader
https://spec.oneapi.com/versions/latest/elements/l0/source/index.html
MIT License

2D kernels with Level Zero #71

Closed jjfumero closed 2 years ago

jjfumero commented 2 years ago

I have a question about running 2D kernels with Level Zero, specifically about the number of threads (block size) in each dimension.

In Level Zero, I am using this set of calls to set up the number of threads:

uint32_t groupSizeX = 256u;
uint32_t groupSizeY = 256u;
uint32_t groupSizeZ = 1u;
zeKernelSuggestGroupSize(kernel, items, items, 1U, &groupSizeX, &groupSizeY, &groupSizeZ);

// Set number of threads after suggestion
zeKernelSetGroupSize(kernel, groupSizeX, groupSizeY, groupSizeZ);

// set kernel args
...

// run kernel
ze_group_count_t dispatch;
dispatch.groupCountX = items / groupSizeX;
dispatch.groupCountY = items / groupSizeY;
dispatch.groupCountZ = 1;
zeCommandListAppendLaunchKernel(cmdList, kernel, &dispatch, kernelTsEvent, 0, nullptr);
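The group counts above use integer division, which drops a trailing partial group whenever items is not a multiple of the suggested group size. A minimal sketch of a round-up computation follows; `groupCountFor` is a hypothetical helper introduced here for illustration and is not part of the Level Zero API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a Level Zero call): number of groups needed to
// cover `items` work-items when each group holds `groupSize` of them.
inline uint32_t groupCountFor(uint32_t items, uint32_t groupSize) {
    assert(groupSize != 0 && "group size must be non-zero");
    // Round up so that a trailing partial group is still launched.
    return items / groupSize + (items % groupSize != 0 ? 1u : 0u);
}
```

With such a helper, the dispatch would read dispatch.groupCountX = groupCountFor(items, groupSizeX), and likewise for Y. If the count is rounded up, the kernel itself would need a bounds check on its global id, since the last group may extend past items.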

I noticed that after the call to zeKernelSuggestGroupSize, the group size for the Y dimension (the groupSizeY variable) is set to 1.

The compute properties reported for my GPU list 256 as the maximum group size in both X and Y, so I expected 256x256 to be possible:

stype : DEVICE_COMPUTE_PROPERTIES
pNext : 0x0
maxTotalGroupSize : 256
maxGroupSizeX : 256
maxGroupSizeY : 256
maxGroupSizeZ : 256
maxGroupCountX : 4294967295
maxGroupCountY : 4294967295
maxGroupCountZ : 4294967295
maxSharedLocalMemory : 65536
numSubGroupSizes : 3
subGroupSizes : [ 8, 16, 32, 0, 0, 0, 0, 0 ]

If I try to force it, I get ZE_RESULT_ERROR_INVALID_GROUP_SIZE_DIMENSION. Is there any way to run with a block size of more than one thread in the Y dimension? Am I missing something? If needed, I can prepare a test case.
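For reference, the properties dump above reports maxTotalGroupSize as 256, and the Level Zero headers describe that field as a bound on the product groupSizeX * groupSizeY * groupSizeZ, separate from the per-dimension maxima. The following is a minimal sketch of that check, not the driver's actual validation logic; `ComputeLimits` and `groupSizeFits` are hypothetical names that mirror only the fields printed above rather than the real `ze_device_compute_properties_t`.

```cpp
#include <cstdint>

// Hypothetical stand-in for the limits printed in the properties dump.
struct ComputeLimits {
    uint32_t maxTotalGroupSize;
    uint32_t maxGroupSizeX;
    uint32_t maxGroupSizeY;
    uint32_t maxGroupSizeZ;
};

// Returns true when (x, y, z) respects both the per-dimension limits and
// the limit on the total number of work-items in one group.
inline bool groupSizeFits(const ComputeLimits& lim,
                          uint32_t x, uint32_t y, uint32_t z) {
    if (x == 0 || y == 0 || z == 0) return false;
    if (x > lim.maxGroupSizeX || y > lim.maxGroupSizeY ||
        z > lim.maxGroupSizeZ) {
        return false;
    }
    // Multiply in 64 bits and check after each step to avoid overflow.
    const uint64_t xy = static_cast<uint64_t>(x) * y;
    if (xy > lim.maxTotalGroupSize) return false;
    return xy * z <= lim.maxTotalGroupSize;
}
```

Under this reading, and with the limits shown above (all four values equal to 256), a 256x256x1 group would not fit because its product is 65536, while 256x1x1 or 16x16x1 would.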

Note that with OpenCL, I can run on the same Intel HD Graphics device using 256x256 as the local work-group size.

Any pointers will be appreciated.

Hardware & Drivers:

jandres742 commented 2 years ago

@jjfumero this seems to be more of a driver implementation question. Please move this to https://github.com/intel/compute-runtime