Open ayylol opened 1 week ago
This looks like a query bug to me. AFAIK, CUDA only supports a sub-group size of 32, and using other sizes should lead to errors. But we should at least be able to apply the supported size and see that it indeed took effect, I think.
Describe the bug
The required sub-group size kernel attribute is ignored on CUDA and HIP devices. When querying the compile sub-group size of a kernel that was built with the required sub-group size attribute, 0 is returned.
To reproduce
```cpp
#include <sycl/sycl.hpp>
#include <iostream>
#include <vector>

using namespace sycl;
class Kernel1;

// Change this to a value that is inside sg_sizes
const int SG_SIZE = 32;

int main() {
  queue Q(gpu_selector_v);
  device D = Q.get_device();
  std::vector<size_t> sg_sizes = D.get_info<info::device::sub_group_sizes>();

  std::cout << "Supported subgroup sizes for "
            << D.get_info<info::device::name>() << ": { ";
  for (size_t size : sg_sizes) {
    std::cout << size << " ";
  }
  std::cout << "}" << std::endl;

  Q.submit([&](handler &h) {
     h.parallel_for<Kernel1>(nd_range<1>(512, 128),
                             [=](nd_item<1> ndi)
                                 [[sycl::reqd_sub_group_size(SG_SIZE)]] {});
   }).wait();

  auto KernelID = get_kernel_id<Kernel1>();
  auto Kernel =
      get_kernel_bundle<bundle_state::executable>(Q.get_context(), {KernelID})
          .get_kernel(KernelID);

  std::cout << "Tried to use subgroup size: " << SG_SIZE << std::endl;
  std::cout << "Actual used subgroup size: "
            << Kernel.get_info<
                   info::kernel_device_specific::compile_sub_group_size>(D)
            << std::endl;
  std::cout << "Done" << std::endl;
  return 0;
}
```
Output:

```
Supported subgroup sizes for NVIDIA GeForce RTX 2060: { 32 }
Tried to use subgroup size: 32
Actual used subgroup size: 0
Done
```
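As a cross-check that this is a query-side problem rather than the attribute itself being dropped, the sub-group size can also be observed from inside the kernel via `sub_group::get_local_linear_range()`. This is a minimal sketch (not from the original report); it assumes a SYCL 2020 compiler such as DPC++ and an available GPU device:

```cpp
#include <sycl/sycl.hpp>
#include <iostream>

int main() {
  sycl::queue Q{sycl::gpu_selector_v};
  int actual = 0;
  {
    sycl::buffer<int, 1> buf{&actual, 1};
    Q.submit([&](sycl::handler &h) {
      sycl::accessor acc{buf, h, sycl::write_only};
      h.parallel_for(sycl::nd_range<1>(128, 128),
                     [=](sycl::nd_item<1> ndi)
                         [[sycl::reqd_sub_group_size(32)]] {
        // Record the sub-group size the kernel actually runs with.
        if (ndi.get_global_linear_id() == 0)
          acc[0] = ndi.get_sub_group().get_local_linear_range();
      });
    });
  } // buffer destruction copies the result back to `actual`
  std::cout << "In-kernel sub-group size: " << actual << std::endl;
  return 0;
}
```

On the CUDA backend this should report 32 (the warp size) even while the `compile_sub_group_size` query returns 0, which would support the reading above that the query, not the attribute, is at fault.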
Environment:

```
Platform [#3]:
    Version : CUDA 12.4
    Name    : NVIDIA CUDA BACKEND
    Vendor  : NVIDIA Corporation
    Devices : 1
    Device [#0]:
        Type    : gpu
        Version : 7.5
        Name    : NVIDIA GeForce RTX 2060
        Vendor  : NVIDIA Corporation
        Driver  : CUDA 12.4
        UUID    : 862229814554125179692371413399212160130147
        Num SubDevices    : 0
        Num SubSubDevices : 0
        Aspects : gpu fp16 fp64 online_compiler online_linker queue_profiling
                  usm_device_allocations usm_host_allocations usm_shared_allocations
                  ext_intel_pci_address usm_atomic_shared_allocations atomic64
                  ext_intel_device_info_uuid ext_intel_free_memory ext_intel_device_id
                  ext_intel_memory_clock_rate ext_intel_memory_bus_width
                  ext_oneapi_bindless_images ext_oneapi_bindless_images_shared_usm
                  ext_oneapi_bindless_images_2d_usm ext_oneapi_interop_memory_import
                  ext_oneapi_interop_semaphore_import ext_oneapi_mipmap
                  ext_oneapi_mipmap_anisotropy ext_oneapi_mipmap_level_reference
                  ext_oneapi_ballot_group ext_oneapi_fixed_size_group
                  ext_oneapi_opportunistic_group ext_oneapi_graph ext_oneapi_limited_graph
                  ext_oneapi_cubemap ext_oneapi_cubemap_seamless_filtering
                  ext_oneapi_bindless_sampled_image_fetch_1d_usm
                  ext_oneapi_bindless_sampled_image_fetch_2d_usm
                  ext_oneapi_bindless_sampled_image_fetch_2d
                  ext_oneapi_bindless_sampled_image_fetch_3d
                  ext_oneapi_queue_profiling_tag
        info::device::sub_group_sizes: 32
    Architecture: nvidia_gpu_sm_75
```

(A runtime warning was interleaved in the aspects list: "Images are not fully supported by the CUDA BE, their support is disabled by default. Their partial support can be activated by setting SYCL_PI_CUDA_ENABLE_IMAGE_SUPPORT environment variable at runtime.")