GaryHuan9 opened 2 weeks ago
Thank you for the bug report! It's great to see people learning SYCL.
Your reproducer suffers from a couple of problems:
The first has shown us that we were missing an implementation of the work_item scope. It has generated a few internal discussions on how to properly handle work_item scopes and, for now, we're going to allow this scope to fall back to a coarser-grained scope, so that users won't run into the unhelpful error that you were seeing.
You can find the change for this here: https://github.com/intel/llvm/pull/16172
As a current workaround, before the above PR is merged, you could try compiling the reproducer by specifying your device architecture:
icpx -fsycl -fsycl-targets=nvidia_gpu_sm_xx main.cpp
Specifying your architecture makes more scopes available on your device, and the work_item scope will fall back to a much coarser-grained scope, at the system level. At least sm_60 is required, which your device is capable of.
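For instance, assuming your card is a compute capability 7.0 device (the sm_70 below is purely illustrative; substitute your actual architecture):
icpx -fsycl -fsycl-targets=nvidia_gpu_sm_70 main.cpp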
The second is that your reproducer is attempting to apply an atomic_ref to memory that is private to the thread. This is disallowed on NVIDIA, and you will run into address space errors. Please try applying atomic_ref to device memory that is not private to the thread. For example:
#include <sycl/sycl.hpp>
#include <iostream>

int main() {
  sycl::queue queue(sycl::gpu_selector_v);
  std::cout << "Device: "
            << queue.get_device().get_info<sycl::info::device::name>()
            << std::endl;

  // Device USM allocation: this memory is visible to all work-items,
  // unlike a variable declared inside the kernel.
  int *data = sycl::malloc_device<int>(1, queue);

  queue.submit([&](sycl::handler &cgh) {
    sycl::stream out(1024, 256, cgh);
    cgh.parallel_for(sycl::range<1>(10), [=](sycl::id<1> id) {
      data[0] = 0;
      // The atomic_ref is bound to device memory rather than
      // thread-private memory, so exchange is valid here.
      sycl::atomic_ref<int, sycl::memory_order::relaxed,
                       sycl::memory_scope::work_item,
                       sycl::access::address_space::generic_space>
          at(data[0]);
      int load = at.exchange(2);
      out << "id " << id << " load " << load << sycl::endl;
    });
  });
  queue.wait_and_throw();
  sycl::free(data, queue);
}
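One caveat if you adapt this snippet: the plain data[0] = 0 store races with the exchange calls across the ten work-items, so the printed load values are nondeterministic. The example is only meant to show atomic_ref bound to device-visible memory.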
Describe the bug
Hey! I am learning to use SYCL, but I encountered a little issue when using sycl::atomic_ref::exchange. Things work fine on CPU, but when I switched to GPU, even a very simple test (see below) crashes with a CUDA error. Other atomic primitives such as store or load work fine.
To reproduce
This is my output; obviously it crashes, which is not what one would expect.
Environment
icpx --version output:
nvidia-smi:
And output of sycl-ls --verbose:
Additional context
No response