intel / llvm

Intel staging area for llvm.org contribution. Home for Intel LLVM-based projects.
Other
1.21k stars 723 forks source link

Why accessor get data not equal primary data? #7591

Open wangzy0327 opened 1 year ago

wangzy0327 commented 1 year ago

Describe the bug

My code ref doc

Before I add the code

cgh.copy(data_accessor,data);

the code can run.But the data print not synchronize device operation. The result display as follow: df6d55a13cf101a5cae4392c9c5f35d

Then I add the code "cgh.copy(data.accessor,data)" to copy result from device to host.It get error ?

image

how can I sovle the problem ?

terminate called after throwing an instance of 'sycl::_V1::runtime_error' what(): Attempt to set multiple actions for the command group. Command group must consist of a single kernel or explicit memory operation. -59 (PI_ERROR_INVALID_OPERATION)

To Reproduce

my-example-code ``` #include #include #include using namespace sycl; // (optional) avoids need for "sycl::" before SYCL names constexpr int count = 10; class AddFive; // Forward declare the name of our kernel. int main() { size_t N = 1024; queue myQueue; auto myContext = myQueue.get_context(); auto myDev = myQueue.get_device(); // Get an executable kernel bundle containing our kernel. kernel_id kernelId = get_kernel_id(); std::string myKernelName = kernelId.get_name(); std::cout<<"my kernel name : "<(myContext, { kernelId }); // Get the kernel's maximum work group size when running on our device. kernel myKernel = myBundle.get_kernel(kernelId); size_t maxWgSize = myKernel.get_info(myDev); std::cout<<"maxWgSize group size : "< divisors = { 1024, 512, 256, 128, 64, 32, 16, 8, 4, 2, 1 }; size_t wgSize = *std::find_if(divisors.begin(), divisors.end(), [=](auto d) { return (d <= maxWgSize); }); std::cout<<"work group size : "< vec_data(count,0); sycl::buffer data_buffer(data,sycl::range<1>(count)); // sycl::buffer> vec_data_buffer(vec_data.data(), sycl::range<1>(count)); nd_range<1> myRange { range<1> { N }, range<1> { wgSize } }; myQueue.submit([&](handler& cgh) { // Use the kernel bundle we queried, so we are sure the queried work-group // size matches the kernel we run. // accessor acc {data_buffer,cgh,range<1>(vec_data.size()/2),nd_item<1>(0),sycl::access::mode::read_write}; auto data_accessor = data_buffer.get_access(cgh); // cgh.copy(vec_data.data(),data_accessor); cgh.use_kernel_bundle(myBundle); cgh.parallel_for(myRange, ([=](nd_item<1> index) { // kernel code auto in = index.get_global_id(0); data_accessor[in] += 5; })); //copy elements of the vector into buffer associated with the accesstor cgh.copy(data_accessor,data); }); myQueue.wait(); for (auto e : data) std::cout << e << " "; std::cout << std::endl; auto arr = data_buffer.get_access(); for (int i = 0;i < count;i++){ std::cout<
  1. Include code snippet as short as possible
  2. Specify the command which should be used to compile the program
  3. Specify the comment which should be used to launch the program
  4. Indicate what is wrong and what was expected

Environment (please complete the following information):

  • OS: Linux
  • Target device and vendor: Nvidia GPU
  • DPC++ version: [2022-09]
  • Dependencies version: cuda 11.2

Additional context

dm-vodopyanov commented 1 year ago

Hello @wangzy0327, it is undefined behavior to access the original data (arr in the example) in the same scope in which sycl::buffer which owns arr (data_buffer in the example) is initialized. During sycl::buffer object destruction, implicit copy of the result from device to host is happening, so no need to call cgh.copy(...) explicitly. Regarding the second issue, SYCL prohibits to put multiple kernels and/or memory operations to the single submit aka command group. Only one kernel / memory operation per single command group is allowed. These are the changes which should be made to get rid of the issue, do they work for you?

index d00ffa2..f29670b 100644
--- a/code.cpp
+++ b/code.cpp
@@ -39,6 +39,7 @@ int main() {
   // Data initialized to zeros.
   int data[count] {0};
 //   std::vector<int> vec_data(count,0);
+  {
   sycl::buffer<int, 1> data_buffer(data,sycl::range<1>(count));   

 //   sycl::buffer<int, 1, std::vector<int>> vec_data_buffer(vec_data.data(), sycl::range<1>(count));                                  
@@ -57,16 +58,10 @@ int main() {
                                  auto in = index.get_global_id(0);
                                          data_accessor[in] += 5;        
                                }));
-    //copy elements of the vector into buffer associated with the accesstor
-    cgh.copy(data_accessor,data);
   });

   myQueue.wait();

-  for (auto e : data)
-    std::cout << e << " ";
-  std::cout << std::endl;
-
   auto arr = data_buffer.get_access<sycl::access::mode::read>();
   for (int i = 0;i < count;i++){
     std::cout<<arr[i]<<" ";
@@ -75,7 +70,11 @@ int main() {
         exit(-1);
     }
   }
+  std::cout << std::endl;
+  }

+  for (auto e : data)
+    std::cout << e << " ";
   std::cout<<std::endl;
   std::cout << "Good Random distribution !" << std::endl;
   return 0; 
wangzy0327 commented 1 year ago

Hello @wangzy0327, it is undefined behavior to access the original data (arr in the example) in the same scope in which sycl::buffer which owns arr (data_buffer in the example) is initialized. During sycl::buffer object destruction, implicit copy of the result from device to host is happening, so no need to call cgh.copy(...) explicitly. Regarding the second issue, SYCL prohibits to put multiple kernels and/or memory operations to the single submit aka command group. Only one kernel / memory operation per single command group is allowed. These are the changes which should be made to get rid of the issue, do they work for you?

index d00ffa2..f29670b 100644
--- a/code.cpp
+++ b/code.cpp
@@ -39,6 +39,7 @@ int main() {
   // Data initialized to zeros.
   int data[count] {0};
 //   std::vector<int> vec_data(count,0);
+  {
   sycl::buffer<int, 1> data_buffer(data,sycl::range<1>(count));   

 //   sycl::buffer<int, 1, std::vector<int>> vec_data_buffer(vec_data.data(), sycl::range<1>(count));                                  
@@ -57,16 +58,10 @@ int main() {
                                  auto in = index.get_global_id(0);
                                          data_accessor[in] += 5;        
                                }));
-    //copy elements of the vector into buffer associated with the accesstor
-    cgh.copy(data_accessor,data);
   });

   myQueue.wait();

-  for (auto e : data)
-    std::cout << e << " ";
-  std::cout << std::endl;
-
   auto arr = data_buffer.get_access<sycl::access::mode::read>();
   for (int i = 0;i < count;i++){
     std::cout<<arr[i]<<" ";
@@ -75,7 +70,11 @@ int main() {
         exit(-1);
     }
   }
+  std::cout << std::endl;
+  }

+  for (auto e : data)
+    std::cout << e << " ";
   std::cout<<std::endl;
   std::cout << "Good Random distribution !" << std::endl;
   return 0; 

OK,Thank you for reply. The code works correctly. There are two question about sycl memory.

  1. If I want to copy memory data from sycl Device to anothor sycl Device, it seems no relative API to directly complete the op.
  2. In my development project with sycl , I met the piUSMFree Error by
    free(data,queue)

    f024e65c7118053401592968927013f

Do you have any advice about cause this error ?

steffenlarsen commented 3 months ago

Hi @wangzy0327 !

If I want to copy memory data from sycl Device to anothor sycl Device, it seems no relative API to directly complete the op.

It depends. If you are using a buffer, you need to create an accessor inside your command-group function. Since that command-group function is submitted to a queue (which has a specific device) the memory will be transferred to the proper device accordingly.

If you are using USM, you allocate the memory on a specific queue which is associated with a device or a context+device combination (e.g. int *myptr = sycl::malloc_shared<int>(1024, Q);), so you can use either queue::copy() or if host-accessible you can use std::memcpy() or the copy implementation of your choice to move the data.

In my development project with sycl , I met the piUSMFree Error by [...]

I do not have access to the source code, but I suspect you're trying to free a non-USM memory allocation using the sycl::free function. The SYCL 2020 specification requires that

The memory pointed to by ptr must have been allocated using one of the USM allocation routines.

so using it to free anything that wasn't allocated with one of the USM allocation functions is unsupported.

github-actions[bot] commented 1 month ago

Hi! There have been no updates for at least the last 60 days, though the issue has assignee(s).

@dm-vodopyanov, could you please take one of the following actions:

Thanks!