intel / llvm

Intel staging area for llvm.org contribution. Home for Intel LLVM-based projects.

[SYCL][CUDA] Copy between device and managed/shared memory fails on WSL #9632

Open krasznaa opened 1 year ago

krasznaa commented 1 year ago

Describe the bug

This is a super obscure error that I bumped into just now. If we can even call it an error...

One of the unit tests of our project tries to copy data between a memory area in managed/shared memory and another one in device memory. There are a fair number of layers between our code and the underlying SYCL code doing that, but that's what's happening here:

https://github.com/acts-project/vecmem/blob/main/tests/sycl/test_sycl_jagged_containers.sycl#L428

This code worked well on all platforms that I have tried until today. But today I tried to make it work on a pretty obscure platform. I'm using a hand-built version of the 2022-12 tag of this repository in WSL, with CUDA 11.7.1 installed in WSL as well, and the latest NVIDIA driver installed on Windows itself. In this definitely non-standard setup that test crashes with the following:

...
[ RUN      ] sycl_jagged_containers_test.set_in_contiguous_kernel
[       OK ] sycl_jagged_containers_test.set_in_contiguous_kernel (5 ms)
[ RUN      ] sycl_jagged_containers_test.filter

Thread 1 "vecmem_test_syc" received signal CUDA_EXCEPTION_15, Invalid Managed Memory Access.
vecmem::copy::copy_views_impl<int, int> (this=0x7fffffffd730, sizes=..., from_view=0x204e01000, cptype=vecmem::copy::type::unknown, to_view=<optimized out>)
    at /mnt/c/Users/krasz/ATLAS/vecmem/vecmem/core/include/vecmem/utils/impl/copy.ipp:415
415             do_copy(sizes[i] * sizeof(TYPE1), from_view[i].ptr(), to_view[i].ptr(),
(cuda-gdb) bt
#0  vecmem::copy::copy_views_impl<int, int> (this=0x7fffffffd730, sizes=..., from_view=0x204e01000, cptype=vecmem::copy::type::unknown, to_view=<optimized out>)
    at /mnt/c/Users/krasz/ATLAS/vecmem/vecmem/core/include/vecmem/utils/impl/copy.ipp:415
#1  vecmem::copy::operator()<int, int> (this=this@entry=0x7fffffffd730, from_view=..., to_view=..., cptype=cptype@entry=vecmem::copy::type::unknown)
    at /mnt/c/Users/krasz/ATLAS/vecmem/vecmem/core/include/vecmem/utils/impl/copy.ipp:296
#2  0x0000000000427d33 in vecmem::copy::operator()<int, int, std::pmr::polymorphic_allocator<std::vector<int, std::pmr::polymorphic_allocator<int> > >, std::pmr::polymorphic_allocator<int> > (this=this@entry=0x7fffffffd730, from_view=..., to_vec=..., cptype=cptype@entry=vecmem::copy::type::unknown)
    at /mnt/c/Users/krasz/ATLAS/vecmem/vecmem/core/include/vecmem/utils/impl/copy.ipp:326
#3  0x00000000004234a1 in sycl_jagged_containers_test_filter_Test::TestBody (this=this@entry=0xc8a970)
    at /mnt/c/Users/krasz/ATLAS/vecmem/vecmem/tests/sycl/test_sycl_jagged_containers.sycl:428
#4  0x00007ffff7f86d29 in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void> (method=<optimized out>, location=0x7ffff7f932f5 "the test body",
    object=<optimized out>) at /home/krasznaa/ATLAS/vecmem/build-llvm/_deps/googletest-src/googletest/src/gtest.cc:2607
#5  testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void> (object=object@entry=0xc8a970, method=<optimized out>, location=0x7ffff7f932f5 "the test body")
    at /home/krasznaa/ATLAS/vecmem/build-llvm/_deps/googletest-src/googletest/src/gtest.cc:2643
#6  0x00007ffff7f62380 in testing::Test::Run (this=this@entry=0xc8a970) at /home/krasznaa/ATLAS/vecmem/build-llvm/_deps/googletest-src/googletest/src/gtest.cc:2682
...

There was no deep thinking behind setting up the test like this; it was just convenient for technical reasons. And as soon as I stop using shared memory there and switch to using host memory, the error disappears. But since the error only shows up on WSL, I thought it would be interesting to share this find. :wink:
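In SYCL USM terms, my reading of what the vecmem layers end up doing boils down to something like the sketch below (not a confirmed minimal reproducer, and it needs a SYCL compiler with the CUDA backend to build):

```cpp
#include <sycl/sycl.hpp>

int main() {
    sycl::queue q;
    const std::size_t n = 16;

    // The failing configuration: source in shared (managed) USM...
    int* shared = sycl::malloc_shared<int>(n, q);
    // ...destination in device USM.
    int* device = sycl::malloc_device<int>(n, q);

    for (std::size_t i = 0; i < n; ++i) shared[i] = static_cast<int>(i);

    // On WSL this copy is what seems to trigger CUDA_EXCEPTION_15;
    // switching the source allocation to sycl::malloc_host makes it work.
    q.memcpy(device, shared, n * sizeof(int)).wait();

    sycl::free(device, q);
    sycl::free(shared, q);
    return 0;
}
```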

To Reproduce

Reproducing this is a bit difficult. 😦 I described my OS / software setup above. In that environment one can simply build https://github.com/acts-project/vecmem/tree/v0.25.0 with its tests, and the error shows up. Unfortunately, neither setting up this build environment nor building the project in it is trivial. So I'd only produce a writeup about it on request...

Environment (please complete the following information)

clang version 16.0.0 (https://github.com/intel/llvm.git 6977f1aced3ed6a08573fdbdd4f35a5d719c8d98)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/krasznaa/software/intel/llvm-2022-12/bin
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/9
Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/9
Candidate multilib: .;@m64
Selected multilib: .;@m64
Found CUDA installation: /home/krasznaa/software/cuda/11.7.1, version 11.7

Pinging @ivorobts.

krasznaa commented 1 year ago

It's worth adding (I only realised afterwards) that we do exactly the same test in CUDA as well.

https://github.com/acts-project/vecmem/blob/v0.25.0/tests/cuda/test_cuda_jagged_vector_view.cpp#L162-L198

Using the CUDA API to perform a copy between a device memory area and a managed one does not produce a runtime error in the same WSL environment. 🤔
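For reference, the CUDA side of that test reduces to something like the following sketch (illustrative, not the actual vecmem test code): copies between a `cudaMallocManaged` allocation and a `cudaMalloc` allocation, which succeed on WSL.

```cuda
#include <cuda_runtime.h>
#include <cassert>

int main() {
    const std::size_t n = 16;
    int *managed = nullptr, *device = nullptr;

    // Managed/shared allocation, accessible from both host and device.
    cudaMallocManaged(&managed, n * sizeof(int));
    // Plain device allocation.
    cudaMalloc(&device, n * sizeof(int));

    for (std::size_t i = 0; i < n; ++i) managed[i] = static_cast<int>(i);

    // cudaMemcpyDefault lets the runtime infer the direction from the
    // pointer attributes. These are the copies that work through the CUDA
    // API on WSL, while the analogous SYCL copy fails.
    cudaError_t err =
        cudaMemcpy(device, managed, n * sizeof(int), cudaMemcpyDefault);
    assert(err == cudaSuccess);
    err = cudaMemcpy(managed, device, n * sizeof(int), cudaMemcpyDefault);
    assert(err == cudaSuccess);

    cudaFree(device);
    cudaFree(managed);
    return 0;
}
```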

[bash][Celeborn]:vecmem > ~/ATLAS/vecmem/build-llvm/bin/vecmem_test_cuda
[==========] Running 23 tests from 4 test suites.
[----------] Global test environment set-up.
[----------] 9 tests from cuda_containers_test
[ RUN      ] cuda_containers_test.managed_memory
[       OK ] cuda_containers_test.managed_memory (658 ms)
...
[----------] 5 tests from cuda_jagged_vector_view_test
[ RUN      ] cuda_jagged_vector_view_test.mutate_in_kernel
[       OK ] cuda_jagged_vector_view_test.mutate_in_kernel (2 ms)
[ RUN      ] cuda_jagged_vector_view_test.set_in_kernel
[       OK ] cuda_jagged_vector_view_test.set_in_kernel (4 ms)
[ RUN      ] cuda_jagged_vector_view_test.set_in_contiguous_kernel
[       OK ] cuda_jagged_vector_view_test.set_in_contiguous_kernel (6 ms)
[ RUN      ] cuda_jagged_vector_view_test.filter
[       OK ] cuda_jagged_vector_view_test.filter (4 ms)
[ RUN      ] cuda_jagged_vector_view_test.zero_capacity
[       OK ] cuda_jagged_vector_view_test.zero_capacity (5 ms)
[----------] 5 tests from cuda_jagged_vector_view_test (23 ms total)
...
[----------] Global test environment tear-down
[==========] 23 tests from 4 test suites ran. (1146 ms total)
[  PASSED  ] 23 tests.

So there is definitely some SYCL / LLVM specificity here; it's not simply that CUDA would not allow this operation. 🤔