oneapi-src / oneDPL

oneAPI DPC++ Library (oneDPL) https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/dpc-library.html
Apache License 2.0
714 stars 110 forks source link

including <oneapi/dpl/*> headers cause CUDA_ERROR_NOT_INITIALIZED after fork #1631

Open jonasdelacour opened 2 weeks ago

jonasdelacour commented 2 weeks ago

Including any <oneapi/dpl> header seems to construct a sycl::queue, which must not be done prior to fork() calls. If you do you get CUDA_ERROR_NOT_INITIALIZED when the child process attempts to destroy this queue and underlying CUDA context.

Here's a minimum example to reproduce this error:

#include <oneapi/dpl/algorithm>
#include <unistd.h>

int main(){
    pid_t pid = fork();
    return 0;
}

Compiled with icpx -fsycl

produces the following stack trace from valgrind:

==11688== Process terminating with default action of signal 6 (SIGABRT)
==11688==    at 0x4FD79FC: __pthread_kill_implementation (pthread_kill.c:44)
==11688==    by 0x4FD79FC: __pthread_kill_internal (pthread_kill.c:78)
==11688==    by 0x4FD79FC: pthread_kill@@GLIBC_2.34 (pthread_kill.c:89)
==11688==    by 0x4F83475: raise (raise.c:26)
==11688==    by 0x4F697F2: abort (abort.c:79)
==11688==    by 0x4914B9D: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30)
==11688==    by 0x492020B: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30)
==11688==    by 0x4920276: std::terminate() (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30)
==11688==    by 0x4CB25BA: __clang_call_terminate (in /opt/intel/oneapi/compiler/2024.1/lib/libsycl.so.7.1.0)
==11688==    by 0x4CC0E0F: sycl::_V1::detail::queue_impl::~queue_impl() (in /opt/intel/oneapi/compiler/2024.1/lib/libsycl.so.7.1.0)
==11688==    by 0x4025EA: oneapi::dpl::execution::__dpl::device_policy<oneapi::dpl::execution::__dpl::DefaultKernelName>::~device_policy() (in /home/jonaslacour/playground/a.out)
==11688==    by 0x4F86494: __run_exit_handlers (exit.c:113)
==11688==    by 0x4F8660F: exit (exit.c:143)
==11688==    by 0x4F6AD96: (below main) (libc_start_call_main.h:74)

icpx version: Intel(R) oneAPI DPC++/C++ Compiler 2024.1.2 (2024.1.2.20240508) codeplay plugin for Nvidia gpus version: oneapi-for-nvidia-gpus-2024.1.2-cuda-12.0 nvidia-smi output:

NVIDIA-SMI 545.29.06              Driver Version: 545.29.06    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4090        Off | 00000000:17:00.0 Off |                  Off |
|  0%   41C    P8              18W / 450W |    162MiB / 24564MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce RTX 4090        Off | 00000000:4E:00.0 Off |                  Off |
|  0%   33C    P8              10W / 450W |     16MiB / 24564MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

OS: Ubuntu 22.04.4 LTS

akukanov commented 1 week ago

Yes, oneDPL has a predefined execution policy (dpcpp_default) that creates a SYCL queue at construction.

As a workaround, if that predefined policy is not used in the program, you can define the ONEDPL_USE_PREDEFINED_POLICIES macro to zero before including any oneDPL header. Some details here: https://oneapi-src.github.io/oneDPL/macros.html#additional-macros