intel / llvm

Intel staging area for llvm.org contribution. Home for Intel LLVM-based projects.
Other
1.2k stars 707 forks source link

[CUDA] 'PI_ERROR_INVALID_KERNEL_NAME' when dynamic library with kernel is loaded by dlopen secondly #11089

Open DongBaiYue opened 10 months ago

DongBaiYue commented 10 months ago

Describe the bug 'PI_ERROR_INVALID_KERNEL_NAME' error when dynamic library with kernel is loaded by dlopen secondly on linux with CUDA backend.

To Reproduce Error message:

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAterminate called after throwing an instance of 'sycl::_V1::exception'
  what():  Native API failed. Native API returns: -46 (PI_ERROR_INVALID_KERNEL_NAME)
Aborted (core dumped)

The small reproducer:

// main.cpp
#include <sycl/sycl.hpp>
#include <dlfcn.h>

void dlexe(std::string shared_lib_path, std::string func_name){
    void * so_handler = dlopen(shared_lib_path.c_str(), RTLD_LAZY);
    void (*kernel_func)() = (void (*)())dlsym(so_handler, func_name.c_str());
    kernel_func();
    dlclose(so_handler);
}

int main(){
    dlexe("sycl_libA.so", "funcA");
    dlexe("sycl_libB.so", "funcB");
    return 0;
}
// sycl_libA.cpp
#include <CL/sycl.hpp>
using namespace sycl;

#ifdef __cplusplus
extern "C"
#endif
void funcA() {
    queue q;
    q.submit([&](sycl::handler &h) {
        sycl::stream os(1024, 768, h);
        h.parallel_for(32, [=](sycl::id<1> i) {
            os<<"A";
        });
    }).wait();
}
// sycl_libB.cpp
#include <CL/sycl.hpp>
using namespace sycl;

#ifdef __cplusplus
extern "C"
#endif
void funcB() {
    queue q;
    q.submit([&](sycl::handler &h) {
        sycl::stream os(1024, 768, h);
        h.parallel_for(32, [=](sycl::id<1> i) {
            os<<"B";
        });
    }).wait();
}
// command
clang++ -std=c++17 -O3 -fsycl -fsycl-targets=nvptx64-nvidia-cuda -fPIC -shared sycl_libA.cpp -o sycl_libA.so
clang++ -std=c++17 -O3 -fsycl -fsycl-targets=nvptx64-nvidia-cuda -fPIC -shared sycl_libB.cpp -o sycl_libB.so
clang++ -std=c++17 -O3 -fsycl -fsycl-targets=nvptx64-nvidia-cuda main.cpp -ldl

Environment (please complete the following information):

Additional context if DPC++ version is 2022-12,Error message is:

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
PI CUDA ERROR:
        Value:           500
        Name:            CUDA_ERROR_NOT_FOUND
        Description:     named symbol not found
        Function:        cuda_piKernelCreate
        Source Location: /home/ly/sycl_workspace/llvm-2022-12/sycl/plugins/cuda/pi_cuda.cpp:2872

terminate called after throwing an instance of 'sycl::_V1::exception'
  what():  Native API failed. Native API returns: -999 (Unknown PI error)
Aborted (core dumped)
DongBaiYue commented 9 months ago

Can you reproduce this bug?

@npmiller @mdtoguchi @AerialMantis

mdtoguchi commented 9 months ago

This command clang++ -std=c++17 -O3 -fsycl -fsycl-targets=nvptx64-nvidia-cuda main.cpp -ldl does not have any references to the generated shared objects, is that purposeful? The executable built from main.cpp needs to have some kind of reference to the shared objects to or it won't know what to look for to resolve.

DongBaiYue commented 9 months ago

This command clang++ -std=c++17 -O3 -fsycl -fsycl-targets=nvptx64-nvidia-cuda main.cpp -ldl does not have any references to the generated shared objects, is that purposeful? The executable built from main.cpp needs to have some kind of reference to the shared objects to or it won't know what to look for to resolve.

dlfcn allows runtime dynamic library loading, no need to link at compile time。

DongBaiYue commented 9 months ago

For the above reproducer

//expected result
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
//actual result
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAterminate called after throwing an instance of 'sycl::_V1::exception'
  what():  Native API failed. Native API returns: -46 (PI_ERROR_INVALID_KERNEL_NAME)
Aborted (core dumped)

The error occurs when funcB is executed after funcA is successfully executed.