gpgpu-sim / gpgpu-sim_distribution

GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as Tensor Cores and CUDA Dynamic Parallelism, as well as a performance visualization tool, AerialVision, and an integrated energy model, GPUWattch.

symbol cudaFuncSetAttribute, version libcudart.so.9.1 not defined in file libcudart.so.9.1 with link time reference #166

Closed: mahmoodn closed this issue 4 years ago

mahmoodn commented 4 years ago

I have installed CUDA toolkit 9.1.85_387.26 and I am able to compile the application with the GPGPU-Sim driver.

$ git branch
* dev
$ source ./setup_environment debug

----------------------------------------------------------------------------
INFO - If you only care about PTX execution, ignore this message. GPGPU-Sim supports PTX execution in modern CUDA.
If you want to run PTXPLUS (sm_1x SASS) with a modern card configuration, the apps and simulator must be compiled with CUDA 4.2.
You can still run a PASCAL configuration when compiling with 4.2 by setting the $PTXAS_CUDA_INSTALL_PATH directory environment variable.
The following text describes why:
If you are using PTXPLUS, only sm_1x is supported and it requires that the app and simulator binaries are compiled in CUDA 4.2 or less.
The simulator requires it since CUDA headers desribe struct sizes in the exec which change from gen to gen.
The apps require 4.2 because new versions of CUDA tools have dropped parsing support for generating sm_1x
When running using modern config (i.e. pascal) and PTXPLUS with CUDA 4.2, the $PTXAS_CUDA_INSTALL_PATH env variable is required to get proper register usage
(and hence occupancy) using a version of CUDA that knows the register usage on the real card.

----------------------------------------------------------------------------
setup_environment succeeded
$ nvcc mmul_1.cu -o mmul_1 -g  -gencode arch=compute_70,code=compute_70 --cudart shared -lculibos -lcublas_static -lcurand_static -ldl -lpthread
$ ldd ./mmul_1
        linux-vdso.so.1 =>  (0x00007ffe8eb44000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f3e482ac000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f3e4808c000)
        libcudart.so.9.1 => /home/mahmood/gpgpu-sim_distribution/lib/gcc-4.8.5/cuda-9010/debug/libcudart.so.9.1 (0x00007f3e475b4000)
        libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f3e472ac000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f3e46fa4000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f3e46d8c000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f3e469bc000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f3e484b4000)
        libz.so.1 => /lib64/libz.so.1 (0x00007f3e467a4000)
        libGL.so.1 => /lib64/libGL.so.1 (0x00007f3e46514000)
        libGLX.so.0 => /lib64/libGLX.so.0 (0x00007f3e462dc000)
        libX11.so.6 => /lib64/libX11.so.6 (0x00007f3e45f9c000)
        libXext.so.6 => /lib64/libXext.so.6 (0x00007f3e45d84000)
        libGLdispatch.so.0 => /lib64/libGLdispatch.so.0 (0x00007f3e45acc000)
        libxcb.so.1 => /lib64/libxcb.so.1 (0x00007f3e458a4000)
        libXau.so.6 => /lib64/libXau.so.6 (0x00007f3e4569c000)

When I run the application, I see the output messages, but at the end I get this error:

$ ./mmul_1 64

        *** GPGPU-Sim Simulator Version 4.0.0  [build gpgpu-sim_git-commit-6a97d1e857c37ef4b58a9a0d5c18960967e9d665_modified_0] ***

GPGPU-Sim PTX: simulation mode 0 (can change with PTX_SIM_MODE_FUNC environment variable:
               1=functional simulation only, 0=detailed performance simulator)
GPGPU-Sim PTX: overriding embedded ptx with ptx file (PTX_SIM_USE_PTX_FILE is set)
GPGPU-Sim: Configuration options:

-save_embedded_ptx                      0 # saves ptx files embedded in binary as <n>.ptx
-keep                                   0 # keep intermediate files created by GPGPU-Sim when interfacing with external programs
-gpgpu_ptx_save_converted_ptxplus                    0 # Saved converted ptxplus to a file
-gpgpu_occupancy_sm_number                   70 # The SM number to pass to ptxas when getting register usage for computing GPU occupancy. This parameter is required in the config.
...
...
...
...
----------------------------END-of-Interconnect-DETAILS-------------------------

gpgpu_simulation_time = 0 days, 0 hrs, 6 min, 31 sec (391 sec)
gpgpu_simulation_rate = 3582 (inst/sec)
gpgpu_simulation_rate = 64 (cycle/sec)
A =
B =
./mmul_1: relocation error: ./mmul_1: symbol cudaFuncSetAttribute, version libcudart.so.9.1 not defined in file libcudart.so.9.1 with link time reference

That function isn't anywhere in my code:

$ grep cudaFuncSetAttribute mmul_1.cu
$

Grepping for the function name shows that it doesn't exist in the GPGPU-Sim library files, although it does exist in the real CUDA runtime:

$ grep -r cudaFuncSetAttribute ~/gpgpu-sim_distribution/lib/gcc-4.8.5/cuda-9010/debug/
$ grep -r cudaFuncSetAttribute ~/cuda-9.1/lib64
Binary file ./libcudart.so.9.1.85 matches
...

Has anyone tested CUDA 9.1 with GPGPU-Sim (dev branch)?

gangmul12 commented 4 years ago

Every CUDA library function is emulated in libcuda/cuda_runtime_api.cc, and it seems that cudaFuncSetAttribute is not implemented in that file. If you have time, implement cudaFuncSetAttribute there and send a pull request to the repo. Alternatively, you can copy and paste the bypass code below into the file and recompile:

#if CUDART_VERSION >= 9000
__host__ cudaError_t CUDARTAPI cudaFuncSetAttribute(const void *func, cudaFuncAttribute attr, int value)
{
  if (g_debug_execution >= 3) {
    announce_call(__my_func__);
  }
  return g_last_cudaError = cudaSuccess;
}
#endif

Good luck!

mahmoodn commented 4 years ago

Thanks for the hint. I would like to give it a try. I see the cudaFuncGetAttributes implementation here. However, looking at the NVIDIA documentation for this function (here), I can't map what NVIDIA describes to what GPGPU-Sim implements.

Maybe the developers had more detailed documentation about this function at the time. If you have more information, please share it.

gangmul12 commented 4 years ago

I followed your link (the NVIDIA document), and I found that (yes, this is just my opinion) the GPGPU-Sim implementation maps pretty well to the description in the document. Which part of the implementation do you think is wrong? It just takes a function symbol as input (although the NVIDIA document says the string input is deprecated) and returns the function attributes (numRegs, etc.) as output.

mahmoodn commented 4 years ago

I meant things like the CUctx and gpgpusim_ptx_sim_info data structures. Looking at the code, such structures seem to be the standard way of accessing the kernel, and I am working on using those common idioms for a proper implementation. So I think it is possible for me too. I will come back later.

mahmoodn commented 4 years ago

I just put in some dummy code

cudaError_t CUDARTAPI cudaFuncSetAttribute(const char *hostFun, cudaFuncAttribute attr, int value)
{
  if (g_debug_execution >= 3) {
    announce_call(__my_func__);
  }

  CUctx_st *context = GPGPUSim_Context();
  function_info *entry = context->get_kernel(hostFun);  // looked up but unused for now
  printf("cudaFuncSetAttribute(): attr = %d\n", attr);
  return g_last_cudaError = cudaSuccess;
}

to see if the function is called. When I run the program, it gives me another unimplemented function name, which means it should have reached cudaFuncSetAttribute. However, I don't see that printf in the output log. Is something missing here?

RSpliet commented 4 years ago

it gives me another unimplemented function name which means it should have called cudaFuncSetAttribute.

I'm afraid that assumption is not necessarily true. The relocation errors you report are produced by the run-time linker, which I suspect is invoked as part of (or just after) forking a thread to run the CUDA program in. It is normal behaviour for the linker to ensure that all declared functions have a definition after loading all libraries, and (sadly) it bails directly upon encountering a declared-but-not-defined function rather than gathering a list of all problems.

mahmoodn commented 4 years ago

@RSpliet You are right. As I said, even a simple printf is not shown when I define something named cudaFuncSetAttribute. However, the problem is still with the simulator, isn't it? I tried to debug further but I cannot find where such things are handled by GPGPU-Sim. Do you know a good starting point?

gangmul12 commented 4 years ago

@mahmoodn I think the real problem is with the simulator, but the program displaying the error message is the linker. The linker tries to find the definitions before the simulation starts, so I think the starting point is simply adding an empty definition in GPGPU-Sim for every declaration that triggers an error.

mahmoodn commented 4 years ago

So that should be done outside of cuda_runtime_api.cc? I mean, the bypass you suggested in https://github.com/gpgpu-sim/gpgpu-sim_distribution/issues/166#issuecomment-592389961 isn't the trick. Am I right?

gangmul12 commented 4 years ago

@mahmoodn Yes... maybe? The reason I called it a trick is that the function body is empty.

mahmoodn commented 4 years ago

I put some dummy definitions in the code to bypass that linker error. I even used () for the functions! I now get

GPGPU-Sim PTX: Setting up arguments for 4 bytes starting at 0x7fff920adf18..

GPGPU-Sim PTX: cudaLaunch for 0x0x50e270 (mode=performance simulation) on stream 0
GPGPU-Sim PTX: ERROR launching kernel -- no PTX implementation found for 0x50e270
Aborted (core dumped)

Searching the log shows

GPGPU-Sim PTX: __cudaRegisterFunction volta_sgemm_64x64_nn : hostFun 0x0x50e270, fat_cubin_handle = 33
Warning: cannot find deviceFun volta_sgemm_64x64_nn

That means the PTX code for volta_sgemm_64x64_nn is not found. Am I right? That kernel only has SASS code, so it looks like GPGPU-Sim is not reading the libcublas functions. I compiled the code with

nvcc mmul.cu -o mmul -g  -gencode arch=compute_70,code=compute_70 \
       --cudart shared -lculibos -lcublas_static -lcurand_static -ldl -lpthread

Is my understanding correct?

gangmul12 commented 4 years ago

@mahmoodn Yes, you are right. SASS is currently not supported by GPGPU-Sim!

mahmoodn commented 4 years ago

So how can the dev branch be said to run with cuBLAS or cuDNN? I haven't worked with GPGPU-Sim for some years, so my information is pretty outdated.

mattsinc commented 4 years ago

After CUDA 8, NVIDIA stopped embedding PTX in their libraries (e.g., cuDNN, cuBLAS). So using CUDA 9.1 definitely will not work with those libraries, because GPGPU-Sim needs that PTX to know what to simulate (you can use CUDA 9.1 with GPGPU-Sim as long as you aren't using the CUDA libraries). That is (probably) why you are seeing the "ERROR launching kernel -- no PTX implementation found for 0x50e270" error.

So, even after adding the dummy code, you'll need to downgrade to CUDA 8 if you want to use CUDA libraries like cuDNN. An alternative would be to change your code to use CUTLASS instead, which is open source. The performance of CUTLASS isn't quite as good as cuDNN's, but it is probably good enough if using CUDA 9+ is absolutely necessary.

I realize neither of these is the answer you were hoping for, but unless you can convince NVIDIA to embed the PTX again, or until we find a workaround or GPGPU-Sim supports running SASS, those are the options.

Matt

mahmoodn commented 4 years ago

So, are you saying that this paper uses CUDA 8?

P.S.: culibos is a static library:

$ ls -l ~/cuda-9.1/lib64/libculibos.a
-rw-rw-r-- 1 mahmood mahmood 1649302 Feb 27 10:39 /home/mahmood/cuda-9.1/lib64/libculibos.a

mattsinc commented 4 years ago

It was either CUDA 7 or CUDA 8 for that paper; I don't remember which one off the top of my head, but it was definitely earlier than CUDA 9 for the aforementioned reasons.

Matt

mattsinc commented 4 years ago

@mahmoodn you should push your change for cudaFuncSetAttribute too.

Matt