mahmoodn closed this issue 4 years ago
Every CUDA runtime library function is emulated in libcuda/cuda_runtime_api.cc. It seems that cudaFuncSetAttribute is not implemented in that file. If you have time, implement cudaFuncSetAttribute there and send a pull request to the repo. Alternatively, you can copy and paste the bypass code below into the file and then recompile:
#if CUDART_VERSION >= 9000
__host__ cudaError_t CUDARTAPI cudaFuncSetAttribute(const void* func, cudaFuncAttribute attr, int value)
{
    if (g_debug_execution >= 3) {
        announce_call(__my_func__);
    }
    // Bypass only: the requested attribute is ignored and success is reported.
    return g_last_cudaError = cudaSuccess;
}
#endif
Good luck!
Thanks for the hint. I would like to give it a try. I see the cudaFuncGetAttributes implementation here. However, when I look at the NVIDIA documentation for this function (here), I can't map what NVIDIA describes to what gpgpusim implements.
Maybe the developers had more documentation about the details of this function at that time. If you have more information, please share it.
I followed your link to the NVIDIA documentation, and (this is just my opinion) the gpgpu-sim implementation maps pretty well to the description there. Which part of the implementation do you think is wrong? It just takes a function symbol as input (although the NVIDIA documentation says that string input is deprecated) and returns the function attributes (numRegs, etc.) as output.
I meant things like the CUctx and gpgpusim_ptx_sim_info data structures. From what I see in the code, these structures seem to be the standard way to access a kernel, and I am working out how to use the same common statements for a proper implementation. So I think I should be able to do it as well. I will come back later.
I just put in this dummy code
cudaError_t CUDARTAPI cudaFuncSetAttribute(const char *hostFun, cudaFuncAttributes attr, int value )
{
    if(g_debug_execution >= 3){
        announce_call(__my_func__);
    }
    CUctx_st *context = GPGPUSim_Context();
    function_info *entry = context->get_kernel(hostFun);
    printf( "cudaFuncSetAttribute():attr = %d\n", attr );
    return g_last_cudaError = cudaSuccess;
}
to see if the function is called. When I run the program, it gives me another unimplemented function name, which means it should have called cudaFuncSetAttribute. However, I don't see that printf in the output log.
Is something missing here?
"it gives me another unimplemented function name which means it should have called cudaFuncSetAttribute."
I'm afraid that assumption is not necessarily true. The relocation error messages you report are produced by the run-time linker, which I suspect is invoked as part of (or just after) forking a thread for running the CUDA program in. It is normal behaviour for the linker to ensure that all declared functions have a definition after loading all libraries, and (sadly) to bail out as soon as it encounters a declared-but-not-defined function rather than gathering a list of all problems.
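To illustrate why only one missing function is reported per run, here is a toy model of that "stop at the first undefined symbol" behaviour. It is a sketch only, not the real run-time linker: `first_undefined` and the symbol lists passed to it are hypothetical names made up for this example.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy model of eager symbol resolution (NOT the real run-time linker).
// Walks the symbols a program needs, in order, and returns the first one
// that no loaded library defines. Returns an empty string if all resolve.
std::string first_undefined(const std::vector<std::string>& needed,
                            const std::set<std::string>& defined) {
  for (const std::string& name : needed) {
    if (defined.count(name) == 0) {
      return name;  // bail out here; later symbols are never examined
    }
  }
  return std::string();
}
```

In this model, adding a definition for the reported symbol simply moves the failure to the next undefined one, which matches seeing a different unimplemented function name after each fix, without any of the newly defined functions having been called.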
@RSpliet
You are right. As I said, even a simple printf is not shown, even though I define something named cudaFuncSetAttribute.
However, the problem is with the simulator, isn't it? I tried to debug further, but I cannot find information about where such things are handled by gpgpusim. Do you know of any starting point?
@mahmoodn I think the real problem is with the simulator, but the program that displays the error message is the linker. The linker tries to find the definitions before the simulation starts, so I think the starting point is simply to add an empty definition in gpgpu-sim for every declaration that triggers an error.
So that should be done outside of cuda_runtime_api.cc. I mean, bypassing it as you suggested in https://github.com/gpgpu-sim/gpgpu-sim_distribution/issues/166#issuecomment-592389961 isn't the trick. Am I right?
@mahmoodn Yes... maybe? The reason I called it a trick is that the function body is empty.
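For contrast with an empty body, a less trivial stub could at least remember what the application asked for. The following is a minimal sketch under stated assumptions: `ToyFuncAttribute`, `toy_set_func_attribute`, and `toy_get_func_attribute` are hypothetical stand-ins invented here, not gpgpu-sim or CUDA names, and nothing in it feeds back into the simulation.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-in for the attribute selector; the real type is the
// cudaFuncAttribute enum from the CUDA headers.
enum ToyFuncAttribute {
  kToyMaxDynamicSharedMemorySize,
  kToyPreferredSharedMemoryCarveout
};

// Per-kernel attribute table: host function pointer -> (attribute -> value).
static std::map<const void*, std::map<int, int> > g_toy_func_attrs;

// Records the requested value. Returns false for a null kernel pointer or a
// negative value, which this sketch treats as invalid input.
bool toy_set_func_attribute(const void* func, ToyFuncAttribute attr, int value) {
  if (func == nullptr || value < 0) return false;
  g_toy_func_attrs[func][static_cast<int>(attr)] = value;
  return true;
}

// Returns the recorded value, or `fallback` if the attribute was never set.
int toy_get_func_attribute(const void* func, ToyFuncAttribute attr, int fallback) {
  auto kernel_it = g_toy_func_attrs.find(func);
  if (kernel_it == g_toy_func_attrs.end()) return fallback;
  auto attr_it = kernel_it->second.find(static_cast<int>(attr));
  return attr_it == kernel_it->second.end() ? fallback : attr_it->second;
}
```

Whether the simulator should act on such recorded values is a separate design question; this only shows the bookkeeping.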
I put some dummies in the code to bypass that linker error. I even used () for the functions! I now get
GPGPU-Sim PTX: Setting up arguments for 4 bytes starting at 0x7fff920adf18..
GPGPU-Sim PTX: cudaLaunch for 0x0x50e270 (mode=performance simulation) on stream 0
GPGPU-Sim PTX: ERROR launching kernel -- no PTX implementation found for 0x50e270
Aborted (core dumped)
Searching the log shows
GPGPU-Sim PTX: __cudaRegisterFunction volta_sgemm_64x64_nn : hostFun 0x0x50e270, fat_cubin_handle = 33
Warning: cannot find deviceFun volta_sgemm_64x64_nn
That means the PTX code for volta_sgemm_64x64_nn was not found. Am I right? That kernel only has SASS code, so it looks like gpgpusim is not able to read the libcublas functions. I compiled the code with
nvcc mmul.cu -o mmul -g -gencode arch=compute_70,code=compute_70 \
--cudart shared -lculibos -lcublas_static -lcurand_static -ldl -lpthread
Is my understanding correct?
@mahmoodn Yes, you are right. Running SASS is currently not supported by gpgpu-sim!
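The chain from the "cannot find deviceFun" warning to the later launch error can be pictured with a small toy model. This is a sketch under assumptions, not gpgpu-sim's actual data structures: `ToyKernelRegistry` and its members are hypothetical names made up for illustration.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Toy model of kernel registration and launch lookup.
struct ToyKernelRegistry {
  std::set<std::string> ptx_kernels;                  // kernel names found in embedded PTX
  std::map<const void*, std::string> host_to_device;  // host stub -> device kernel name

  // Registration step: the mapping is stored only if PTX for the kernel
  // exists. A false return corresponds to the "cannot find deviceFun" warning.
  bool register_function(const void* host_fun, const std::string& device_fun) {
    if (ptx_kernels.count(device_fun) == 0) return false;
    host_to_device[host_fun] = device_fun;
    return true;
  }

  // Launch step: a host stub that was never successfully registered has
  // nothing to simulate, which corresponds to "no PTX implementation found".
  bool can_launch(const void* host_fun) const {
    return host_to_device.count(host_fun) != 0;
  }
};
```

In this picture, a kernel compiled from mmul.cu with code=compute_70 carries PTX and registers fine, while a library kernel shipped only as SASS fails at registration and therefore at launch.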
So how is the dev branch said to be runnable with cuBLAS or cuDNN? I haven't worked with gpgpusim for some years, so my information is pretty outdated.
After CUDA 8 NVIDIA stopped embedding PTX into their libraries (e.g., cuDNN, cuBLAS). So using CUDA 9.1 definitely will not work, because GPGPU-Sim needs that PTX to know what to simulate (you can use CUDA 9.1 with GPGPU-Sim as long as you aren't using CUDA libraries). That's (probably) why you are seeing the "ERROR launching kernel -- no PTX implementation found for 0x50e270" error.
So, even after adding the dummy code, you'll need to downgrade to CUDA 8 if you want to use CUDA libraries like cuDNN. An alternative would be to change your code to use CUTLASS instead, which is open source. The performance of CUTLASS isn't quite as good as cuDNN, but is probably good enough if using CUDA 9+ is absolutely necessary.
I realize neither of these are the answers you are hoping to see, but unless you can convince NVIDIA to embed the PTX again, until we find a workaround or GPGPU-Sim supports running SASS, those are the options.
Matt
So, you are saying that this paper uses cuda 8?
P.S: culibos is a static library
$ ls -l ~/cuda-9.1/lib64/libculibos.a
-rw-rw-r-- 1 mahmood mahmood 1649302 Feb 27 10:39 /home/mahmood/cuda-9.1/lib64/libculibos.a
It was either CUDA 7. or CUDA 8. for that paper, I don't remember which one off the top of my head -- but it was definitely earlier than CUDA 9 for the aforementioned reasons.
Matt
@mahmoodn you should push your change for cudaFuncSetAttribute too.
Matt
I have installed CUDA toolkit 9.1.85_387.26 and I am able to compile the application with the gpgpusim driver.
When I run the application, I see the output messages, but at the end I get this error
That function isn't in my code.
When I grep for the function name, it doesn't exist in the gpgpusim library files.
Has anyone tested cuda-9.1 with gpgpusim (dev branch)?