About static link to cudnn and cublas

gangmul12 commented 4 years ago

I think nobody is maintaining this repo now, but I'm posting this for future reference.

My environment: OS: Ubuntu 16.04, cuDNN: 7.1.4, CUDA: 8.0.

I installed PyTorch following @cng123's instructions: https://docs.google.com/document/d/17fSM2vrWodP8rWR7ctpgaggVXEw0uD2VCAh0Gi4Gpb4/edit?usp=sharing

I downloaded https://github.com/pytorch/examples and ran the ImageNet benchmark. I found that GPGPU-Sim often hit a deadlock, a segfault, or CUDNN_STATUS_INTERNAL_ERROR. I analyzed the problem with the LD_DEBUG environment variable and found that the PyTorch library dynamically loads libcuda.so.1, which should not be loaded at all when running under GPGPU-Sim.
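A minimal sketch of that LD_DEBUG analysis, assuming glibc's `LD_DEBUG=libs` output was captured from stderr (for example `LD_DEBUG=libs python main.py 2> ld.log`). It only looks at the loader's `find library=NAME [N]; searching` lines, so a library opened by absolute path may not show up this way. The helper names are hypothetical.

```python
import re
import sys

# glibc's dynamic loader prints "find library=NAME [N]; searching" each time
# it resolves a shared object by name (assumed LD_DEBUG=libs format).
FIND_RE = re.compile(r"find library=(\S+)")


def searched_libraries(log_text):
    """Return the distinct library names ld.so tried to resolve, in order."""
    seen = []
    for match in FIND_RE.finditer(log_text):
        name = match.group(1)
        if name not in seen:
            seen.append(name)
    return seen


def driver_library_requests(log_text):
    """Return requests for the real CUDA driver library (libcuda.so*).

    Under GPGPU-Sim every CUDA call should go to the simulator's
    libcudart.so, so this list is expected to be empty.
    """
    return [name for name in searched_libraries(log_text)
            if name.startswith("libcuda.so")]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_ld_debug.py ld.log
    with open(sys.argv[1]) as f:
        hits = driver_library_requests(f.read())
    print("libcuda requested:" if hits else "no libcuda requests", *hits)
```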

I found two reasons why PyTorch tried to link against libcuda.so.1 instead of GPGPU-Sim's libcudart.so:

1. In the compilation stage for _nvrtc.so, there is a link flag for libcuda.so: https://github.com/gpgpu-sim/pytorch-gpgpu-sim/blob/1e37e088ae002959d5d26bbb90e23d023161fc21/setup.py#L1010 I can bypass this issue either by using the static cuDNN library or by deleting the -lcuda link flag (if you want to keep the shared version of cuDNN). I personally prefer the static library.
2. libcublas.so has a link to libcuda.so. Strangely, I can't find an explicit linkage from libcublas.so to libcuda.so when I check it with the ldd command, but the LD_DEBUG output shows that libcublas.so calls functions in libcuda.so. At first, I tried to resolve this by making a copy of libcudart.so named libcuda.so.1. However, there are so many unimplemented CUDA driver functions in GPGPU-Sim's cuda_runtime_api.cc that my terminal produced CUDNN_STATUS_INTERNAL_ERROR very quickly. So I built PyTorch with libcublas_static.a instead, by modifying a few CMake values, for example: https://github.com/gpgpu-sim/pytorch-gpgpu-sim/blob/1e37e088ae002959d5d26bbb90e23d023161fc21/CMakeLists.txt#L80 and https://github.com/gpgpu-sim/pytorch-gpgpu-sim/blob/1e37e088ae002959d5d26bbb90e23d023161fc21/cmake/public/cuda.cmake#L248-L251
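For the first workaround with a shared cuDNN, deleting the -lcuda flag amounts to filtering the driver library out of the _nvrtc extension's link settings. A minimal sketch, assuming the libraries are collected in a plain list in setuptools' bare-name form (e.g. `["nvrtc", "cuda"]`) and that raw flags may also appear in the extra link arguments; `strip_driver_link` is a hypothetical helper, not part of PyTorch's setup.py.

```python
import os


def strip_driver_link(libraries, extra_link_args=()):
    """Remove links to the CUDA driver library (libcuda.so).

    `libraries` uses setuptools' bare-name form ("cuda" means -lcuda);
    `extra_link_args` may contain raw flags such as "-lcuda" or a full
    path to libcuda.so. Everything else (e.g. "nvrtc", "cudart") is kept
    in its original order.
    """
    kept_libs = [lib for lib in libraries if lib != "cuda"]
    kept_args = [
        arg for arg in extra_link_args
        if arg != "-lcuda"
        and not os.path.basename(arg).startswith("libcuda.so")
    ]
    return kept_libs, kept_args
```

Note that `libcudart.so` does not match the `libcuda.so` prefix, so the runtime library that GPGPU-Sim replaces is left untouched.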

After that, many of the errors were gone. I also think this is closely related to the merged pull request in gpgpu-sim_distribution: https://github.com/gpgpu-sim/gpgpu-sim_distribution/pull/129
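ldd only lists the dependencies recorded at link time, so a library opened at runtime (dlopen is one possible explanation for the libcublas.so behaviour above) never appears in its output. A complementary check after rebuilding is to look at which shared objects are actually mapped into the running process. This sketch assumes Linux's `/proc/<pid>/maps` format; the watched names and helper names are illustrative choices, not something from the original setup.

```python
import os

# Libraries that should NOT be mapped once cuBLAS/cuDNN are linked
# statically and all CUDA calls go through GPGPU-Sim (illustrative list).
WATCHED = ("libcuda.so", "libcublas.so", "libcudnn.so")


def mapped_objects(maps_text):
    """Return the set of file paths mapped into a process.

    Each /proc/<pid>/maps line is "address perms offset dev inode [pathname]";
    the pathname is the sixth whitespace-separated field when present.
    """
    paths = set()
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) == 6 and fields[5].startswith("/"):
            paths.add(fields[5].strip())
    return paths


def unexpected_libraries(maps_text, watched=WATCHED):
    """Return mapped paths whose file name starts with a watched name."""
    return sorted(
        path for path in mapped_objects(maps_text)
        if os.path.basename(path).startswith(watched)
    )
```

One way to use it is to call `unexpected_libraries(open("/proc/self/maps").read())` inside the PyTorch process after `import torch` and a first CUDA operation; an empty list suggests the static build kept those libraries out.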

I'm not sure it is worthwhile to improve such an old version of PyTorch (v0.4), but anyway, I hope this issue helps with your simulations.

Thank you

ohcurrent commented 4 years ago

Hello gangmul12! It's been a long time since we last talked.

Are you still working on pytorch-gpgpusim? Have you made any progress getting any of the PyTorch examples to run fully on GPGPU-Sim?

gangmul12 commented 4 years ago

Hi! I worked on pytorch-gpgpusim, but I realized that many kernels in the cuDNN library are implemented only in SASS, so I'm actually studying SASS now XD. However, apart from failing to simulate the SASS-only kernels, I've successfully run an example (skipping the SASS-only kernels)!
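Which kernels are SASS-only can be checked by comparing the kernel names in the SASS and PTX listings that `cuobjdump -sass` and `cuobjdump -ptx` print for a library such as libcudnn.so. A minimal sketch, assuming the SASS listing labels each kernel with a `Function : <name>` line and the PTX listing declares each kernel with `.entry <name>(`; the helper name is hypothetical.

```python
import re

# Assumed listing formats: "Function : NAME" in cuobjdump -sass output,
# ".entry NAME(" in cuobjdump -ptx output.
SASS_FUNC_RE = re.compile(r"^\s*Function\s*:\s*(\S+)", re.MULTILINE)
PTX_ENTRY_RE = re.compile(r"\.entry\s+([\w$]+)\s*\(")


def sass_only_kernels(sass_listing, ptx_listing):
    """Return kernels that have SASS but no PTX entry.

    GPGPU-Sim simulates from PTX, so these are the kernels it cannot run.
    Sets remove duplicates when a kernel is compiled for several SMs.
    """
    sass_names = set(SASS_FUNC_RE.findall(sass_listing))
    ptx_names = set(PTX_ENTRY_RE.findall(ptx_listing))
    return sorted(sass_names - ptx_names)
```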

ohcurrent commented 4 years ago

@gangmul12 I see... Do the SASS-only kernels you mentioned in the cuDNN library include the "maxwell_sgemm_128x64_raggedMn_tn_splitK" kernel? Thanks for answering.

gangmul12 commented 4 years ago

@ohcurrent Exactly. None of the kernels named maxwell_something_blahblah have PTX code... Also, some kernels do have a PTX version, but their function bodies are just {ret;}...
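The second case, PTX kernels whose body is just {ret;}, can be found in the same `cuobjdump -ptx` listing. A minimal sketch, assuming standard PTX entry syntax where optional performance directives such as `.maxntid` may sit between the parameter list and the body; `stub_kernels` is a hypothetical helper.

```python
import re

# A kernel entry whose body contains nothing but "ret;". The parameter
# list cannot contain ")" and the gap before the body cannot contain "{",
# so a match never spills over from one kernel into the next.
STUB_RE = re.compile(r"\.entry\s+([\w$]+)\s*\([^)]*\)[^{]*\{\s*ret;\s*\}")


def stub_kernels(ptx_listing):
    """Return names of PTX kernels whose body is only `{ ret; }`."""
    return STUB_RE.findall(ptx_listing)
```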

cng123 commented 4 years ago

To add to this, the maxwell_something_xxxx function headers are not present in newer cuDNN versions, so it may have been a mistake that they were included in the first place.

ohcurrent commented 4 years ago

@cng123, so did you simulate with a cuDNN version newer than 7.1.4?

gangmul12 commented 4 years ago

@ohcurrent, I saw that many cuDNN kernels are optimized for a particular architecture; newer versions have volta_xxxx kernels, which contain only SASS code. According to a few articles such as https://arxiv.org/abs/1804.06826 and https://hal.inria.fr/hal-00789958/document, some optimizations can only be done at the SASS level and cannot be produced by nvcc. I think that is why some kernels are implemented only in SASS.

ohcurrent commented 4 years ago

@gangmul12 I see, thanks for sharing. I thought that kernel was related to cuBLAS.

Azuresonance commented 3 years ago

@gangmul12 This may be off-topic, but I am trying to obtain some information on your fork of this repository (gangmul12/pytorch-v1.1-gpgpu-sim), which unfortunately does not have the Issues tab enabled.

I was trying to build your project with CUDA 8.0 and cuDNN 7.1.3, since newer versions don't work according to the GPGPU-Sim developers (https://github.com/gpgpu-sim/gpgpu-sim_distribution/issues/166#issuecomment-604505230).
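A quick way to confirm which CUDA and cuDNN versions a PyTorch build actually reports is to compare them against these limits. A minimal sketch, assuming cuDNN 7's integer version encoding (major*1000 + minor*100 + patch, so 7.1.3 is reported as 7103) and taking CUDA 8.0 / cuDNN 7.1.3 from the comment above as default limits; both are parameters, and the helper names are hypothetical.

```python
def decode_cudnn_version(code):
    """Split a cuDNN 7-style version integer into (major, minor, patch)."""
    major, rest = divmod(code, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


def matches_simulator_toolchain(cuda_version, cudnn_code,
                                expected_cuda="8.0", max_cudnn=(7, 1, 3)):
    """True if the build targets `expected_cuda` and a cuDNN no newer
    than `max_cudnn` (both limits are assumptions, adjust as needed)."""
    return (cuda_version == expected_cuda
            and decode_cudnn_version(cudnn_code) <= max_cudnn)
```

With PyTorch installed, one possible call is `matches_simulator_toolchain(torch.version.cuda, torch.backends.cudnn.version())`.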

After installing, I attempted to run an MNIST example (https://github.com/pytorch/examples/blob/master/mnist/main.py) and got the following output:

    Traceback (most recent call last):
      File "./main.py", line 139, in <module>
        main()
      File "./main_original.py", line 128, in main
        train(args, model, device, train_loader, optimizer, epoch)
      File "./main_original.py", line 42, in train
        output = model(data)
      File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
        result = self.forward(*input, **kwargs)
      File "./main.py", line 22, in forward
        x = self.conv1(x)
      File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
        result = self.forward(*input, **kwargs)
      File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py", line 338, in forward
        self.padding, self.dilation, self.groups)
    RuntimeError: cuDNN error: CUDNN_STATUS_MAPPING_ERROR

I would be grateful for some help, whether it's a solution or some hint to narrow the problem down.

gangmul12 commented 3 years ago

@Azuresonance Hi, at that time I was using the dev branch of GPGPU-Sim 3.x, so I'm not sure where the error comes from. One possible point: whenever you execute any CUDA-related command, GPGPU-Sim should start. However, it seems that GPGPU-Sim has not started (or did you just not print the GPGPU-Sim log?). Maybe that is because you used different CUDA versions for GPGPU-Sim and PyTorch, or because the rpath option was not removed when you installed PyTorch. Those are good starting points to check.
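One way to test the "GPGPU-Sim has not started" hypothesis is to check where the process's libcudart.so actually came from. A minimal sketch, assuming Linux's `/proc/<pid>/maps` format and that the simulator's libcudart.so lives somewhere under the directory in `GPGPUSIM_ROOT` (the variable GPGPU-Sim's setup_environment script exports); the helper names are hypothetical.

```python
import os


def cudart_paths(maps_text):
    """Return the distinct libcudart.so* paths mapped into a process,
    parsed from /proc/<pid>/maps text (pathname is the sixth field)."""
    paths = set()
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) == 6:
            path = fields[5].strip()
            if os.path.basename(path).startswith("libcudart.so"):
                paths.add(path)
    return sorted(paths)


def uses_simulator_cudart(maps_text, sim_root):
    """True if at least one libcudart is mapped and every mapped copy
    lies under `sim_root`, i.e. no NVIDIA runtime was picked up."""
    root = os.path.join(os.path.abspath(sim_root), "")  # trailing separator
    paths = cudart_paths(maps_text)
    return bool(paths) and all(
        os.path.abspath(p).startswith(root) for p in paths)
```

A possible call, inside the PyTorch process after a first CUDA operation, is `uses_simulator_cudart(open("/proc/self/maps").read(), os.environ["GPGPUSIM_ROOT"])`. If an rpath left in the PyTorch build points at the CUDA toolkit's lib64 directory, this check would be expected to return False.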

Repository: gpgpu-sim/pytorch-gpgpu-sim, issue #5