gpgpu-sim / gpgpu-sim_distribution

GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism, as well as a performance visualization tool, AerialVision, and an integrated energy model, GPUWattch.

Help with problems setting up pytorch-gpgpu-sim #168

Open · jswon opened this issue 4 years ago

jswon commented 4 years ago

Hello, I have been setting up pytorch-gpgpu-sim from this repository.

I wonder: when I run PyTorch on GPGPU-Sim, is there anything else I need to set up?

Here's my bashrc configuration.

export LD_LIBRARY_PATH=/home/aim/aim_gpu_sim/gpgpu-sim_distribution/lib/gcc-5.4.0/cuda-9010/release:/usr/local/cuda-9.1/lib64/
export PYTORCH_BIN=/usr/local/cuda-9.1/lib64/libcudnn.so

I'm using Python 3.5 in a virtualenv.

When I run Python...

(venv)$ python

>>> import torch

 ~~~
 Extracting specific PTX file named libcudnn.1.sm_70.ptx
Extracting PTX file and ptxas options    1: libcudnn.1.sm_70.ptx -arch=sm_70  -maxrregcount=48

 ...

Extracting specific PTX file named libcudnn.179.sm_70.ptx
Extracting PTX file and ptxas options  179: libcudnn.179.sm_70.ptx -arch=sm_70

 ~~~~(skip)

 GPGPU-Sim PTX: finished parsing EMBEDDED .ptx file libcudnn.125.sm_70.ptx
GPGPU-Sim PTX: Parsing libcudnn.126.sm_70.ptx
libcudnn.126.sm_70.ptx:1: Syntax error:

   bar.sync 0;
             ^

GPGPU-Sim PTX: finished parsing EMBEDDED .ptx file libcudnn.126.sm_70.ptx
GPGPU-Sim PTX: Parsing libcudnn.127.sm_70.ptx
libcudnn.127.sm_70.ptx:2: Syntax error:

   add.s32 %r415, %r16, 8;
     ^

GPGPU-Sim PTX: finished parsing EMBEDDED .ptx file libcudnn.127.sm_70.ptx
GPGPU-Sim PTX: Parsing libcudnn.128.sm_70.ptx
libcudnn.128.sm_70.ptx:1: Parse error: variable has no space specification (ptx_parser.cc:342)

   add.s32 %r415, %r16, 8;
               ^

Aborted (core dumped)

And when I open a PTX file that has a parse error, this is all it contains.

In libcudnn.128.sm_70.ptx:

.version 6.2
.target sm_70
.address_size 64

I think the problem arose when the PTX files were generated.

In my case, is it normal for 180 PTX files to be created?

What causes this?

I really don't know what to do.

Is it a problem with my settings, or am I running it the wrong way?

Please help me figure out how to solve this.

Thank you. :)

My environment: Ubuntu 16.04, cuDNN 7.2.1, CUDA 9.1.

mattsinc commented 4 years ago

I don't know the answer to your other questions, but I do know that after CUDA 8, NVIDIA stopped embedding PTX in their libraries (e.g., cuDNN, cuBLAS). So using CUDA 9.1 definitely will not work, because GPGPU-Sim needs that PTX to know what to simulate, and PyTorch is making library calls. (I believe you can use CUDA 9.1 with GPGPU-Sim as long as you aren't using the CUDA libraries.)

Setting that aside, you would also need to make sure to build with the static versions of the libraries (e.g., libcudnn_static, libcublas_static) in order to get the PTX that NVIDIA embedded in the libraries prior to CUDA 9. Again, I have not tried this personally, so hopefully someone else who has gotten this to work with CUDA 8 can provide additional help.
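(For illustration, a minimal smoke test of that setup might look like the sketch below. This is not from the thread: the build line, library names, and paths are assumptions about a typical CUDA 8 install. Note that GPGPU-Sim needs libcudart linked dynamically so that its own libcudart.so is picked up via LD_LIBRARY_PATH at run time, while the math libraries are linked statically to pull in their embedded PTX.)

// test_cublas.cu -- hypothetical smoke test: if GPGPU-Sim prints its
// "Extracting PTX file ..." messages when this runs, the PTX embedded in
// the static library was found. Assumed build line:
//   nvcc -cudart shared test_cublas.cu -lcublas_static -lculibos -o test_cublas
#include <cublas_v2.h>
#include <cstdio>

int main() {
    cublasHandle_t handle;
    // Just initializing cuBLAS exercises the library; with static linking,
    // any embedded PTX is registered when the program loads.
    if (cublasCreate(&handle) != CUBLAS_STATUS_SUCCESS) {
        fprintf(stderr, "cublasCreate failed\n");
        return 1;
    }
    printf("cuBLAS initialized under the simulator.\n");
    cublasDestroy(handle);
    return 0;
}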

Matt

mahmoodn commented 4 years ago

Although it is said that the static libraries should work, I wasn't able to use them successfully; the output is shown here. I have tried hard to find out why the static versions don't work, but I still haven't figured it out. Sorry for hijacking this topic, but the result will be beneficial for everyone.

mattsinc commented 4 years ago

Your error seems to be the same as what I described above -- CUDA 9.1 does not embed the PTX, and the error at the end of your discussion seems to indicate the same ("ERROR launching kernel -- no PTX implementation found for 0x50e270"). As mentioned above, GPGPU-Sim relies on having this PTX to work, so you would need to use CUDA 8 if you want to use the CUDA libraries. (Note that above I said static CUDA libraries would only work with CUDA 8, not that static CUDA libraries would always work.) Also, you may need to use culibos_static too, although I doubt that will fix the problem you are facing.

Matt

jswon commented 4 years ago

Thank you, Matt (@mattsinc). As you suggested, I downgraded my environment to CUDA 8.0, and now "import torch" executes normally.

However, I still have problems running test code (Python 3). For example:

import torch               # <- works now, thanks to you!

a = torch.randn(2, 2)
b = torch.randn(2, 2)

a = a.cuda()               # <- Error
b = b.cuda()

c = torch.matmul(a, b)

An assertion error occurs when executing the above example:

~/lib/python3.5/site-packages/torch/cuda/__init__.py:116: UserWarning: Found GPU0 GPGPU-Sim_vGPGPU-Sim Simulator Version 4.0.0 which is of cuda capability 2.0.
    PyTorch no longer supports this GPU because it is too old.
  warnings.warn(old_gpu_warn % (d, name, major, capability[1]))
GPGPU-Sim: synchronize waiting for inactive GPU simulation
GPGPU-Sim API: Stream Manager State
GPGPU-Sim: detected inactive GPU simulation thread
GPGPU-Sim: synchronize waiting for inactive GPU simulation
GPGPU-Sim API: Stream Manager State
GPGPU-Sim: detected inactive GPU simulation thread
GPGPU-Sim: synchronize waiting for inactive GPU simulation
GPGPU-Sim API: Stream Manager State
GPGPU-Sim: detected inactive GPU simulation thread
python: gpu-sim.cc:1783: void gpgpu_sim::perf_memcpy_to_gpu(size_t, size_t): Assertion `dst_start_addr % 32 == 0' failed.

The a.cuda() call triggers this error.

Have you ever seen an error like this? How do you test it?

Why am I getting a size error when allocating memory in GPGPU-Sim?

Also, a 2x2 tensor triggers the assertion error, but a 1x2 tensor does not:

cuda = torch.device('cuda')
a = torch.tensor([[1., 2.], [3., 4.]], device=cuda)    # <- Error

but

a = torch.tensor([1., 2.], device=cuda)    # <- Pass

Please advise me on how to solve this.

Thank you :)

mattsinc commented 4 years ago

As I mentioned in my previous post, this is beyond where I've tried running things. So I don't have a good answer for you and hopefully someone else can chime in. However, looking at the error message, it appears that the memcpy is unaligned -- gpu-sim.cc:1783 is assuming that the memcpy address will be aligned, but for some reason the address PyTorch is passing is not aligned. So the first thing to do would be to instrument that function and see what address is being passed in. For example, you could add a print that prints the address before this line: https://github.com/gpgpu-sim/gpgpu-sim_distribution/blob/dev/src/gpgpu-sim/gpu-sim.cc#L1783.
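(For illustration, a minimal sketch of that debug print. The function signature and the assertion are copied from the error message above; the rest of the function body is elided.)

// In src/gpgpu-sim/gpu-sim.cc, just before the assertion at line 1783:
void gpgpu_sim::perf_memcpy_to_gpu(size_t dst_start_addr, size_t count) {
    // Show the destination address PyTorch passed in, and its alignment,
    // before the 32-byte alignment check aborts the process.
    printf("perf_memcpy_to_gpu: dst_start_addr = 0x%zx (mod 32 = %zu), count = %zu\n",
           dst_start_addr, dst_start_addr % 32, count);
    assert(dst_start_addr % 32 == 0);   // the assertion that currently fires
    // ... rest of the function unchanged ...
}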

If I had to guess, PyTorch is assuming 64-bit addresses and GPGPU-Sim is assuming 32-bit addresses, or something like that. But this is purely a guess.

Matt