lmnt-com / haste

Haste: a fast, simple, and open RNN library
Apache License 2.0

CUDA error: an illegal memory access was encountered #30

Closed (jaak-s closed this issue 3 years ago)

jaak-s commented 3 years ago

When I run an RNN from the example (e.g., GRU, IndRNN), I get an illegal memory access error.

import torch 
import haste_pytorch as haste 

x = torch.rand([25, 5, 128]).cuda() 

gru_layer = haste.GRU(input_size=128, hidden_size=256, zoneout=0.1, dropout=0.05) 
gru_layer.cuda()               
y, state = gru_layer(x)        
y.mean().backward()

Results in:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-1-66139d497ca7> in <module>
      7 gru_layer.cuda()
      8 y, state = gru_layer(x)
----> 9 y.mean().backward()

RuntimeError: CUDA error: an illegal memory access was encountered

I'm using PyTorch 1.7.1+cu110 and Python 3.7.3. Haste is from the GitHub master branch, compiled with make haste_pytorch.

fjsj commented 3 years ago

Any solution for this? I'm experiencing the same issue on:

sharvil commented 3 years ago

The most common cause for this sort of problem is a mismatch between the CUDA version you're running with PyTorch and the CUDA version that Haste is built with on your system. What version of CUDA do you have in /usr/local/cuda?
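One way to check for the mismatch described above is to compare the CUDA version PyTorch was built against with the toolkit version reported by nvcc. The following is only a rough sketch: the helper names (parse_nvcc_release, same_cuda_version, report) are made up for illustration, and it assumes the toolkit's compiler lives at /usr/local/cuda/bin/nvcc.

```python
import re
import subprocess


def parse_nvcc_release(nvcc_output):
    """Extract 'major.minor' from `nvcc --version` output, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return f"{match.group(1)}.{match.group(2)}" if match else None


def same_cuda_version(pytorch_cuda, toolkit_cuda):
    """True only when both versions are known and share major.minor."""
    if not pytorch_cuda or not toolkit_cuda:
        return False
    return pytorch_cuda.split(".")[:2] == toolkit_cuda.split(".")[:2]


def report(nvcc_path="/usr/local/cuda/bin/nvcc"):
    """Print both versions; the nvcc path is an assumption about the install."""
    try:
        import torch
        pytorch_cuda = torch.version.cuda  # None on CPU-only builds
    except ImportError:
        pytorch_cuda = None
    try:
        result = subprocess.run(
            [nvcc_path, "--version"], capture_output=True, text=True
        )
        toolkit_cuda = parse_nvcc_release(result.stdout)
    except OSError:
        # nvcc is missing or not executable at the assumed path
        toolkit_cuda = None
    print("PyTorch CUDA:", pytorch_cuda)
    print("Toolkit CUDA:", toolkit_cuda)
    print("Match:", same_cuda_version(pytorch_cuda, toolkit_cuda))


if __name__ == "__main__":
    report()
```

If the two versions differ, rebuilding Haste against the toolkit that matches the PyTorch build is the first thing to try.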

fjsj commented 3 years ago

Thanks. I've now installed CUDA 11 on my Ubuntu machine, but I get "ImportError: libcublas.so.10: cannot open shared object file: No such file or directory".

Does that mean haste-pytorch only works with CUDA 10?

sharvil commented 3 years ago

I think the issue here is a mismatch between CUDA and cuBLAS on your system – unrelated to Haste. Notice that whatever you're running is looking for a cuBLAS 10 library even though you installed CUDA 11.
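To see which cuBLAS versions the dynamic loader can actually find, something like the sketch below may help. It is only an illustration: can_load is a hypothetical helper, and libcublas.so.11 is assumed to be the library name that a CUDA 11 install provides (libcublas.so.10 is the name from the ImportError above).

```python
import ctypes


def can_load(soname):
    """Return True if the dynamic loader can find and open the library."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        # Raised when the shared object is not on the loader's search path
        return False


if __name__ == "__main__":
    # libcublas.so.11 is an assumed name for the CUDA 11 cuBLAS library
    for soname in ("libcublas.so.10", "libcublas.so.11"):
        print(soname, "loadable" if can_load(soname) else "not found")
```

If only the version 11 library loads while the import still asks for version 10, the extension was most likely compiled or linked against a different CUDA than the one installed.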

sharvil commented 3 years ago

@fjsj, it seems that PyTorch brings in the wrong version of CUDA when building an extension with CppExtension instead of CUDAExtension. I think that's what you were seeing on your system. I've updated the build to use CUDAExtension. Hopefully it works for you – let me know if you're still running into issues with it.

fjsj commented 3 years ago

@sharvil thanks, is there a release with this fix? I couldn't find the setup.py for the PyTorch version of Haste.