RuntimeError: CuDNN error: CUDNN_STATUS_INTERNAL_ERROR

BshoterJ commented 5 years ago

When I run train_entnet.py, I get the following error:

Traceback (most recent call last):
  File "train_entnet.py", line 43, in <module>
    [valid_memories, valid_queries, valid_query_lengths], valid_ent_inds)
  File "/home/jzw/NLP/kbqa/BAMnet/src/core/bamnet/entnet.py", line 96, in train
    train_loss += self.train_step(batch_xs, batch_ys) / num_batches
  File "/home/jzw/NLP/kbqa/BAMnet/src/core/bamnet/entnet.py", line 174, in train_step
    loss.backward()
  File "/home/jzw/python3/lib/python3.6/site-packages/torch/tensor.py", line 93, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/jzw/python3/lib/python3.6/site-packages/torch/autograd/__init__.py", line 90, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: CuDNN error: CUDNN_STATUS_INTERNAL_ERROR

I am using torch 0.4.1, CUDA 8, and cuDNN 7. How can I fix this? Thank you.

hugochan commented 5 years ago

@BshoterJ The code was tested on torch 0.4.1, CUDA 9, and cuDNN 7.2.1. Were you able to run other scripts (e.g., train.py) on the GPU successfully?

BshoterJ commented 5 years ago

I can run train.py successfully. Maybe my CUDA version is too old.
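
To check whether the versions really differ from the tested setup, it may help to print what PyTorch itself reports. The snippet below is a minimal sketch, not part of BAMnet: the helper names (`major_version`, `is_older_major`, `installed_versions`) are made up for illustration, and the `TESTED` values are simply the ones quoted in this thread. Note that `torch.version.cuda` is the CUDA version PyTorch was built against, which can differ from the toolkit installed on the system.

```python
# Versions the maintainer reports testing with (taken from this thread).
TESTED = {"torch": "0.4.1", "cuda": "9", "cudnn": "7.2.1"}


def major_version(text):
    """Return the leading integer of a version string, e.g. '8.0.61' -> 8."""
    return int(str(text).split(".")[0])


def is_older_major(installed, tested):
    """True if the installed major version is lower than the tested one."""
    return major_version(installed) < major_version(tested)


def installed_versions():
    """Collect the versions PyTorch reports, or None if torch is missing."""
    try:
        import torch
    except ImportError:
        return None
    return {
        "torch": torch.__version__,
        # CUDA version PyTorch was built with; None for CPU-only builds.
        "cuda": torch.version.cuda,
        # cuDNN version as an integer, or None if cuDNN is unavailable.
        "cudnn": torch.backends.cudnn.version(),
        "gpu_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    found = installed_versions()
    if found is None:
        print("PyTorch is not installed in this environment.")
    else:
        for name, value in found.items():
            print("{}: {} (tested: {})".format(name, value, TESTED.get(name, "n/a")))
        if found["cuda"] and is_older_major(found["cuda"], TESTED["cuda"]):
            print("The CUDA build is older than the tested major version.")
```

If the last message is printed, installing a PyTorch build that matches the tested CUDA version is one thing to try. This is only a guess based on the versions mentioned above, not a confirmed fix.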

hugochan / BAMnet, issue #3