thunlp / SDLM-pytorch

Code accompanying the EMNLP 2018 paper "Language Modeling with Sparse Product of Sememe Experts"

What is the recommended version of CUDA? #5

Closed: GregoryZeng closed this issue 5 years ago

GregoryZeng commented 5 years ago

I tried PyTorch 0.3.1 with CUDA 8.0, and the code crashes. I then tried PyTorch 0.3.0 with CUDA 7.5 (as provided on the official PyTorch website), and it seems to work. I am still wondering what the actual settings were for the experiments in the paper.

junyann commented 5 years ago

Hello! Thanks for your interest. The LM code has been tested on PyTorch 0.3.1 + CUDA 8.0 + cuDNN 7.0.5. Since you didn't provide the error message or your cuDNN version, my guess is that you installed PyTorch 0.3.1 with a mismatched cuDNN version. If so, you will get an error message like RuntimeError: cuDNN version mismatch: PyTorch was compiled against xxxx but linked against xxxx. In that case, please run conda install cudnn=7.0.5 to install the cuDNN version that matches PyTorch 0.3.1 + CUDA 8.0.

If that doesn't solve your problem, could you please post the error message you get when the code crashes? Thank you!
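To check an environment against the tested configuration (PyTorch 0.3.1 + CUDA 8.0 + cuDNN 7.0.5), a small script along these lines may help. This is a hypothetical helper, not part of SDLM-pytorch; the names `TESTED`, `collect_versions`, `same_release` and `mismatches` are illustrative. It assumes `torch.backends.cudnn.version()` returns cuDNN's integer version code in the cuDNN 7.x encoding (7005 for 7.0.5), and that a compiled-vs-linked cuDNN mismatch shows up as a `RuntimeError` when that function is called.

```python
# Hypothetical helper (not part of SDLM-pytorch): compare the local stack
# with the configuration reported as tested in this thread.
TESTED = {"torch": "0.3.1", "cuda": "8.0", "cudnn": "7.0.5"}


def cudnn_code_to_str(code):
    """Convert cuDNN's integer version code to 'major.minor.patch'.

    Assumes the cuDNN 7.x encoding major*1000 + minor*100 + patch,
    so 7005 -> '7.0.5'.
    """
    if code is None:
        return None
    major, rest = divmod(int(code), 1000)
    minor, patch = divmod(rest, 100)
    return f"{major}.{minor}.{patch}"


def same_release(found, tested):
    """True if `found` begins with the dotted components of `tested`.

    A local '+...' suffix is dropped, and extra components are ignored,
    so e.g. '0.3.1.post2' counts as matching '0.3.1'.
    """
    if found is None:
        return False
    want = tested.split(".")
    have = found.split("+")[0].split(".")
    return have[:len(want)] == want


def collect_versions():
    """Read versions from the installed torch; None where unavailable."""
    found = {"torch": None, "cuda": None, "cudnn": None}
    try:
        import torch
    except ImportError:
        return found
    found["torch"] = torch.__version__
    # getattr guards against builds lacking this attribute (assumption).
    found["cuda"] = getattr(torch.version, "cuda", None)
    if torch.cuda.is_available():
        try:
            found["cudnn"] = cudnn_code_to_str(torch.backends.cudnn.version())
        except RuntimeError as err:
            # Assumed: a cuDNN version mismatch is raised here.
            found["cudnn"] = f"error: {err}"
    return found


def mismatches(found, tested=TESTED):
    """Map each component that differs to a (found, tested) pair."""
    return {name: (found.get(name), want)
            for name, want in tested.items()
            if not same_release(found.get(name), want)}


if __name__ == "__main__":
    diff = mismatches(collect_versions())
    if not diff:
        print("Matches the tested configuration.")
    for name, (have, want) in sorted(diff.items()):
        print(f"{name}: found {have}, tested with {want}")
```

An empty result only means the version strings line up; it says nothing about whether a particular build has other problems.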

GregoryZeng commented 5 years ago

Sadly, it still does not work with your configuration:

Traceback (most recent call last):
  File "run_tied_lstm.py", line 186, in <module>
    model.cuda()
  File "/fs/snotra0/gregzeng/anaconda3/envs/SDLM/lib/python3.6/site-packages/torch/nn/modules/module.py", line 216, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/fs/snotra0/gregzeng/anaconda3/envs/SDLM/lib/python3.6/site-packages/torch/nn/modules/module.py", line 146, in _apply
    module._apply(fn)
  File "/fs/snotra0/gregzeng/anaconda3/envs/SDLM/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 123, in _apply
    self.flatten_parameters()
  File "/fs/snotra0/gregzeng/anaconda3/envs/SDLM/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 111, in flatten_parameters
    params = rnn.get_parameters(fn, handle, fn.weight_buf)
  File "/fs/snotra0/gregzeng/anaconda3/envs/SDLM/lib/python3.6/site-packages/torch/backends/cudnn/rnn.py", line 165, in get_parameters
    assert filter_dim_a.prod() == filter_dim_a[0]
AssertionError

junyann commented 5 years ago

Sorry, I was unable to reproduce your problem on my machine. Your error message seems to be related to a known bug in the PyTorch 0.3.1 build distributed through the Anaconda default channel. You can refer to the discussion in https://github.com/pytorch/pytorch/issues/5667. I hope this helps.
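Before launching a full training run, one way to see whether an installed build hits this assertion is to repeat the failing call path from the traceback in isolation: in that traceback, `model.cuda()` on a module containing an RNN goes through `flatten_parameters()` and fails inside cuDNN's `get_parameters`. The sketch below is a hypothetical smoke test, not part of SDLM-pytorch; `lstm_cuda_smoke_test` and its default sizes are illustrative, and it assumes a bare `torch.nn.LSTM` follows the same `.cuda()` path as the model in `run_tied_lstm.py`.

```python
# Hypothetical smoke test (not part of SDLM-pytorch): move a tiny LSTM to
# the GPU, which in the reported traceback triggers flatten_parameters().
def lstm_cuda_smoke_test(input_size=8, hidden_size=16):
    """Return 'ok', 'assertion', or 'skipped' (no torch or no CUDA)."""
    try:
        import torch
    except ImportError:
        return "skipped"
    if not torch.cuda.is_available():
        return "skipped"
    lstm = torch.nn.LSTM(input_size, hidden_size)
    try:
        # Same call as model.cuda() in the traceback above.
        lstm.cuda()
    except AssertionError:
        # The failure mode reported in this issue.
        return "assertion"
    return "ok"


if __name__ == "__main__":
    print(lstm_cuda_smoke_test())
```

A result of "assertion" suggests the installed build is affected; "ok" only rules out this particular failure.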