
torch._C._cuda_init() RuntimeError: CUDA error: unknown error #21114

Closed guifaChild closed 5 years ago

guifaChild commented 5 years ago

I installed PyTorch, and my CUDA version is up to date, but when I run my code I get the following error.

My system:

- Windows 10
- NVIDIA GeForce GTX 960M
- Python 3.6 (Anaconda)
- PyTorch 1.1.0
- CUDA 10

```python
import torch
import torch.nn as nn
from data_util import config

use_cuda = config.use_gpu and torch.cuda.is_available()

def init_lstm_wt(lstm):
    for names in lstm._all_weights:
        for name in names:
            if name.startswith('weight'):
                wt = getattr(lstm, name)
                wt.data.uniform_(-config.rand_unif_init_mag, config.rand_unif_init_mag)
            elif name.startswith('bias'):
                # set forget bias to 1
                bias = getattr(lstm, name)
                n = bias.size(0)
                start, end = n // 4, n // 2
                bias.data.fill_(0.)
                bias.data[start:end].fill_(1.)

def init_wt_normal(wt):
    wt.data.normal_(std=config.trunc_norm_init_std)

class Encoder(nn.Module):
    def __init__(self):
        super(Encoder, self).__init__()
        self.embedding = nn.Embedding(config.vocab_size, config.emb_dim)
        init_wt_normal(self.embedding.weight)
        self.lstm = nn.LSTM(config.emb_dim, config.hidden_dim, num_layers=1,
                            batch_first=True, bidirectional=True)
        init_lstm_wt(self.lstm)
        self.W_h = nn.Linear(config.hidden_dim * 2, config.hidden_dim * 2, bias=False)

class Decoder(nn.Module):
    def __init__(self):
        super(Decoder, self).__init__()
        self.embedding = nn.Embedding(config.vocab_size, config.emb_dim)
        init_wt_normal(self.embedding.weight)

class Model(object):
    def __init__(self, model_file_path=None, is_eval=False):
        encoder = Encoder()
        decoder = Decoder()
        if use_cuda:
            encoder = encoder.cuda()
            decoder = decoder.cuda()

model = Model()
```

```
Traceback (most recent call last):
  File "G:/public_workspace/pointer_summarizer-pytorch-2.7/data_util/test4.py", line 41, in <module>
    model = Model()
  File "G:/public_workspace/pointer_summarizer-pytorch-2.7/data_util/test4.py", line 39, in __init__
    encoder = encoder.cuda()
  File "D:\soft\anaconda\lib\site-packages\torch\nn\modules\module.py", line 266, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "D:\soft\anaconda\lib\site-packages\torch\nn\modules\module.py", line 194, in _apply
    module._apply(fn)
  File "D:\soft\anaconda\lib\site-packages\torch\nn\modules\module.py", line 200, in _apply
    param.data = fn(param.data)
  File "D:\soft\anaconda\lib\site-packages\torch\nn\modules\module.py", line 266, in <lambda>
    return self._apply(lambda t: t.cuda(device))
  File "D:\soft\anaconda\lib\site-packages\torch\cuda\__init__.py", line 176, in _lazy_init
    torch._C._cuda_init()
RuntimeError: CUDA error: unknown error
```

When I commented out part of the code, the error disappeared:

```python
import torch
import torch.nn as nn
from data_util import config

use_cuda = config.use_gpu and torch.cuda.is_available()

def init_lstm_wt(lstm):
    for names in lstm._all_weights:
        for name in names:
            if name.startswith('weight'):
                wt = getattr(lstm, name)
                wt.data.uniform_(-config.rand_unif_init_mag, config.rand_unif_init_mag)
            elif name.startswith('bias'):
                # set forget bias to 1
                bias = getattr(lstm, name)
                n = bias.size(0)
                start, end = n // 4, n // 2
                bias.data.fill_(0.)
                bias.data[start:end].fill_(1.)

def init_wt_normal(wt):
    wt.data.normal_(std=config.trunc_norm_init_std)

class Encoder(nn.Module):
    def __init__(self):
        super(Encoder, self).__init__()
        self.embedding = nn.Embedding(config.vocab_size, config.emb_dim)
        init_wt_normal(self.embedding.weight)
        self.lstm = nn.LSTM(config.emb_dim, config.hidden_dim, num_layers=1,
                            batch_first=True, bidirectional=True)
        # init_lstm_wt(self.lstm)
        # self.W_h = nn.Linear(config.hidden_dim * 2, config.hidden_dim * 2, bias=False)

class Decoder(nn.Module):
    def __init__(self):
        super(Decoder, self).__init__()
        self.embedding = nn.Embedding(config.vocab_size, config.emb_dim)
        init_wt_normal(self.embedding.weight)

class Model(object):
    def __init__(self, model_file_path=None, is_eval=False):
        encoder = Encoder()
        decoder = Decoder()
        if use_cuda:
            encoder = encoder.cuda()
            decoder = decoder.cuda()

model = Model()
```
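In other words, the only difference between the failing and the working version is the LSTM weight-initialization loop and the extra linear layer. A condensed, hypothetical sketch of just that removed part (placeholder sizes stand in for the config values, which aren't shown here):

```python
import torch
import torch.nn as nn

# Hypothetical, condensed version of the two commented-out lines above,
# with placeholder sizes standing in for the config values.
lstm = nn.LSTM(128, 256, num_layers=1, batch_first=True, bidirectional=True)

# init_lstm_wt(self.lstm): re-initialize the LSTM weight matrices in place
for names in lstm._all_weights:
    for name in names:
        if name.startswith('weight'):
            getattr(lstm, name).data.uniform_(-0.02, 0.02)

# self.W_h: the extra projection layer
W_h = nn.Linear(256 * 2, 256 * 2, bias=False)

# Moving the module to the GPU is the call that fails in the report above
if torch.cuda.is_available():
    lstm = lstm.cuda()
```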

peterjc123 commented 5 years ago

Duplicate of https://github.com/pytorch/pytorch/issues/20990.

xjdeng commented 5 years ago

I also have a 960M with CUDA 10, Python 3.6, and Windows 10.

The easy solution is as follows:

In any module where you'll be using PyTorch, make sure its first two lines are the following:

```python
import torch
torch.cuda.current_device()
```

For example, if you're using fastai's vision module, normally you can import it as follows (if this bug isn't present):

```python
from fastai.vision import *
```

But now you'll need to do the following:

```python
import torch
torch.cuda.current_device()
from fastai.vision import *
```

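A slightly more defensive variant of the same workaround (my own sketch, assuming you also want the script to keep working on machines without a GPU):

```python
import torch

# Force the CUDA context to initialize up front, before any other library
# touches the GPU; the is_available() guard keeps CPU-only machines working.
if torch.cuda.is_available():
    torch.cuda.current_device()

from fastai.vision import *  # or whatever GPU-using library you import next
```
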
joshuacwnewton commented 5 years ago

@xjdeng

I am glad I came across your comment. I am working through fast.ai's course, and this solution worked for me as well.

lwyanne commented 5 years ago

@xjdeng Thanks! I came across this problem exactly when I was using the fastai library. This perfectly solved my problem.

JoeWood2019 commented 5 years ago

@xjdeng Big thanks to you! It saved my day! I just wonder why we have to set up the first two lines in this way to make it work. Do you have any ideas?

fcoclavero commented 5 years ago

@xjdeng thanks for the fix! How did you figure it out though?

xjdeng commented 5 years ago

I can't remember exactly... I might have gone through several solutions proposed on GitHub and Stack Overflow until I stumbled on this one, which worked.
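For anyone still curious about the mechanism (this is a reading of the traceback above, not an authoritative explanation): torch.cuda sets up its context lazily, and the stack trace shows `encoder.cuda()` going through `_lazy_init()` into `torch._C._cuda_init()`. Calling any CUDA query at the very top of the script forces that same initialization to run first, before other imports and module code get involved:

```python
import torch

# torch.cuda initializes lazily: nothing happens at import time, and the first
# real CUDA call runs torch.cuda._lazy_init() -> torch._C._cuda_init(), the
# same path that fails in the traceback above when it runs too late.
print(torch.cuda.is_initialized())   # False: no CUDA context yet
torch.cuda.current_device()          # forces the lazy initialization now
print(torch.cuda.is_initialized())   # True
```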