quark0 / darts

Differentiable architecture search for convolutional and recurrent networks
https://arxiv.org/abs/1806.09055
Apache License 2.0
3.9k stars, 845 forks

After 15 minutes, training stopped with RuntimeError: CUDNN_STATUS_NOT_INITIALIZED #77

Open maoqingsunny opened 5 years ago

maoqingsunny commented 5 years ago

Traceback (most recent call last):
  File "train_search.py", line 211, in <module>
    main()
  File "train_search.py", line 136, in main
    train_acc, train_obj = train(train_queue, valid_queue, model, architect, criterion, optimizer, lr)
  File "train_search.py", line 164, in train
    architect.step(input, target, input_search, target_search, lr, optimizer, unrolled=args.unrolled)
  File "/home/szzhang/maoqing/darts-master/cnn/architect.py", line 34, in step
    self._backward_step_unrolled(input_train, target_train, input_valid, target_valid, eta, network_optimizer)
  File "/home/szzhang/maoqing/darts-master/cnn/architect.py", line 44, in _backward_step_unrolled
    unrolled_model = self._compute_unrolled_model(input_train, target_train, eta, network_optimizer)
  File "/home/szzhang/maoqing/darts-master/cnn/architect.py", line 21, in _compute_unrolled_model
    loss = self.model._loss(input, target)
  File "/home/szzhang/maoqing/darts-master/cnn/model_search.py", line 116, in _loss
    logits = self(input)
  File "/home/szzhang/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/szzhang/maoqing/darts-master/cnn/model_search.py", line 104, in forward
    s0 = s1 = self.stem(input)
  File "/home/szzhang/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/szzhang/anaconda3/lib/python3.6/site-packages/torch/nn/modules/container.py", line 67, in forward
    input = module(input)
  File "/home/szzhang/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/szzhang/anaconda3/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 277, in forward
    self.padding, self.dilation, self.groups)
  File "/home/szzhang/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 90, in conv2d
    return f(input, weight, bias)
RuntimeError: CUDNN_STATUS_NOT_INITIALIZED

alphadl commented 5 years ago

I updated the code to make it compatible with PyTorch 1.1 and to support multi-GPU training for both the RNN and CNN experiments. You can refer to: https://github.com/alphadl/darts.pytorch1.1
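
Before switching code bases, it may help to check whether cuDNN is usable at all in the environment that produced the traceback above. The snippet below is only a diagnostic sketch, not something taken from the DARTS repository: the helper name cudnn_report is hypothetical, and it assumes a reasonably recent PyTorch in which torch.backends.cudnn.version() and torch.cuda.is_available() exist. CUDNN_STATUS_NOT_INITIALIZED is commonly reported when GPU memory runs out or when the driver, CUDA, and cuDNN versions do not match, but nothing in this thread confirms either cause.

```python
import importlib


def cudnn_report():
    """Collect basic CUDA/cuDNN facts; returns a dict even if torch is missing.

    Hypothetical helper for illustration only, not part of DARTS.
    """
    report = {
        "torch": None,
        "cuda_available": False,
        "cudnn_available": False,
        "cudnn_version": None,
        "conv_ok": None,  # stays None when no GPU is available to test on
    }
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return report

    report["torch"] = torch.__version__
    report["cuda_available"] = bool(torch.cuda.is_available())

    cudnn = torch.backends.cudnn
    # is_available() may be absent on very old releases, so look it up defensively.
    is_available = getattr(cudnn, "is_available", None)
    if callable(is_available):
        report["cudnn_available"] = bool(is_available())
    report["cudnn_version"] = cudnn.version()

    if report["cuda_available"]:
        try:
            # A tiny convolution on the GPU exercises the same code path
            # (conv2d) that failed in the traceback.
            conv = torch.nn.Conv2d(3, 8, kernel_size=3, padding=1).cuda()
            x = torch.randn(2, 3, 32, 32).cuda()
            conv(x)
            torch.cuda.synchronize()
            report["conv_ok"] = True
        except RuntimeError as err:
            report["conv_ok"] = False
            report["error"] = str(err)
    return report


if __name__ == "__main__":
    for key, value in cudnn_report().items():
        print(key, "=", value)
```

If conv_ok is False right after startup, the installation itself is suspect. If it is True and the failure only appears after several minutes of training, watching GPU memory with nvidia-smi while train_search.py runs is a reasonable next step.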