yilinliu610730 closed this issue 1 year ago.
It does seem that CUDA 9.0 cannot support the RTX 3070 or newer GPUs.
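This explains why CUDA 9.0 cannot work here. The RTX 3070 and RTX A5000 are Ampere GPUs with compute capability 8.6, and CUDA toolkits only gained support for that architecture in 11.1. The sketch below encodes that rule for a few well-known architectures. The table and the helper names (`MIN_CUDA_FOR_CAPABILITY`, `parse_version`, `cuda_supports`) are illustrative assumptions, not an official NVIDIA or PyTorch API.

```python
# Minimum CUDA toolkit (major, minor) that can generate native code for each
# compute capability. Illustrative subset only; not an exhaustive NVIDIA table.
MIN_CUDA_FOR_CAPABILITY = {
    (6, 0): (8, 0),   # Pascal, e.g. Tesla P100
    (6, 1): (8, 0),   # Pascal, e.g. GTX 1080
    (7, 0): (9, 0),   # Volta, e.g. Tesla V100
    (7, 5): (10, 0),  # Turing, e.g. RTX 2080
    (8, 0): (11, 0),  # Ampere, e.g. A100
    (8, 6): (11, 1),  # Ampere, e.g. RTX 3070 and RTX A5000
}


def parse_version(text):
    """Turn a version string such as '9.0' or '11.4' into a (major, minor) tuple."""
    parts = text.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def cuda_supports(capability, cuda_version):
    """Return True/False if `cuda_version` can target `capability`.

    Returns None when the capability is not in the illustrative table.
    """
    required = MIN_CUDA_FOR_CAPABILITY.get(tuple(capability))
    if required is None:
        return None
    return parse_version(cuda_version) >= required


if __name__ == "__main__":
    rtx_a5000 = (8, 6)
    for cuda in ("9.0", "10.2", "11.1", "11.4"):
        print(cuda, cuda_supports(rtx_a5000, cuda))
```

A new enough toolkit is not sufficient on its own. The prebuilt PyTorch wheel also has to include kernels for sm_86. A wheel built against CUDA 9.0, such as the cu90 torch 1.1.0 wheel, cannot include them, which is consistent with the `invalid device function` message in the log below.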
loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
loading annotations into memory...
Done (t=0.05s)
creating index...
index created!
loading annotations into memory...
Done (t=0.07s)
creating index...
index created!
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=383 error=8 : invalid device function
Traceback (most recent call last):
  File "/home/VANDERBILT/liuy99/Documents/snake/train_net.py", line 54, in <module>
    main()
  File "/home/VANDERBILT/liuy99/Documents/snake/train_net.py", line 50, in main
    train(cfg, network)
  File "/home/VANDERBILT/liuy99/Documents/snake/train_net.py", line 25, in train
    trainer.train(epoch, train_loader, optimizer, recorder)
  File "/home/VANDERBILT/liuy99/Documents/snake/lib/train/trainers/trainer.py", line 38, in train
    output, loss, loss_stats, image_stats = self.network(batch)
  File "/home/VANDERBILT/liuy99/anaconda3/envs/snake/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/VANDERBILT/liuy99/anaconda3/envs/snake/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/VANDERBILT/liuy99/anaconda3/envs/snake/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "lib/train/trainers/snake.py", line 19, in forward
    output = self.net(batch['inp'], batch)
  File "/home/VANDERBILT/liuy99/anaconda3/envs/snake/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "lib/networks/snake/ct_snake.py", line 54, in forward
    output, cnn_feature = self.dla(x)
  File "/home/VANDERBILT/liuy99/anaconda3/envs/snake/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "lib/networks/snake/dla.py", line 469, in forward
    x = self.base(x)
  File "/home/VANDERBILT/liuy99/anaconda3/envs/snake/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "lib/networks/snake/dla.py", line 289, in forward
    x = self.base_layer(x)
  File "/home/VANDERBILT/liuy99/anaconda3/envs/snake/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/VANDERBILT/liuy99/anaconda3/envs/snake/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/VANDERBILT/liuy99/anaconda3/envs/snake/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/VANDERBILT/liuy99/anaconda3/envs/snake/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 83, in forward
    exponential_average_factor, self.eps)
  File "/home/VANDERBILT/liuy99/anaconda3/envs/snake/lib/python3.7/site-packages/torch/nn/functional.py", line 1697, in batch_norm
    training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
Process finished with exit code 1
Environment: CUDA 9.0, GPU: NVIDIA RTX A5000, PyTorch 1.1.0
What I have tried:
- Installing torch with pip3 install -U https://download.pytorch.org/whl/cu90/torch-1.1.0-cp37-cp37m-linux_x86_64.whl
- Rebooting
- Reducing the batch size; it is now batch_size = 1
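I have read that CUDA 9.0 may not support these cards. Is it true that GPUs from the 3070 upward only work with CUDA 11 or later? I tried CUDA 11.4 earlier, but DCN would not install and installing the packages in requirements.txt produced many errors, so I switched to the official Deep Snake setup with CUDA 9.0 and PyTorch 1.1.0. Any suggestions?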
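One way to answer this on a specific machine is to print the versions that matter for an `invalid device function` error: the CUDA version the installed PyTorch wheel was built with, the cuDNN version, and the GPU's compute capability. The sketch below does that. `describe_cuda_setup` is a hypothetical helper name, and `torch.cuda.get_arch_list` only exists in newer PyTorch releases (not 1.1.0), so the code checks for it before calling it.

```python
def describe_cuda_setup():
    """Collect version info relevant to 'invalid device function' errors.

    Every value stays None when PyTorch or a CUDA device is unavailable.
    """
    info = {
        "torch": None,       # PyTorch version string
        "cuda_build": None,  # CUDA version the PyTorch wheel was built with
        "cudnn": None,       # cuDNN version bundled with PyTorch
        "device": None,      # name of GPU 0
        "capability": None,  # (major, minor) compute capability of GPU 0
        "arch_list": None,   # architectures compiled into the wheel (newer PyTorch only)
    }
    try:
        import torch
    except ImportError:
        return info
    info["torch"] = torch.__version__
    info["cuda_build"] = torch.version.cuda
    if torch.cuda.is_available():
        info["cudnn"] = torch.backends.cudnn.version()
        info["device"] = torch.cuda.get_device_name(0)
        info["capability"] = torch.cuda.get_device_capability(0)
        if hasattr(torch.cuda, "get_arch_list"):
            info["arch_list"] = torch.cuda.get_arch_list()
    return info


if __name__ == "__main__":
    for key, value in describe_cuda_setup().items():
        print(f"{key}: {value}")
```

On an RTX A5000 the capability should read (8, 6). If `cuda_build` is "9.0", or `arch_list` is available and lacks "sm_86", the installed wheel has no native kernels for the GPU.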
DCN can be installed with CUDA 11.4. Try PyTorch 1.11.0 together with the dcn_v2 build for PyTorch 1.11.0 (pytorch_1.11.0).
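After switching to PyTorch 1.11.0 on CUDA 11.x, you can confirm that the original failure is gone by running the operation that crashed, a cuDNN BatchNorm forward pass, on its own. This is a minimal sketch that assumes a working CUDA device. `batchnorm_smoke_test` is a hypothetical name, and the test does not exercise dcn_v2 itself.

```python
def batchnorm_smoke_test():
    """Run one BatchNorm2d forward pass on the GPU with cuDNN enabled.

    Returns "skipped" without PyTorch or a CUDA device, "ok" on success,
    or "failed: <message>" if the CUDA/cuDNN call raises.
    """
    try:
        import torch
    except ImportError:
        return "skipped"
    if not torch.cuda.is_available():
        return "skipped"
    torch.backends.cudnn.enabled = True  # same code path as the failing batch_norm call
    try:
        bn = torch.nn.BatchNorm2d(8).cuda()
        x = torch.randn(2, 8, 16, 16, device="cuda")
        y = bn(x)
        torch.cuda.synchronize()  # surface asynchronous kernel errors here
    except RuntimeError as err:
        return "failed: " + str(err)
    return "ok" if y.shape == x.shape else "failed: unexpected output shape"


if __name__ == "__main__":
    print(batchnorm_smoke_test())
```

With a CUDA 9.0 build on an sm_86 GPU, this would be expected to return a "failed: ..." string similar to the cuDNN error in the log. With a wheel that includes sm_86 kernels, it should print "ok".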