sacmehta / ESPNet

ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation
https://sacmehta.github.io/ESPNet/
MIT License

cuda runtime error(59) #71

Closed sonic311 closed 4 years ago

sonic311 commented 4 years ago

C:\Users\SH\AppData\Local\Continuum\anaconda3\envs\pt\python.exe C:/Users/SH/Desktop/ESPNet-master/ESPNet-master/train/main.py
Total network parameters: 349449
Data statistics
C:\Users\SH\AppData\Local\Continuum\anaconda3\envs\pt\lib\site-packages\torch\nn\modules\loss.py:210: UserWarning: NLLLoss2d has been deprecated. Please use NLLLoss instead as a drop-in replacement and see https://pytorch.org/docs/master/nn.html#torch.nn.NLLLoss for more details.
  warnings.warn("NLLLoss2d has been deprecated. "
[72.39231 82.908936 73.1584 ] [45.31922 46.152893 44.914833]
[ 2.7983413 6.929455 3.8406818 9.943495 9.770988 9.51484 10.309816 9.943075 4.649341 9.557599 7.866922 9.531267 10.349637 6.6723423 10.260542 10.287853 10.289883 10.40546 10.138483 5.1316648]
Learning rate: 0.0005
C:\Users\SH\AppData\Local\Continuum\anaconda3\envs\pt\lib\site-packages\torch\optim\lr_scheduler.py:82: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule.See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
THCudaCheck FAIL file=C:/w/1/s/tmp_conda_3.6_045031/conda/conda-bld/pytorch_1565412750030/work/aten/src\THC/generic/THCTensorMath.cu line=26 error=59 : device-side assert triggered
C:/w/1/s/tmp_conda_3.6_045031/conda/conda-bld/pytorch_1565412750030/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: block: [1,0,0], thread: [576,0,0] Assertion t >= 0 && t < n_classes failed.
C:/w/1/s/tmp_conda_3.6_045031/conda/conda-bld/pytorch_1565412750030/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: block: [1,0,0], thread: [704,0,0] Assertion t >= 0 && t < n_classes failed.
C:/w/1/s/tmp_conda_3.6_045031/conda/conda-bld/pytorch_1565412750030/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: block: [1,0,0], thread: [192,0,0] Assertion t >= 0 && t < n_classes failed.
C:/w/1/s/tmp_conda_3.6_045031/conda/conda-bld/pytorch_1565412750030/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: block: [1,0,0], thread: [448,0,0] Assertion t >= 0 && t < n_classes failed.
C:/w/1/s/tmp_conda_3.6_045031/conda/conda-bld/pytorch_1565412750030/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: block: [1,0,0], thread: [320,0,0] Assertion t >= 0 && t < n_classes failed.
C:/w/1/s/tmp_conda_3.6_045031/conda/conda-bld/pytorch_1565412750030/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: block: [1,0,0], thread: [960,0,0] Assertion t >= 0 && t < n_classes failed.
C:/w/1/s/tmp_conda_3.6_045031/conda/conda-bld/pytorch_1565412750030/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: block: [1,0,0], thread: [832,0,0] Assertion t >= 0 && t < n_classes failed.
C:/w/1/s/tmp_conda_3.6_045031/conda/conda-bld/pytorch_1565412750030/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: block: [1,0,0], thread: [504,0,0] Assertion t >= 0 && t < n_classes failed.
C:/w/1/s/tmp_conda_3.6_045031/conda/conda-bld/pytorch_1565412750030/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: block: [1,0,0], thread: [632,0,0] Assertion t >= 0 && t < n_classes failed.
C:/w/1/s/tmp_conda_3.6_045031/conda/conda-bld/pytorch_1565412750030/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: block: [1,0,0], thread: [888,0,0] Assertion t >= 0 && t < n_classes failed.
C:/w/1/s/tmp_conda_3.6_045031/conda/conda-bld/pytorch_1565412750030/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: block: [1,0,0], thread: [248,0,0] Assertion t >= 0 && t < n_classes failed.
C:/w/1/s/tmp_conda_3.6_045031/conda/conda-bld/pytorch_1565412750030/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: block: [1,0,0], thread: [376,0,0] Assertion t >= 0 && t < n_classes failed.
C:/w/1/s/tmp_conda_3.6_045031/conda/conda-bld/pytorch_1565412750030/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: block: [1,0,0], thread: [1016,0,0] Assertion t >= 0 && t < n_classes failed.
C:/w/1/s/tmp_conda_3.6_045031/conda/conda-bld/pytorch_1565412750030/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: block: [1,0,0], thread: [760,0,0] Assertion t >= 0 && t < n_classes failed.
Traceback (most recent call last):
  File "C:/Users/SH/Desktop/ESPNet-master/ESPNet-master/train/main.py", line 411, in <module>
    trainValidateSegmentation(parser.parse_args())
  File "C:/Users/SH/Desktop/ESPNet-master/ESPNet-master/train/main.py", line 336, in trainValidateSegmentation
    train(args, trainLoader_scale1, model, criteria, optimizer, epoch)
  File "C:/Users/SH/Desktop/ESPNet-master/ESPNet-master/train/main.py", line 104, in train
    loss.backward()
  File "C:\Users\SH\AppData\Local\Continuum\anaconda3\envs\pt\lib\site-packages\torch\tensor.py", line 118, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "C:\Users\SH\AppData\Local\Continuum\anaconda3\envs\pt\lib\site-packages\torch\autograd\__init__.py", line 93, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: cuda runtime error (59) : device-side assert triggered at C:/w/1/s/tmp_conda_3.6_045031/conda/conda-bld/pytorch_1565412750030/work/aten/src\THC/generic/THCTensorMath.cu:26

Process finished with exit code 1

How can I solve this? I already added the label pre-processing in loadData.py:

if 255 in unique_values:
    label_img[label_img == 255] = 19
    unique_values = np.unique(label_img)

and in main.py I changed loss.data[0] to loss.data.
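
For context, the assertion `t >= 0 && t < n_classes` in the log fires when a target label value lies outside the range the loss accepts. Below is a minimal sketch of a check that could be run on a label array before it reaches the loss. It assumes 20 classes (the length of the class-weight vector printed in the log), and `find_invalid_labels` is a hypothetical helper name, not something from the ESPNet code.

```python
import numpy as np

N_CLASSES = 20  # assumption: matches the 20 class weights printed in the log


def find_invalid_labels(label_img, n_classes=N_CLASSES):
    """Return the sorted unique label values outside [0, n_classes)."""
    values = np.unique(label_img)
    return values[(values < 0) | (values >= n_classes)]


# Toy label map: 255 stands for a value that was never remapped.
label_img = np.array([[0, 1, 255], [19, 3, 255]], dtype=np.uint8)
invalid = find_invalid_labels(label_img)
print(invalid.tolist())
```

Any value this reports for a label that is about to be passed to the loss would trigger the same device-side assert.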

sacmehta commented 4 years ago

Add a similar line in the Transforms file too.

https://github.com/sacmehta/ESPNet/blob/afe71c38edaee3514ca44e0adcafdf36109bf437/train/Transforms.py#L134

Also, I recommend using EdgeNets, which is fully compatible with PyTorch 1.0+ and provides setup and evaluation scripts: https://github.com/sacmehta/EdgeNets

sonic311 commented 4 years ago

Did you mean that I should do something like this?

def __call__(self, image, label):
    if 255 in unique_values:
        label_img[label_img == 255] = 19
        unique_values = np.unique(label_img)

It didn't work... T.T I don't understand what you mean by a similar line in the Transforms file.
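
One reading of "add a similar line in the Transforms file" is that the 255-to-19 remapping should also be applied to the label array inside a transform's `__call__`, presumably because the labels that reach the loss pass through the transforms. Below is a minimal sketch under that assumption. The class name `RemapIgnoreLabel` is hypothetical and not part of ESPNet, and where exactly such a line belongs in Transforms.py is not confirmed here.

```python
import numpy as np


class RemapIgnoreLabel(object):
    """Hypothetical transform: map the ignore value (255) to class index 19."""

    def __init__(self, ignore_value=255, target_class=19):
        self.ignore_value = ignore_value
        self.target_class = target_class

    def __call__(self, image, label):
        # Operate on the `label` argument itself; names such as `label_img`
        # or `unique_values` from loadData.py do not exist in this scope.
        label = label.copy()
        label[label == self.ignore_value] = self.target_class
        return image, label


# Toy inputs: a dummy image and a label map that still contains 255.
image = np.zeros((2, 3, 3), dtype=np.float32)
label = np.array([[0, 255, 7], [255, 19, 1]], dtype=np.uint8)
_, remapped = RemapIgnoreLabel()(image, label)
print(np.unique(remapped).tolist())
```

The point of the sketch is that the remapping has to act on the `label` variable that the transform actually receives, rather than on variables copied over from loadData.py.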