lizhengwei1992 / Semantic_Human_Matting


train problem #16

Open cooodeKnight opened 5 years ago

cooodeKnight commented 5 years ago

There is a problem when I'm trying to train. Here is the log:

=============> Loading args
Namespace(dataDir='./data/dataset', finetuning=False, load='human_matting', lr=0.001, lrDecay=100, lrdecayType='keep', nEpochs=100, nThreads=4, patch_size=320, saveDir='./ckpt', save_epoch=10, trainData='human_matting_data', trainList='./data/train_list.txt', train_batch=8, train_phase='pre_train_t_net', without_gpu=False)
============> Environment init
============> Building model ...
============> Loading datasets ...
Dataset : file number 1700
============> Set optimizer ...
============> Start Train ! ...
/usr/local/lib/python3.5/dist-packages/torch/nn/functional.py:2539: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
  "See the documentation of nn.Upsample for details.".format(mode))
THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCTensorMath.cu line=26 error=59 : device-side assert triggered
Traceback (most recent call last):
  File "train.py", line 261, in <module>
    main()
  File "train.py", line 230, in main
    loss.backward()
  File "/usr/local/lib/python3.5/dist-packages/torch/tensor.py", line 107, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/usr/local/lib/python3.5/dist-packages/torch/autograd/__init__.py", line 93, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/aten/src/THC/generic/THCTensorMath.cu:26

What's wrong with it?
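(For reference: CUDA error 59, a device-side assert, raised while a classification loss is being computed or backpropagated very often means a target label falls outside the range [0, num_classes - 1]. Below is a minimal, hypothetical sanity check of the trimap targets before they reach the T-net's cross-entropy loss; the name trimap_gt and the 3-class assumption are illustrative, not taken from this repo's code.)

```python
import torch

def check_trimap_labels(trimap_gt, num_classes=3):
    """Raise a readable error if any trimap label would trip the
    device-side assert inside CrossEntropyLoss."""
    labels = trimap_gt.long()
    lo, hi = labels.min().item(), labels.max().item()
    if lo < 0 or hi >= num_classes:
        raise ValueError(
            "trimap labels out of range: min=%d, max=%d, expected [0, %d]"
            % (lo, hi, num_classes - 1)
        )
    return labels
```

Running the training script with the environment variable CUDA_LAUNCH_BLOCKING=1 also makes the stack trace point at the kernel that actually failed, instead of a later call such as loss.backward().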

cooodeKnight commented 5 years ago

And in loss_function, I tried to print the loss L_t with print(L_t.data), and the log is:

THCudaCheck FAIL file=/pytorch/aten/src/THC/THCReduceAll.cuh line=327 error=59 : device-side assert triggered
Traceback (most recent call last):
  File "train.py", line 262, in <module>
    main()
  File "train.py", line 229, in main
    alpha_gt)
  File "train.py", line 140, in loss_function
    print(L_t.data)
  File "/usr/local/lib/python3.5/dist-packages/torch/tensor.py", line 71, in __repr__
    return torch._tensor_str._str(self)
  File "/usr/local/lib/python3.5/dist-packages/torch/_tensor_str.py", line 286, in _str
    tensor_str = _tensor_str(self, indent)
  File "/usr/local/lib/python3.5/dist-packages/torch/_tensor_str.py", line 201, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "/usr/local/lib/python3.5/dist-packages/torch/_tensor_str.py", line 87, in __init__
    nonzero_finite_vals = torch.masked_select(tensor_view, torch.isfinite(tensor_view) & tensor_view.ne(0))
RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/aten/src/THC/THCReduceAll.cuh:327
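(For reference: once a device-side assert has fired, the CUDA context is corrupted, so any later GPU call, including this print of L_t.data, re-raises the same error; the print is a symptom rather than the cause. The self-contained sketch below, which is purely illustrative and not code from this repo, shows how a single out-of-range class index reproduces the assert with torch.nn.functional.cross_entropy on the GPU, and why running the same batch on the CPU gives a readable message instead.)

```python
import torch
import torch.nn.functional as F

# 3-class logits for a 2x2 "image" and a target that contains the invalid
# class index 3. On a GPU this triggers the device-side assert shown above;
# on the CPU it raises an exception with a human-readable message, which is
# why reproducing one batch on the CPU (or setting CUDA_LAUNCH_BLOCKING=1)
# is the quickest way to find the offending sample.
logits = torch.randn(1, 3, 2, 2)
target = torch.tensor([[[0, 1], [2, 3]]])  # 3 is out of range for 3 classes

try:
    F.cross_entropy(logits, target)
except Exception as err:
    print(type(err).__name__, ":", err)
```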