When I run the GLAS dataset, an error occurs. How can I solve it?

wcoool commented 4 years ago

python3 train.py --train_dataset ./data/train --val_dataset ./data/val --direc ./output --batch_size 1 --epoch 400 --save_freq 10 --modelname "kiunet" --learning_rate 0.0001

Let's use 2 GPUs!
Total_params: 291243
/home/sc1/miniconda3/lib/python3.7/site-packages/torch/nn/functional.py:2479: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
  "See the documentation of nn.Upsample for details.".format(mode))
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [927,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [415,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [799,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [287,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [671,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [159,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [543,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:103: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [31,0,0] Assertion `t >= 0 && t < n_classes` failed.
Traceback (most recent call last):
  File "train.py", line 230, in <module>
    loss.backward()
  File "/home/sc1/miniconda3/lib/python3.7/site-packages/torch/tensor.py", line 118, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/sc1/miniconda3/lib/python3.7/site-packages/torch/autograd/__init__.py", line 93, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

jeya-maria-jose commented 4 years ago

Hi @wcoool, "Assertion t >= 0 && t < n_classes failed" happens because the ground truth segmentation maps contain pixel values other than 0 or 1 (in the case of binary segmentation). So, when you use the GLAS dataset, before calculating the loss, you can add a couple of lines of code that set every ground truth pixel at or above 127 to 1 and the rest to 0. This should solve the error.

# Zero the background first; running the >= 127 step first would turn the new 1s back into 0.
y_batch[y_batch < 127] = 0
y_batch[y_batch >= 127] = 1
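As a rough illustration of the thresholding step, here is a minimal NumPy sketch. The function name `binarize_mask` and the sample array are made up for this example, and it assumes the masks are 8-bit grayscale with foreground values at or above 127. It uses a single comparison, so the result does not depend on the order of two in-place assignments.

```python
import numpy as np


def binarize_mask(mask, threshold=127):
    """Return a new array: 1 where mask >= threshold, 0 elsewhere.

    `binarize_mask` is a hypothetical helper, not part of the repository.
    """
    mask = np.asarray(mask)
    # One boolean comparison, cast to integer class indices (0 or 1).
    return (mask >= threshold).astype(np.int64)


# Hypothetical 2x2 ground-truth patch with raw 8-bit values.
raw = np.array([[0, 255], [126, 127]], dtype=np.uint8)
fixed = binarize_mask(raw)
print(fixed.tolist())
```

The same `mask >= threshold` comparison can also be written on a PyTorch tensor. Note that with two in-place assignments, applying the `>= 127` step before the `< 127` step resets the freshly written 1s to 0, which leaves an all-zero mask.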

Alternatively, you can run a separate script once up front to convert all the ground truth segmentation labels to 0 or 1, and then use the new folder of labels for the train and test paths.
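A one-off conversion script along those lines might look like the sketch below. It is only an assumption about how such a script could be written: `binarize_label_dir` and the directory arguments are hypothetical names, it relies on Pillow and NumPy, and it assumes the label folder contains only image files whose foreground is stored at or above 127.

```python
from pathlib import Path

import numpy as np
from PIL import Image


def binarize_label_dir(src_dir, dst_dir, threshold=127):
    """Write a 0/1 PNG copy of every label image in src_dir into dst_dir.

    Hypothetical helper; returns the number of files written.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in sorted(src.iterdir()):
        if not path.is_file():
            continue
        # Load as 8-bit grayscale so every pixel is a single value in 0..255.
        mask = np.array(Image.open(path).convert("L"))
        binary = (mask >= threshold).astype(np.uint8)
        # PNG is lossless, so the 0/1 values survive the round trip.
        Image.fromarray(binary).save(dst / (path.stem + ".png"))
        count += 1
    return count
```

The converted labels look almost black in an image viewer, since the only values are 0 and 1. Writing to a new folder rather than overwriting keeps the original labels available.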

wcoool commented 4 years ago

I am very glad that you could help me solve the problem. In the paper, you said that you used 85 images for training and 80 images for testing on the GLAS dataset. However, this code also expects validation data. Could you tell me how many training, validation, and test images you used for the GLAS dataset?

jeya-maria-jose commented 4 years ago

As the number of images is very small, only train and test sets were used for the experiments. You can just pass the test dataset directory as the --val_dataset argument.

wcoool commented 4 years ago

When I use the GLAS dataset, the predicted output image is an array of all 0s. How can I solve it?

jeya-maria-jose commented 4 years ago

Can you please check that you have processed the data according to the Notes (meaning background is 0 and gland segmentation is 1)? I just retrained it and it seems to work for me.
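One way to run that check is to look at which values a label actually contains. The sketch below is an assumption about how to do it, not code from the repository: `summarize_mask` is a hypothetical helper that reports the distinct label values, whether they all satisfy 0 <= t < n_classes (the condition the loss asserts), and what fraction of pixels is foreground.

```python
import numpy as np


def summarize_mask(mask, n_classes=2):
    """Report the distinct values in a label mask and whether they are valid.

    Hypothetical diagnostic helper for binary segmentation labels.
    """
    mask = np.asarray(mask)
    values = np.unique(mask).tolist()
    return {
        "values": values,
        # Every label t must satisfy 0 <= t < n_classes.
        "valid": all(0 <= v < n_classes for v in values),
        # Share of pixels labelled as gland (class 1).
        "foreground_fraction": float((mask == 1).mean()),
    }


good = summarize_mask(np.array([[0, 1], [1, 1]]))
bad = summarize_mask(np.array([[0, 255], [0, 255]]))
print(good)
print(bad)
```

A foreground fraction of 0.0 on every training label would suggest the masks were zeroed out during preprocessing, which could explain a model that predicts only background.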

jeya-maria-jose / KiU-Net-pytorch

When I run the GLAS dataset, an error occurs. How can I solve it? #4