Bio-shine closed 5 years ago
Could you please share your training log?
Arguments: Namespace(batch_size=4, cached_data_file='brats.p', channels=4, classes=4, data_dir='./data/original_brats18_preprocess/', data_dir_val='./data/original_brats17_preprocess/', inDepth=128, inHeight=128, inWidth=128, logFile='trainValLog.txt', lr=0.0005, max_epochs=500, model='ESPNet-3D', num_workers=1, onGPU=True, resume=False, resumeLoc='./results/checkpoint.pth.tar', savedir='./results/', scaleIn=1, step_loss=100, visualizeNet=False)
Parameters: 3626584
Epoch Loss(Tr) Loss(val) mIOU (tr) mIOU (val) lr
0 1.0938 1.0591 0.2385 0.2388 0.000500
1 0.7702 0.7594 0.2396 0.2396 0.000500
2 0.6773 0.8189 0.2393 0.2387 0.000500
3 0.4333 0.4404 0.2388 0.2396 0.000500
4 0.3834 0.4106 0.2387 0.2396 0.000500
5 0.3479 0.5181 0.2387 0.2396 0.000500
6 0.3316 0.6207 0.2388 0.2396 0.000500
7 0.3220 0.3328 0.2388 0.2396 0.000500
8 0.3181 0.3577 0.2388 0.2396 0.000500
9 0.3001 0.3220 0.2387 0.2396 0.000500
10 0.2971 0.4460 0.2388 0.2396 0.000500
11 0.2996 0.3119 0.2386 0.2396 0.000500
12 0.2902 0.3443 0.2388 0.2396 0.000500
13 0.2865 0.3079 0.2387 0.2396 0.000500
14 0.2877 0.3992 0.2387 0.2396 0.000500
15 0.2898 0.3528 0.2387 0.2396 0.000500
16 0.2792 0.3139 0.2388 0.2396 0.000500
17 0.2798 0.3032 0.2388 0.2396 0.000500
18 0.2794 0.3221 0.2388 0.2396 0.000500
19 0.2787 0.3993 0.2388 0.2396 0.000500
Parameters: 3626584
acc_19.txt
Epoch: 19
Overall Acc (Tr): 0.9551
Overall Acc (Val): 0.9586
mIOU (Tr): 0.2388
mIOU (Val): 0.2396
Per Class Training Acc: [0.98630136 0. 0. 0. ]
Per Class Validation Acc: [0.99404764 0. 0. 0. ]
Per Class Training mIOU: [0.9551215 0. 0. 0. ]
Per Class Validation mIOU: [0.958583 0. 0. 0. ]
Thank you for your help
It seems like you have only the background label. Please check your dataset and make sure the labels are correct.
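For anyone reading the log, the flat mIOU follows directly from the per-class numbers in acc_19.txt. Here is a minimal sketch of the arithmetic, assuming the reported mIOU is the unweighted mean over the 4 classes (the training script is not shown in this thread, so that is an assumption, and `mean_iou` is a hypothetical helper):

```python
# Per-class IoU values copied from the acc_19.txt summary in this thread.
train_iou = [0.9551215, 0.0, 0.0, 0.0]
val_iou = [0.958583, 0.0, 0.0, 0.0]


def mean_iou(per_class):
    """Unweighted mean over classes (assumed to match how mIOU is reported)."""
    return sum(per_class) / len(per_class)


# Only class 0 (background) contributes, so the mean is roughly a quarter
# of the background IoU.
print(f"mIOU (Tr):  {mean_iou(train_iou):.4f}")
print(f"mIOU (Val): {mean_iou(val_iou):.4f}")
```

Under that assumption the results match the 0.2388 and 0.2396 reported in the log, which is consistent with the model predicting background for every voxel, or with the targets containing only background.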
I have checked the labels and found nothing wrong. Another question: after how many epochs does training become stable? On a single K80 GPU, one epoch takes about 20 minutes.
I think we were able to see some results after a few epochs, but we never saw the behavior you are describing.
I still suspect there is something wrong with the labels. Could you please check the target tensor values inside the train function?
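One way to do that check is to count which label values actually occur in a batch. The sketch below uses a hypothetical helper, `label_report`, and a toy list in place of a real target; it is not part of the repository. The toy data includes the value 4 because, to my knowledge, the raw BraTS ground truth uses labels 0, 1, 2 and 4, which would fall outside the range 0..3 expected with `classes=4` unless it is remapped during preprocessing.

```python
from collections import Counter


def label_report(values, num_classes):
    """Count label values and flag ones outside 0..num_classes-1.

    `values` is any flat iterable of integer-like labels.
    """
    counts = Counter(int(v) for v in values)
    out_of_range = sorted(k for k in counts if not 0 <= k < num_classes)
    missing = [c for c in range(num_classes) if c not in counts]
    return counts, out_of_range, missing


# Toy stand-in for one flattened target volume (illustrative values only).
toy_target = [0, 0, 0, 1, 2, 4, 0, 0]
counts, out_of_range, missing = label_report(toy_target, num_classes=4)

print("counts:", dict(counts))
print("out of range:", out_of_range)
print("missing classes:", missing)
```

Inside the train function, assuming `target` is a PyTorch tensor, the same check could be called as `label_report(target.flatten().tolist(), num_classes=4)`. `torch.unique(target, return_counts=True)` gives the counts more cheaply for large volumes. If every batch reports only class 0, the problem is in the data pipeline rather than in the model.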
I followed the steps, and training outputs all zeros. I am trying to get a benchmark, so could you please give me some advice?