Problem when training with single GPU

SeanCho1996 commented 4 years ago

Hi there, I'm currently trying to train your network with the DeeplabV3 model on the pascal_voc dataset. I only have one GPU, and it's not that powerful, so, as mentioned in other issues, I have replaced SyncBatchNorm with normal BatchNorm. However, I now get this problem with the output size:

Traceback (most recent call last):
  File "E:\software\anaconda3\envs\tf_env\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "E:\software\anaconda3\envs\tf_env\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "E:\work\jpu\FastFCN\experiments\segmentation\train.py", line 182, in <module>
    trainer.training(epoch)
  File "E:\work\jpu\FastFCN\experiments\segmentation\train.py", line 113, in training
    loss = self.criterion(outputs, target)
  File "E:\software\anaconda3\envs\tf_env\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "E:\work\jpu\FastFCN\encoding\nn\customize.py", line 41, in forward
    pred1, pred2, target = tuple(inputs)
ValueError: not enough values to unpack (expected 3, got 2)

The script I ran is:

CUDA_VISIBLE_DEVICES=0
python -m experiments.segmentation.train ^
    --dataset pascal_voc ^
    --model deeplab ^
    --jpu --aux --aux-weight 0.4 ^
    --backbone resnet50 ^
    --batch-size 2 ^
    --checkname deeplab_res50_pascal

My PyTorch version is 1.4.0.

Do you have any idea how the problem can be solved? Thanks in advance!

meanmee commented 4 years ago

As the error says, there are not enough values to unpack, because you didn't turn the se-loss on.
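For illustration, here is a minimal sketch of how this kind of unpacking error can arise. It assumes only what the traceback shows: the criterion takes variadic inputs and unpacks three of them, while the training loop calls it with two arguments. The function toy_aux_criterion, the float "predictions", and the placeholder loss are hypothetical and are not FastFCN code. Whether unpacking the outputs is the right fix for a given setup depends on how the training script was modified.

```python
def toy_aux_criterion(*inputs):
    # Mirrors the unpacking line from the traceback; the loss itself is a placeholder.
    pred1, pred2, target = tuple(inputs)
    return abs(pred1 - target) + 0.4 * abs(pred2 - target)


# Hypothetical (main, aux) predictions, represented as plain floats.
outputs = (1.0, 2.0)
target = 1.5

# Passing the tuple as a single argument gives the criterion only 2 inputs.
try:
    toy_aux_criterion(outputs, target)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Unpacking the tuple gives the criterion the 3 inputs it expects.
loss = toy_aux_criterion(*outputs, target)

print(error_message)
print(loss)
```

If the model returns a tuple of predictions, the number of elements in that tuple plus the target has to match what the active branch of the criterion unpacks.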

wuhuikai commented 4 years ago

You can use the latest branch.

tinaZZer commented 4 years ago

Hi, I am also training with a single GPU. I replaced SyncBatchNorm with BatchNorm2d, and this error occurred: ValueError: expected 4D input (got 3D input). So I used BatchNorm1d instead, and it raised: ValueError: expected 2D or 3D input (got 4D input). So the input itself should not be the problem; this is caused by the BN layer. Do you have any suggestions?

wuhuikai commented 4 years ago

@tinaZZer A simple solution is to change norm_layer(ncodes) here to nn.BatchNorm1d(ncodes).
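The two error messages above are consistent with a network that contains normalization layers at both 4D and 3D positions, so a blanket replacement with either BatchNorm2d or BatchNorm1d fails somewhere. A minimal sketch of the shape rules in plain PyTorch follows; the tensor sizes and the value of ncodes are made up for illustration and are not taken from the repository.

```python
import torch
import torch.nn as nn

ncodes = 32  # hypothetical channel count, chosen only for this example

feat_4d = torch.randn(2, ncodes, 8, 8)  # (N, C, H, W), e.g. a conv feature map
feat_3d = torch.randn(2, ncodes, 16)    # (N, C, L), e.g. a set of encoded vectors

bn2d = nn.BatchNorm2d(ncodes)
bn1d = nn.BatchNorm1d(ncodes)

# Each layer accepts only the dimensionality it was designed for.
out_4d = bn2d(feat_4d)
out_3d = bn1d(feat_3d)


def fails(layer, x):
    """Return True if the layer rejects the input with a ValueError."""
    try:
        layer(x)
    except ValueError:
        return True
    return False


print(fails(bn2d, feat_3d))  # BatchNorm2d on a 3D input
print(fails(bn1d, feat_4d))  # BatchNorm1d on a 4D input
```

This is why changing only the one layer that receives a 3D tensor to nn.BatchNorm1d, while keeping BatchNorm2d elsewhere, resolves both errors.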

tinaZZer commented 4 years ago

@wuhuikai I solved the problem, thanks for your solution!

wuhuikai / FastFCN #63