Hello, I have been using your nice suite for a couple of days. I am running on two RTX GPUs (24GB), and there is a problem with the OHEM loss:
python3 train.py --dataset cityscapes --use_ohem --gpus 0,1 --batch_size 32 --num_worker 8
PyTorch version: 1.4 (the same error occurs with 1.1, though)
=====> input size:(512, 1024)
Namespace(batch_size=32, classes=19, cuda=True, dataset='cityscapes', gpus='0,1', input_size='512,1024', logFile='log.txt', lr=0.0005, lr_schedule='warmpoly', max_epochs=1000, model='ENet', num_cycles=1, num_workers=8, optim='adam', poly_exp=0.9, random_mirror=True, random_scale=True, resume='', savedir='./checkpoint/', train_type='trainval', use_focal=False, use_label_smoothing=False, use_lovaszsoftmax=False, use_ohem=True, warmup_factor=0.3333333333333333, warmup_iters=500)
=====> use gpu id: '0,1'
=====> set Global Seed: 1234
=====> building network
=====> computing network parameters and FLOPs
the number of parameters: 360422 ==> 0.36 M
find file: ./dataset/inform/cityscapes_inform.pkl
length of dataset: 3475
length of dataset: 500
=====> Dataset statistics
data['classWeights']: [ 1.4705521 9.505282 10.492059 10.492059 10.492059 10.492059
10.492059 10.492059 10.492059 10.492059 10.492059 10.492059
10.492059 10.492059 10.492059 10.492059 10.492059 10.492059
5.131664 ]
mean and std: [72.3924 82.90902 73.158325] [45.319206 46.15292 44.91484 ]
w/ class balance
torch.cuda.device_count()= 2
=====> beginning training
=====> the number of iterations per epoch: 108
Traceback (most recent call last):
File "train.py", line 398, in <module>
train_model(args)
File "train.py", line 215, in train_model
lossTr, lr = train(args, trainLoader, model, criteria, optimizer, epoch)
File "train.py", line 327, in train
loss = criterion(output, labels)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/mnt/git/Dummy-Efficient-Segmentation-Networks/utils/losses/loss.py", line 192, in forward
prob = prob.masked_fill_(1 - valid_mask, 1) #
File "/usr/local/lib/python3.6/dist-packages/torch/tensor.py", line 394, in __rsub__
return _C._VariableFunctions.rsub(self, other)
RuntimeError: Subtraction, the `-` operator, with
No error occurs if I use the focal loss (FocalLoss2d) instead.
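For reference, the failing line passes `1 - valid_mask` to `masked_fill_`. Since PyTorch 1.2, comparison ops such as `ne` return `torch.bool` tensors, and subtracting from a bool tensor raises this `RuntimeError`; inverting the mask with `~valid_mask` (or `valid_mask.logical_not()`) avoids it. Below is a minimal sketch of that masking step, assuming an OHEM loss that scores each pixel by the softmax probability of its target class and uses `ignore_index=255`. `masked_probs` is a hypothetical helper for illustration, not the code in `utils/losses/loss.py`.

```python
import torch
import torch.nn.functional as F


def masked_probs(pred, target, ignore_index=255):
    """Hypothetical helper: softmax probability of each pixel's target class.

    pred:   (N, C, H, W) logits
    target: (N, H, W) integer labels; `ignore_index` marks pixels to skip
    Ignored pixels get probability 1, so an OHEM threshold on low
    probabilities never selects them as hard examples.
    """
    c = pred.size(1)
    target = target.reshape(-1)
    valid_mask = target.ne(ignore_index)        # dtype torch.bool in PyTorch >= 1.2
    target = target * valid_mask.long()         # map ignored labels to class 0
    prob = F.softmax(pred, dim=1)
    prob = prob.transpose(0, 1).reshape(c, -1)  # (C, N*H*W), same pixel order as target
    # `1 - valid_mask` raises RuntimeError on bool tensors; invert with `~` instead.
    prob = prob.masked_fill(~valid_mask, 1)     # (N*H*W,) mask broadcasts over C
    mask_prob = prob[target, torch.arange(target.numel())]
    return mask_prob, valid_mask
```

The sketch uses the out-of-place `masked_fill` rather than `masked_fill_`, so the softmax output (which autograd needs for the backward pass) is never modified in place.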