lucasjinreal opened this issue 6 years ago
Hi, @jinfagang. You can set the multi_scale_cross_entropy loss function in the config file:
loss:
    name: 'multi_scale_cross_entropy'
And change the 'exponent' tensor type to float and set the corresponding device (in ptsemseg/loss/loss.py#L36):
scale_weight = torch.pow(scale * torch.ones(n_inp), torch.arange(n_inp).float()).to('cuda' if target.is_cuda else 'cpu')
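For context, here is a minimal sketch of how such a multi-scale loss can combine the per-scale outputs using scale_weight. This is not the repo's verbatim implementation; the helper structure is an assumption, and it presumes each output has already been resized to the label resolution.

import torch
import torch.nn.functional as F

def multi_scale_cross_entropy2d(input, target, weight=None, size_average=True, scale=0.4):
    reduction = "mean" if size_average else "sum"

    # Single tensor output (e.g. at validation time): plain 2D cross entropy.
    if not isinstance(input, tuple):
        return F.cross_entropy(input, target, weight=weight, reduction=reduction)

    # Tuple of multi-scale outputs: down-weight the auxiliary scales by scale**i.
    n_inp = len(input)
    scale_weight = torch.pow(scale * torch.ones(n_inp),
                             torch.arange(n_inp).float()).to('cuda' if target.is_cuda else 'cpu')

    loss = 0.0
    for i, inp in enumerate(input):
        # Assumes every inp has already been upsampled to the target's spatial size.
        loss = loss + scale_weight[i] * F.cross_entropy(inp, target, weight=weight, reduction=reduction)
    return loss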
@adam9500370 Hi, I am finally able to train ICNet. However, after 10k more iterations, the mean IoU does not seem right at all:
27it [00:02, 16.36it/s]WARN: resizing labels yielded fewer classes
500it [00:26, 29.90it/s]
Overall Acc: 0.4196301378853902
Mean Acc : 0.15644619030428067
FreqW Acc : 0.31476421378091346
Mean IoU : 0.09229576247351066
Iter [194050/300000] Loss: 816839.8750 Time/Image: 0.1126
Iter [194100/300000] Loss: 548733.3750 Time/Image: 0.1123
Iter [194150/300000] Loss: 898010.5625 Time/Image: 0.1130
Iter [194200/300000] Loss: 646011.3125 Time/Image: 0.1125
Iter [194250/300000] Loss: 968136.6250 Time/Image: 0.1122
Iter [194300/300000] Loss: 655537.1875 Time/Image: 0.1125
Iter [194350/300000] Loss: 673936.6250 Time/Image: 0.1127
Iter [194400/300000] Loss: 556652.3750 Time/Image: 0.1128
Iter [194450/300000] Loss: 751962.5000 Time/Image: 0.1116
Iter [194500/300000] Loss: 685939.0625 Time/Image: 0.1128
Iter [194550/300000] Loss: 653181.4375 Time/Image: 0.1128
Iter [194600/300000] Loss: 596467.0625 Time/Image: 0.1117
Iter [194650/300000] Loss: 947831.4375 Time/Image: 0.1131
Iter [194700/300000] Loss: 603308.4375 Time/Image: 0.1123
Iter [194750/300000] Loss: 470650.3438 Time/Image: 0.1125
Iter [194800/300000] Loss: 461287.7500 Time/Image: 0.1140
Iter [194850/300000] Loss: 803597.2500 Time/Image: 0.1140
Iter [194900/300000] Loss: 580953.6875 Time/Image: 0.1157
Iter [194950/300000] Loss: 472815.9375 Time/Image: 0.1151
Iter [195000/300000] Loss: 620432.0625 Time/Image: 0.1165
26it [00:02, 16.84it/s]WARN: resizing labels yielded fewer classes
500it [00:26, 18.86it/s]
Overall Acc: 0.43595608380925455
Mean Acc : 0.14414920306903656
FreqW Acc : 0.30780209512001516
Mean IoU : 0.09285922375025128
Iter [195050/300000] Loss: 584194.6875 Time/Image: 0.1131
Iter [195100/300000] Loss: 579036.9375 Time/Image: 0.1129
Iter [195150/300000] Loss: 761244.0000 Time/Image: 0.1124
Iter [195200/300000] Loss: 789020.6875 Time/Image: 0.1127
Iter [195250/300000] Loss: 497891.0312 Time/Image: 0.1132
Iter [195300/300000] Loss: 814943.5625 Time/Image: 0.1123
Iter [195350/300000] Loss: 719462.1250 Time/Image: 0.1126
Iter [195400/300000] Loss: 583933.4375 Time/Image: 0.1119
Iter [195450/300000] Loss: 510635.5000 Time/Image: 0.1145
Iter [195500/300000] Loss: 540089.3125 Time/Image: 0.1137
Iter [195550/300000] Loss: 678339.6875 Time/Image: 0.1141
Iter [195600/300000] Loss: 1116914.5000 Time/Image: 0.1133
Iter [195650/300000] Loss: 574083.0625 Time/Image: 0.1158
The loss is too large, and the mean IoU is totally wrong. Any ideas about this?
Could you share your training settings (e.g., # of classes (dataset), optimizer, learning rate, image size, ...)?
@adam9500370 Of course.
model:
    arch: icnet
data:
    dataset: cityscapes
    train_split: train
    val_split: val
    # icnet should be 32*n+1
    img_rows: 513
    img_cols: 1025
    path: /media/jintain/sg/permanent/datasets/Cityscapes
training:
    train_iters: 300000
    batch_size: 1
    val_interval: 1000
    n_workers: 16
    print_interval: 50
    optimizer:
        name: 'sgd'
        lr: 1.0e-10
        weight_decay: 0.0005
        momentum: 0.99
    loss:
        name: 'multi_scale_cross_entropy'
        size_average: False
    lr_schedule:
    # resume: fcn8s_pascal_best_model.pkl
    resume: runs/icnet_cityscapes_best_model.pkl
Nothing else changed. I'm training on Cityscapes and using the default Cityscapes dataloader.
With size_average: False for the loss calculation, you get a very large loss value (the summation of the cross-entropy loss over all pixels of all images in each batch).
I think you need to set size_average: True to compute the mean loss value instead.
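To see why the numbers get so large, here is a small PyTorch sketch (separate from the repo's code) comparing the two reductions on a Cityscapes-sized input:

import torch
import torch.nn.functional as F

logits = torch.randn(1, 19, 513, 1025)           # (batch, n_classes, H, W)
target = torch.randint(0, 19, (1, 513, 1025))    # per-pixel class ids

loss_sum = F.cross_entropy(logits, target, reduction="sum")    # roughly size_average: False
loss_mean = F.cross_entropy(logits, target, reduction="mean")  # roughly size_average: True

# With 513 * 1025 ~ 526k pixels, loss_sum is about 526k times larger than loss_mean,
# which matches the losses in the hundreds of thousands logged above.
print(loss_sum.item(), loss_mean.item())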
In addition, if you train the model from scratch, you may need to try the following:
- Use arch: icnetBN to include BatchNorm (is_batchnorm: True).
- Download the converted Caffe pretrained Cityscapes models here, and set the img_norm=False and version="pascal" arguments in the data_loader (due to the data preprocessing of the original Caffe implementation); a hedged usage sketch follows below.
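Here is how those loader arguments could be passed; the get_loader entry point, keyword names, and path are assumptions from this thread rather than a verified snippet of the repo:

from ptsemseg.loader import get_loader

data_loader = get_loader("cityscapes")
t_loader = data_loader(
    "/path/to/Cityscapes",   # dataset root
    split="train",
    is_transform=True,
    img_size=(513, 1025),    # 32*n + 1, as required by icnet
    img_norm=False,          # keep unnormalized inputs for the Caffe-converted weights
    version="pascal",        # Caffe-style mean subtraction
)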
@adam9500370 Hi, I took your advice and retrained from scratch, but the mean IoU is still not normal. Here is the log:
Iter [2800/300000] Loss: 1.7671 Time/Image: 0.1351
Iter [2850/300000] Loss: 1.8565 Time/Image: 0.1378
Iter [2900/300000] Loss: 1.8952 Time/Image: 0.1374
Iter [2950/300000] Loss: 1.7559 Time/Image: 0.1380
Iter [3000/300000] Loss: 1.7315 Time/Image: 0.1363
0it [00:00, ?it/s]WARN: resizing labels yielded fewer classes
63it [00:55, 3.46it/s]
Overall Acc: 0.7806173583871298
Mean Acc : 0.26045823400686646
FreqW Acc : 0.64662924844955
Mean IoU : 0.20318397657362453
Iter [3050/300000] Loss: 1.6093 Time/Image: 0.1298
Iter [3100/300000] Loss: 1.7549 Time/Image: 0.1368
Iter [3150/300000] Loss: 1.6235 Time/Image: 0.1380
Iter [3200/300000] Loss: 1.3351 Time/Image: 0.1375
Iter [3250/300000] Loss: 1.4034 Time/Image: 0.1393
Iter [3300/300000] Loss: 1.7972 Time/Image: 0.1369
WARN: resizing labels yielded fewer classes
Iter [3350/300000] Loss: 1.6406 Time/Image: 0.1366
Iter [3400/300000] Loss: 1.7513 Time/Image: 0.1395
WARN: resizing labels yielded fewer classes
Iter [3450/300000] Loss: 1.6573 Time/Image: 0.1381
Iter [3500/300000] Loss: 2.1634 Time/Image: 0.1379
Iter [3550/300000] Loss: 1.4725 Time/Image: 0.1357
Iter [3600/300000] Loss: 1.5244 Time/Image: 0.1386
Iter [3650/300000] Loss: 1.4610 Time/Image: 0.1374
Iter [3700/300000] Loss: 1.6305 Time/Image: 0.1372
Iter [3750/300000] Loss: 1.5950 Time/Image: 0.1387
Iter [3800/300000] Loss: 1.8183 Time/Image: 0.1326
Iter [3850/300000] Loss: 1.9768 Time/Image: 0.1387
Iter [3900/300000] Loss: 1.4756 Time/Image: 0.1380
WARN: resizing labels yielded fewer classes
Iter [3950/300000] Loss: 1.3690 Time/Image: 0.1374
Iter [4000/300000] Loss: 1.4399 Time/Image: 0.1379
0it [00:00, ?it/s]WARN: resizing labels yielded fewer classes
63it [00:55, 3.55it/s]
Overall Acc: 0.7558650777368152
Mean Acc : 0.2424623463158562
FreqW Acc : 0.620776533991615
Mean IoU : 0.18858147214744353
As you can see, after almost 4000 iterations the mean IoU is still 0.18. Is that normal? I don't see any continued improvement.
Due to the high proportion of pixels belonging to the road class in the Cityscapes dataset, you may need to do class balancing and set higher loss weights for the rare classes (reference: https://github.com/Eromera/erfnet_pytorch/blob/09efaac1dc7829e3719552cbe1e63183368f916d/train/main.py#L88-L131).
In addition, since the Cityscapes dataset has only ~3000 training samples, you may need to apply some data augmentation.
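For illustration, a sketch of the 1 / ln(c + p_class) weighting used in that reference, computed from label-frequency counts; the function name, constants, and ignore index here are assumptions:

import numpy as np
import torch

def erfnet_style_class_weights(label_maps, n_classes=19, c=1.02, ignore_index=250):
    # label_maps: iterable of HxW numpy arrays with per-pixel class ids.
    counts = np.zeros(n_classes, dtype=np.float64)
    total = 0
    for lbl in label_maps:
        valid = lbl[lbl != ignore_index]
        counts += np.bincount(valid, minlength=n_classes)[:n_classes]
        total += valid.size
    p_class = counts / max(total, 1)
    weights = 1.0 / np.log(c + p_class)   # frequent classes (e.g. road) get smaller weights
    return torch.tensor(weights, dtype=torch.float32)

# The resulting tensor can be passed as the weight argument of the cross-entropy loss.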
Hi, @jinfagang. You can set the multi_scale_cross_entropy loss function in the config file:
loss:
    name: 'multi_scale_cross_entropy'
And change the 'exponent' tensor type to float and set the corresponding device (in ptsemseg/loss/loss.py#L36):
scale_weight = torch.pow(scale * torch.ones(n_inp), torch.arange(n_inp).float()).to('cuda' if input.is_cuda else 'cpu')
When I run pspnet and modify the loss to scale_weight = torch.pow(scale * torch.ones(n_inp), torch.arange(n_inp).float()).to('cuda' if input.is_cuda else 'cpu'), this error occurs: AttributeError: 'tuple' object has no attribute 'is_cuda'. I don't know how to solve it.
Replace
scale_weight = torch.pow(scale * torch.ones(n_inp), torch.arange(n_inp).float()).to('cuda' if input.is_cuda else 'cpu')
with
scale_weight = torch.pow(scale * torch.ones(n_inp), torch.arange(n_inp).float()).to('cuda' if target.is_cuda else 'cpu')
to avoid handling different input types in different phases.
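A self-contained sketch of why checking target works in both phases, given that the model output is a tuple during training (as described above):

import torch

def scale_weights(n_inp, scale, target):
    # target is always a single tensor, even when the model output is a tuple,
    # so its device can be queried safely in both training and validation.
    device = 'cuda' if target.is_cuda else 'cpu'
    return torch.pow(scale * torch.ones(n_inp), torch.arange(n_inp).float()).to(device)

labels = torch.zeros(1, 64, 64, dtype=torch.long)
print(scale_weights(2, 0.4, labels))   # tensor([1.0000, 0.4000])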
Thank you very much! But my result is unusual:
Iter [450/300000] Loss: 0.5713 Time/Image: 2.4058
Iter [460/300000] Loss: 2.1904 Time/Image: 2.2330
Iter [470/300000] Loss: 3.7478 Time/Image: 2.2353
Iter [480/300000] Loss: 1.8667 Time/Image: 2.2329
Iter [490/300000] Loss: 2.2474 Time/Image: 2.2363
Iter [500/300000] Loss: 1.5397 Time/Image: 2.2435
725it [16:00, 1.31s/it]
Iter 500 Loss on Val: 1.7601
Overall Acc: 0.735417399594
Mean Acc : 0.0471207022447
FreqW Acc : 0.550698099316
Mean IoU : 0.0352395812907
I set batch_size=2, lr=0.01, size_average: True, and I use the Pascal VOC + SBD datasets.
Due to the high proportion of pixels belonging to the background class in the Pascal VOC dataset, if you train the model from scratch, the model might tend to learn only the background class.
Therefore, you may need to do class balancing to set higher loss weights for the rare classes, or set ignore_index=0 in F.cross_entropy to ignore the background class until the model has learned the other classes.
You can also download the converted Caffe pretrained weights here, and set the img_norm=False and version="pascal" arguments in the data_loader (due to the data preprocessing of the original Caffe implementation). Then use a larger batch size and a smaller learning rate to fine-tune the model on these datasets.
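For reference, a minimal PyTorch sketch of the ignore_index option (the 21-class VOC shapes are only illustrative):

import torch
import torch.nn.functional as F

logits = torch.randn(2, 21, 256, 256)          # 21 Pascal VOC classes, id 0 = background
target = torch.randint(0, 21, (2, 256, 256))

# Pixels labeled 0 (background) are skipped entirely, so the loss and gradients
# come only from the 20 foreground classes.
loss = F.cross_entropy(logits, target, ignore_index=0)
print(loss.item())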
Thank you very much!
@lfdeep Hi, I met a similar problem. I was wondering how you solved it. Thank you.
My network doesn't seem to learn even after 10000 training iterations; the mIoU is still at 0.20.
Hi, ICNet returns a tuple during training, but when calculating the loss, the code gets the size directly from the tuple and raises this error: