I ran the experiments without regularization, using a 2*8 batch size, random_rotate 0, image_size 816, and 30K iterations, and got 75.5 mIoU (ss); with L2 and L2-SP I got around 76.7 mIoU. (The performance for L2 and L2-SP differs from the results in the table; the numbers in that table may be wrong and I will correct them later, but that is not the issue here.)
I cannot answer your question with certainty. My guess: with one GPU (batch size 8?), you probably trained the model for the same number of iterations (30K?). Compared with two GPUs (batch size 8*2), and setting aside the side effects on the batch normalization layers, reducing the batch size at a fixed iteration count is roughly equivalent to early stopping. Since regularization restrains the optimization of the real loss function, the L2 or L2-SP training process may not converge very well under this "hypothetically equal" early stopping.
On the other hand (and of this I am fairly sure), regularization can help improve the state-of-the-art performance.
For the L2-SP hyper-parameters on Cityscapes, my experience is that `weight_decay_rate2` can usually be larger than or equal to `weight_decay_rate`. For example, what I set for L2-SP is `--weight_decay_rate 0.0001 --weight_decay_rate2 0.001`, or `--weight_decay_rate 0.001 --weight_decay_rate2 0.001`. Both of these settings give good results.
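Roughly, the two penalties differ like this (just a sketch for illustration, not the exact code of this repo; I am assuming here that `weight_decay_mode 1` selects L2-SP, with `weight_decay_rate` weighting the SP term on the pre-trained layers and `weight_decay_rate2` the plain L2 term on the new layers):

```python
# Sketch of L2 vs. L2-SP regularization (illustration only, not the repo's code).
# `pretrained` maps variable names to the starting-point weights w0 loaded
# from the fine-tuning checkpoint; variables absent from it are new layers.
import tensorflow as tf

def regularization_loss(variables, pretrained, wd_mode, wd_rate, wd_rate2):
    penalty = 0.0
    for v in variables:
        if wd_mode == 1 and v.op.name in pretrained:
            # L2-SP: pull the weights towards the pre-trained values w0
            # instead of towards zero.
            penalty += wd_rate * tf.nn.l2_loss(v - pretrained[v.op.name])
        elif wd_mode == 1:
            # New layers have no w0, so they get a plain L2 penalty,
            # weighted by wd_rate2 (usually >= wd_rate).
            penalty += wd_rate2 * tf.nn.l2_loss(v)
        else:
            # wd_mode == 0: ordinary L2 weight decay on every variable.
            penalty += wd_rate * tf.nn.l2_loss(v)
    return penalty
```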
Got it! Thank you very much!
I am getting `nan` loss. Can you help me? I have used ResNet-50 as in the link you shared.
@root-sudip See if this link can help you (https://github.com/holyseven/PSPNet-TF-Reproduce/issues/19#issuecomment-458462816). Just for the details, could you tell me your GPU type and the script you used to run the training?
Feel free to open a new issue if existing ones do not match your problem.
@holyseven I am using an NVIDIA Tesla P6 GPU and Python 3 to train the network.
To get the segmentation masks, I have used your code. It works fine at first, but after around 1k iterations I start getting `nan` loss.
Thanks @holyseven for the reply.
@root-sudip Which database and which hyper-parameters were you using? Could you give me something like this:
```
python ./run.py --network 'resnet_v1_50' --visible_gpus '0,1' --reader_method 'queue' --weight_decay_mode 0 --weight_decay_rate 0.0001 --weight_decay_rate2 0.0001 --database 'ADE' --subsets_for_training 'train' --batch_size 8 --train_image_size 480 --snapshot 10000 --train_max_iter 60000 --test_image_size 480 --fine_tune_filename './z_pretrained_weights/resnet_v1_50.ckpt'
```
@holyseven Sure, here it is. I used the Cityscapes dataset to train the network.

```
python3 ./run.py --network 'resnet_v1_50' --visible_gpus '0' --reader_method 'queue' --weight_decay_mode 0 --weight_decay_rate 0.0001 --weight_decay_rate2 0.0001 --database 'Cityscapes' --subsets_for_training 'train' --batch_size 8 --train_image_size 480 --snapshot 10000 --train_max_iter 60000 --test_image_size 480 --fine_tune_filename './z_pretrained_weights/resnet_v1_50.ckpt'
```
@root-sudip I repeated your training process but didn't hit the `nan` problem. Although `train_image_size` and `test_image_size` should be set larger for Cityscapes, that is not a problem for training itself. This is what I got:
```
2019-03-17 10:39:42.496338 79650] Step 1320, lr = 0.009802, wd_mode = 0, wd_rate = 0.000100, wd_rate_2 = 0.000100
loss = 0.38199526, aux_loss = 0.16923133, weight_decay = 1.6719286, Select_1 = 0.29464543,
Estimated time left: 8.62 hours. 1320/60000
```
Try these:
Ok, I am doing the training again.
About the database, let me confirm the setup with you here.
Suppose there is a gt file `aachen_000000_000019_gtFine_polygons.json`, and alongside it I have three `.png` files (for each corresponding gt `.json` file) in `database/cityscapes/gt/train/*/`, as you suggest in your code. For example:
1) `aachen_000000_000019_gtFine_color.png`
2) `aachen_000000_000019_gtFine_instanceIds.png`
3) `aachen_000000_000019_gtFine_labelIds.png`
These are the files for `aachen_000000_000019_gtFine_polygons.json`, and they are all in `database/cityscapes/gt/train/*/`.
So, to take only the color masks, I changed line 36 in your code (`database/reader.py`) to:

```python
labels_filename_proto = data_dir + '/gt/' + data_sub + '/*/*_color.png'
```

I used this line to pick up only the `*_color.png` images. Am I right @holyseven?
It seems that you didn't run `createTrainIdLabelImgs`. (This link is the correct one for semantic segmentation; the link in my previous comment is for instance segmentation.) The `*_color.png` files store RGB colors rather than class IDs, so reading them as labels gives invalid class indices, which would explain the `nan` loss. `createTrainIdLabelImgs` generates a fourth `.png` file, ending with `*_labelTrainIds.png`, and that is the one used for training.
Verify the generated files, and then change line 36 of `database/reader.py` to:

```python
labels_filename_proto = data_dir + '/gt/' + data_sub + '/*/*_labelTrainIds.png'
```
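If you want a quick sanity check before retraining, something like this works (my own snippet, not part of this repo; valid Cityscapes train IDs are 0-18, with 255 marking ignored pixels):

```python
# Sanity-check the generated *_labelTrainIds.png files: every pixel
# should be a train ID in 0..18 or the ignore value 255.
import glob
import numpy as np
from PIL import Image

valid_ids = set(range(19)) | {255}
for path in glob.glob('database/cityscapes/gt/train/*/*_labelTrainIds.png'):
    ids = set(np.unique(np.array(Image.open(path))).tolist())
    if not ids <= valid_ids:
        print('Unexpected label values in', path, ':', sorted(ids - valid_ids))
```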
Ok, can I mail you a generated `*_labelTrainIds.png` image to verify? (to your hotmail id)
OK. Feel free to open a new issue if existing ones do not match your problem.
Ok, I am sending you a file, and if I face the same problem again I will open an issue. Thanks @holyseven!
I trained ResNet-50 on Cityscapes with one GPU three times, differing only in the weight decay strategy; the other parameters are the same as in your example. With L2-SP regularization, the precision on the val set is 69.93 mIoU. With L2 regularization, it is 69.68 mIoU. With no regularization, it is 72.15 mIoU. So, my question is: does the loss regularization really work? Why does the highest accuracy occur with no regularization? Or how can I change the hyper-parameters to improve the accuracy of L2-SP regularization?