z-x-yang / CFBI

The official implementation of CFBI(+): Collaborative Video Object Segmentation by (Multi-scale) Foreground-Background Integration.
BSD 3-Clause "New" or "Revised" License

The influence of batch size and learning rate on reproducing the DAVIS2017-training-only result #12

Closed Cloveryww closed 4 years ago

Cloveryww commented 4 years ago

Hi, thanks for your great work! I have some questions about reproducing the DAVIS2017 results.

(1) I train CFBI using only DAVIS2017 and evaluate it on the DAVIS2017 validation set. Due to GPU memory limitations, I first used 2 RTX 2080 Ti GPUs with 1 sample per GPU (total batch size 2). The result is about 4 points lower than the paper, which still seems reasonable. However, when I use 4 RTX 2080 Ti GPUs (total batch size 4), the result is even lower: the loss does not decrease but rises after 10K iterations, and the performance on the validation set does not improve either. So how much is the performance of CFBI affected by the batch size? If training only on DAVIS2017, how should the learning rate and batch size be set to reproduce the results in the paper? LR = 0.06 and BS = 3 per Tesla V100, as the paper mentions? Note that the backbone parameters are frozen during training, and GN is used in place of BN in the rest of the model.

(2) How much does the length of the current sequence affect the results? For DAVIS2017-only training, should it be 3 or 4? Looking forward to your reply, thanks!
z-x-yang commented 4 years ago

(1) When training with only DAVIS2017, CFBI easily overfits the training data. In this situation, early stopping is a straightforward way to relieve the problem. According to our experiments, the best checkpoint should be near step 30000 under our default setting (LR = 0.06, total BS = 3 x 2 = 6, 50000 steps).

You can increase TRAIN_MAX_KEEP_CKPT in the config file to keep more early checkpoints and evaluate them, as sketched below.
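For example, a minimal sketch of that change, assuming the repo's usual Python config layout (the exact file, attribute prefix, and value are assumptions; only TRAIN_MAX_KEEP_CKPT is taken from the reply above):

```python
# In the experiment's config (file name and surrounding class are assumptions):
self.TRAIN_MAX_KEEP_CKPT = 20  # keep the 20 most recent checkpoints instead of only a few,
                               # so earlier checkpoints (e.g. around step 30000)
                               # remain available for evaluation
```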

(2) On DAVIS, we propose using a sequence length of 4 or 5 (5 is slightly better).

Cloveryww commented 4 years ago

Thanks for your prompt reply! Another question: based on your experiments, how much does the batch size influence the result? Thanks!

z-x-yang commented 4 years ago

No significant influence.

If you use half the BS, you should use half the learning rate and double the training steps. Then, the final results should be similar.

In general: adjust BS and LR by the same ratio, and the training steps by the inverse ratio.
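A minimal sketch of that scaling rule (the default numbers are the ones quoted above; the variable names are illustrative, not from the CFBI config):

```python
# Linear scaling rule: scale LR with the batch size, scale steps inversely.
# Default recipe from the reply above: total BS = 6, LR = 0.06, 50000 steps.
base_bs, base_lr, base_steps = 6, 0.06, 50000

my_bs = 2                           # e.g. 2 x RTX 2080 Ti with 1 sample per GPU
ratio = my_bs / base_bs             # 1/3

my_lr = base_lr * ratio             # 0.02   (same ratio as the batch size)
my_steps = int(base_steps / ratio)  # 150000 (inverse ratio)

print(my_lr, my_steps)
```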

BTW, I don't know when you got the training code, but the initial version has a bug in sequential training. The bug makes sequential training useless.

Cloveryww commented 4 years ago

Thanks again for your great reply! I got the code on 8.19.

z-x-yang commented 4 years ago

Emmm...

The bug was fixed on 8.24. You can send me an email for a revised version, or you can modify the code as below. This bug significantly drops the performance.

To fix it, add the line all_pred = prev_labels.squeeze(1) before for idx in range(cfg.DATA_CURR_SEQ_LEN): in networks/engine/train_manager.py, as sketched below. Without this line, the previous-frame mask at the first sequential step comes from the last prediction of the previous batch, instead of the correct ground-truth mask of the current batch.
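A minimal sketch of where the line goes; only all_pred, prev_labels, and cfg.DATA_CURR_SEQ_LEN are from the original code, while the surrounding batch/model interface is a hypothetical skeleton, not the actual train_manager.py:

```python
# Hypothetical skeleton of the sequential-training step in
# networks/engine/train_manager.py (model/batch interface is illustrative).

def sequential_train_step(model, batch, cfg):
    prev_labels = batch['prev_labels']  # (N, 1, H, W) ground-truth mask of the previous frame
    curr_frames = batch['curr_frames']  # DATA_CURR_SEQ_LEN current frames

    # THE FIX: re-initialize the running prediction from the CURRENT batch's
    # ground-truth previous-frame mask. Without this line, all_pred still holds
    # the last prediction of the PREVIOUS batch, so step 0 of every sequence is
    # conditioned on an unrelated mask.
    all_pred = prev_labels.squeeze(1)   # (N, H, W)

    total_loss = 0.
    for idx in range(cfg.DATA_CURR_SEQ_LEN):
        pred, loss = model(curr_frames[idx], all_pred)
        all_pred = pred.detach()        # feed this step's prediction to the next step
        total_loss = total_loss + loss
    return total_loss
```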

Cloveryww commented 4 years ago

You have solved all my problems, thanks!