seoungwugoh / STM

Video Object Segmentation using Space-Time Memory Networks

Jaccard fluctuates seriously during training #18

Open haochenheheda opened 4 years ago

haochenheheda commented 4 years ago

Hi, thanks for sharing this great work. I have been working on reproducing STM for 2 months and finally got a Jaccard of 77 on DAVIS-17 val. I found that during training (both pre-training and fine-tuning), the Jaccard on the val set jitters seriously. For example, J reaches 70 at iteration 1000, quickly drops to 60 at iteration 1100, and then rises back to 70 at iteration 1200. The batch size is set to 4 and the optimizer is Adam with an lr of 1e-5, following the settings proposed in the paper. I have tried a larger batch size and a smaller lr, but neither helped. I'd appreciate it if you could help me with this.

ryancll commented 4 years ago

@haochenheheda How did you do backpropagation during training (backprop after all the frames are processed, or backprop through time)? How many frames do you sample from each video? How did you compute the loss (on the final soft-aggregated result or on the output logits)? Thank you!

haochenheheda commented 4 years ago

@ryancll Hi.

1. After all the frames.
2. For each iteration, I randomly choose three frames from a random video (online).
3. I have tried both, and it doesn't seem to make any difference.
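The sampling strategy in point 2 can be sketched as below. Bounding the gap between consecutive frames with a `max_skip` parameter is an extra assumption (a common heuristic in STM reimplementations), not something confirmed in this thread:

```python
import random

def sample_training_frames(num_frames, n_samples=3, max_skip=5):
    """Randomly pick n_samples ordered frame indices from one video.

    max_skip bounds the gap between consecutive picks; the value is an
    assumption, not taken from the paper or this thread.
    """
    idx = [random.randrange(num_frames)]
    for _ in range(n_samples - 1):
        lo = min(idx[-1] + 1, num_frames - 1)
        hi = min(idx[-1] + max_skip, num_frames - 1)
        # near the end of the video, the last frame may repeat
        idx.append(random.randint(lo, hi))
    return idx
```

One triplet would then be drawn per iteration, e.g. `sample_training_frames(80)`.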

ryancll commented 4 years ago

@haochenheheda Thank you! I ran into the same problem you describe, especially during fine-tuning.

seoungwugoh commented 4 years ago

Hi @ryancll @haochenheheda, thanks for your interest in our work. I am sorry to hear that your reimplementation is not stable. The model usually fluctuates when there is not enough training data. We observed a similar problem when fine-tuning with DAVIS only. If you are not using YouTube-VOS for training, try using the two datasets together. If you are already using YouTube-VOS, try applying more diverse and harder data augmentation. Also try to make every training sample solvable (e.g. filter out objects that do not appear, or are too small, in the first frame). Gradually reducing the LR also helped us stabilize training. I am sorry that I am not able to release our code, but there are no special tricks for training.
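The "solvable sample" filter the authors suggest can be sketched as a simple check on the reference-frame mask. The `min_pixels` threshold is an assumed value; the authors give no number:

```python
import numpy as np

def solvable_object_ids(first_frame_mask, min_pixels=100):
    """Keep only objects that appear, and are not too small, in the
    first (reference) frame.

    first_frame_mask: (H, W) integer label map, 0 = background.
    min_pixels is an assumed threshold, not from the authors.
    """
    ids, counts = np.unique(first_frame_mask, return_counts=True)
    return [int(i) for i, c in zip(ids, counts) if i != 0 and c >= min_pixels]
```

A training sample would be skipped (or the offending objects dropped) whenever this list comes back empty.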

ryancll commented 4 years ago

Hi @haochenheheda, thank you for sharing! I have another question: did you adjust the loss function, or oversample the videos with multiple objects, to deal with the unbalanced number of objects in the dataset?

seoungwugoh commented 4 years ago

Hi, @ryancll. Our model is not very sensitive to the number of objects, since the per-object predictions are only combined at the last step. We simply iterate over the videos in the dataset regardless of the number of objects.
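That last-step combination (the soft aggregation described in the STM paper) can be sketched as follows. This is a reading of the paper, not the authors' released code:

```python
import torch

def soft_aggregate(ps, eps=1e-7):
    """Merge independent per-object probabilities into joint logits.

    ps: (num_objects, H, W) tensor of per-object foreground
    probabilities (each from its own sigmoid). The background channel
    is the product of (1 - p) over objects; every channel is then
    mapped back to a logit so that a softmax over channels gives the
    final per-pixel multi-object distribution.
    """
    bg = torch.prod(1 - ps, dim=0, keepdim=True)          # (1, H, W)
    em = torch.cat([bg, ps], dim=0).clamp(eps, 1 - eps)   # (K+1, H, W)
    return torch.log(em / (1 - em))                       # inverse sigmoid
```

Because each object is segmented independently and only merged here, adding objects does not change the per-object computation, which is consistent with the insensitivity described above.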

haochenheheda commented 4 years ago

@seoungwugoh ,thanks for your help! I'll try heavy augmentation.