seoungwugoh / STM

Video Object Segmentation using Space-Time Memory Networks
405 stars 81 forks

The details of calculating losses #23

Open LiuYuZzz opened 4 years ago

LiuYuZzz commented 4 years ago

For the multi-object case, the input frame size is [batch_size, color_channels, H, W] and the input object mask size is [batch_size, num_objects + BG, H, W]. My questions are:

  1. When the STM module receives input with a batch size greater than 1, it fails, so our work uses batch_size = 1.
  2. The network's output logits have size [batch_size, num_objects + BG, H, W]. We reshape them to [batch_size*H*W, num_objects + BG] and feed the reshaped tensor into CrossEntropyLoss. Is that correct?
  3. Similarly, I computed the loss from the multi-object logits at every frame without applying softmax (since CrossEntropyLoss applies softmax internally), then summed the losses over a sample of frames and called backward.
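A minimal sketch of the procedure described in questions 2 and 3, using toy shapes and random tensors in place of real STM outputs and ground-truth masks (batch size, frame count, and spatial size are illustrative assumptions, not values from the repo):

```python
import torch
import torch.nn as nn

# Toy shapes standing in for the question's setup: batch_size = 1,
# two objects plus background (illustrative values only).
batch_size, num_classes, H, W = 1, 3, 24, 24
criterion = nn.CrossEntropyLoss()  # applies log-softmax internally

total_loss = 0.0
for _ in range(3):  # e.g. three sampled frames per clip
    # Random stand-ins for per-frame logits and integer GT masks.
    logits = torch.randn(batch_size, num_classes, H, W)
    gt = torch.randint(0, num_classes, (batch_size, H, W))

    # The reshape from question 2: [B, C, H, W] -> [B*H*W, C].
    flat_logits = logits.permute(0, 2, 3, 1).reshape(-1, num_classes)
    flat_gt = gt.reshape(-1)

    # No explicit softmax (question 3): CrossEntropyLoss handles it.
    total_loss = total_loss + criterion(flat_logits, flat_gt)

# In real training, total_loss.backward() would follow here.
```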
seoungwugoh commented 4 years ago

Hi @LiuYuZzz

  1. Our current implementation only works with 1 sample per GPU. We used a batch size of 4 on a 4-GPU machine.
  2. You do not need to reshape them; just use torch.nn.CrossEntropyLoss (see the documentation for usage; it takes class indices for the GT).
  3. Looks like you did it right.

For training, you do not need to make any changes inside our model. All you need to do is compute the losses on the training data and backpropagate them.
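A minimal sketch of point 2, with toy shapes and random tensors as stand-ins: torch.nn.CrossEntropyLoss accepts the [batch_size, num_objects + BG, H, W] logits together with a [batch_size, H, W] tensor of GT class indices directly, so no reshaping and no explicit softmax are needed.

```python
import torch
import torch.nn as nn

# Illustrative shapes: one sample, two objects plus background.
B, C, H, W = 1, 3, 24, 24
criterion = nn.CrossEntropyLoss()

torch.manual_seed(0)
logits = torch.randn(B, C, H, W, requires_grad=True)  # raw model output, no softmax
gt = torch.randint(0, C, (B, H, W))                   # GT as class indices, not one-hot

# CrossEntropyLoss handles the spatial dimensions itself:
# input [B, C, H, W] with target [B, H, W] works directly.
loss = criterion(logits, gt)
loss.backward()
```

With the default mean reduction, this is numerically equivalent to flattening the logits to [B*H*W, C] first; the 4D call simply saves the bookkeeping.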

LiuYuZzz commented 4 years ago

Thanks for your reply, I got it.