seoungwugoh / STM

Video Object Segmentation using Space-Time Memory Networks

Question about multi object #22

Open sjb961121 opened 4 years ago

sjb961121 commented 4 years ago

Hello, when I use STM for the VOS task, I find that the object edges are good. However, several colors (i.e., different object labels) appear within a single object, as in the attached image (00001). Should I add a loss on the object index (num_objects) during training?

seoungwugoh commented 4 years ago

In our implementation, we did not run into the issue described above. What losses did you use? In our case, we used only the cross-entropy loss after soft aggregation. That is, when there are 3 objects, the result after aggregation is a 4-channel probability map (including the background), and we compute the CE loss on that probability map for each frame.
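As a hedged illustration of "soft aggregation followed by CE", the sketch below assumes each object branch outputs an independent foreground probability map. The function name `soft_aggregation` and the toy shapes are hypothetical. It loosely follows the aggregation formula from the STM paper: the background probability is the product of (1 - p_i) over objects, and each channel is converted to log-odds so that a softmax across channels normalizes them into one multi-class map. It is not the repository's code.

```python
import torch
import torch.nn.functional as F

def soft_aggregation(obj_probs: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """Merge per-object foreground probabilities into multi-class logits.

    obj_probs: (B, N, H, W); channel i is P(pixel belongs to object i),
    predicted independently for each object (hypothetical layout).
    Returns (B, N+1, H, W) logits, where channel 0 is the background.
    """
    bg = torch.prod(1 - obj_probs, dim=1, keepdim=True)  # P(bg) = prod_i (1 - p_i)
    em = torch.cat([bg, obj_probs], dim=1)                # (B, N+1, H, W)
    em = torch.clamp(em, eps, 1 - eps)                    # avoid log(0) and division by 0
    return torch.log(em / (1 - em))                       # log-odds; softmax over dim=1 normalizes

# Toy example: 2 clips, 3 objects, 4x4 frames
B, N, H, W = 2, 3, 4, 4
obj_probs = torch.rand(B, N, H, W, requires_grad=True)
logits = soft_aggregation(obj_probs)                      # 4 channels including background
target = torch.randint(0, N + 1, (B, H, W))               # index map: 0 = BG, 1..3 = objects
loss = F.cross_entropy(logits, target)                    # one CE over all labels jointly
loss.backward()
```

Because every pixel is scored against all N+1 labels in a single softmax, the loss directly penalizes assigning a pixel to the wrong object, which per-object losses computed separately do not do.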

cernykisss commented 4 years ago

Yes, you are right! We used Dice loss before, and it led to this bad result. Now I'm trying to use CE as the loss. The code looks like this:

    criterion = nn.CrossEntropyLoss()
    loss += criterion(Es[:,:,t].clone(), Ms[:,:,t].float()) / train_batch_size

If we do it as above, Ms should not be one-hot encoded. But in your code you convert them to one-hot by using a function called "ALL_TO_ONEHOT" in
So how can we solve this problem? Could you offer me some suggestions? Waiting for your reply!

seoungwugoh commented 4 years ago

@cernykisss In our implementation, nn.CrossEntropyLoss() takes logits as the prediction and an index map as the ground truth. It also automatically averages the loss over the batch (and spatial) dimensions, so you may not need to divide the loss by the batch size. However, I am not sure how the CE loss behaves in recent torch versions.

In our code:

    logit = model(Fs[:,:,n], key, value, num_objects)
    loss_CE = F.cross_entropy(logit, torch.argmax(Ms[:,:,n], dim=1))
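A minimal sketch of the point above, with random tensors standing in for the `model(...)` output and a one-hot mask like `Ms` (shapes and variable names here are hypothetical). `torch.argmax` over the channel dimension turns the one-hot mask into the index map that `F.cross_entropy` expects, and with the default `reduction='mean'` the loss is already averaged over the batch and all pixels. It also shows that PyTorch 1.10 and later accept class probabilities (such as one-hot floats) directly as the target.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, K, H, W = 2, 4, 8, 8                       # K = 3 objects + background

logit = torch.randn(B, K, H, W)               # raw scores, no softmax applied
index_map = torch.randint(0, K, (B, H, W))    # ground-truth label per pixel
one_hot = F.one_hot(index_map, K).permute(0, 3, 1, 2).float()  # (B, K, H, W)

# Recover the index map from the one-hot mask, as in the reply above
target = torch.argmax(one_hot, dim=1)         # (B, H, W), dtype torch.long

# Default reduction='mean' averages over all B*H*W pixels
loss_index = F.cross_entropy(logit, target)

# PyTorch >= 1.10: a float target of the same shape is treated as class probabilities
loss_prob = F.cross_entropy(logit, one_hot)
```

Both forms give the same value here, so an extra division by the batch size would only rescale a loss that is already averaged.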

cernykisss commented 4 years ago

Thank you! 😙