yabufarha / ms-tcn


Excluding first frame in loss calculation for every stage? #17

Closed aayushARM closed 4 years ago

aayushARM commented 4 years ago

@yabufarha Thanks for your easy-to-understand code! I just have a question regarding this line in model.py:

```python
loss += 0.15 * torch.mean(
    torch.clamp(
        self.mse(F.log_softmax(p[:, :, 1:], dim=1),
                 F.log_softmax(p.detach()[:, :, :-1], dim=1)),
        min=0, max=16
    ) * mask[:, :, 1:]
)
```

I noticed you're using p[:, :, 1:], p[:, :, :-1] and mask[:, :, 1:] instead of the entire tensors. I am trying to train this model at the video level (not the frame level) on the EGTEA+ dataset, where I have only one [2048, 1] feature vector and one label per video. With that shape, the slicing above produces empty tensors, since 1: or :-1 removes the only available entry along the temporal dimension (see the snippet below). The loss does decrease when I use the full tensors without slicing, but I wanted to know: what is the significance of slicing this way?
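(For anyone else hitting this, a minimal sketch of the empty-tensor effect described above. The shapes here are illustrative assumptions; num_classes = 10 is an arbitrary choice, not from the dataset.)

```python
import torch

# Hypothetical shapes: one prediction per video,
# i.e. [batch, num_classes, T] with T = 1.
p = torch.randn(1, 10, 1)

print(p[:, :, 1:].shape)   # torch.Size([1, 10, 0]) -- empty along time
print(p[:, :, :-1].shape)  # torch.Size([1, 10, 0]) -- also empty
```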

yabufarha commented 4 years ago

Hi @aayushARM, happy that you found the code helpful! This is a smoothing loss: it penalizes differences between the predictions for the current frame and those for the previous one. We slice off the first frame because there is no frame before it to use as a reference. If your output has shape [2048, 1], then you cannot use this loss, and the other term of the loss function (the cross entropy loss) is enough.
I hope this helps.
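For later readers, here is a minimal sketch of the per-stage loss being discussed, assuming logits p of shape [batch, num_classes, T], integer targets of shape [batch, T], and a mask of shape [batch, num_classes, T]. The stage_loss helper and the T > 1 guard are illustrative additions, not code from the repository:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

ce = nn.CrossEntropyLoss(ignore_index=-100)
mse = nn.MSELoss(reduction='none')

def stage_loss(p, target, mask, lam=0.15, tau=16):
    # p: [batch, num_classes, T] logits from one stage
    # target: [batch, T] class indices; mask: [batch, num_classes, T]
    # Classification term: flatten to [batch*T, num_classes] for CrossEntropyLoss.
    loss = ce(p.transpose(2, 1).contiguous().view(-1, p.size(1)), target.view(-1))
    if p.size(2) > 1:  # the smoothing term needs at least two frames
        # Truncated MSE between log-probabilities of adjacent frames;
        # the previous frame is detached so gradients flow only one way,
        # and clamping at tau keeps large jumps from dominating the loss.
        smooth = torch.clamp(
            mse(F.log_softmax(p[:, :, 1:], dim=1),
                F.log_softmax(p.detach()[:, :, :-1], dim=1)),
            min=0, max=tau)
        loss = loss + lam * torch.mean(smooth * mask[:, :, 1:])
    return loss
```

With T = 1, the guard skips the smoothing term entirely, leaving only the cross entropy loss, which matches the suggestion above.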

aayushARM commented 4 years ago

Ah, I should've paid more attention to the loss function in the paper. Thanks for letting me know; it makes more sense now! I'm able to train with a steady decline in the loss value. Keep up the great work! :)