Closed: aayushARM closed this issue 4 years ago
Hi @aayushARM ,
Happy that you found the code helpful!
This loss is a smoothing loss: it penalizes differences between the predictions for the current frame and those for the previous one. We mask out the first frame because there is no frame before it to use as a reference. If your output is of shape [2048, 1] (a single prediction per video), then you cannot use this loss, and the other term of the loss function (the cross-entropy loss) is enough on its own.
I hope this helps.
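For anyone landing here with the same question, here is a minimal sketch of this kind of truncated temporal smoothing loss as a standalone function. It assumes predictions `p` of shape `[batch, num_classes, T]` and a binary `mask` of the same shape (zero on padded frames); the constants `0.15` and `16` follow the line quoted in the question below. The function name and signature are illustrative, not part of the repo.

```python
import torch
import torch.nn.functional as F

def smoothing_loss(p, mask):
    """Truncated MSE between log-probabilities of adjacent frames.

    p:    [batch, num_classes, T] logits
    mask: [batch, num_classes, T] binary mask (0 on padded frames)
    """
    mse = torch.nn.MSELoss(reduction='none')
    # Compare frame t (slice 1:) with frame t-1 (slice :-1); the
    # reference is detached so gradients only flow through the
    # current frame's prediction.
    diff = mse(F.log_softmax(p[:, :, 1:], dim=1),
               F.log_softmax(p.detach()[:, :, :-1], dim=1))
    # Clamp the per-element penalty so genuine action boundaries are
    # not punished too hard, then drop padded positions via the
    # correspondingly shifted mask.
    return 0.15 * torch.mean(torch.clamp(diff, min=0, max=16) * mask[:, :, 1:])

# With T = 1 (one prediction per video) the slices are empty, so
# there are no adjacent-frame pairs and the mean is taken over zero
# elements -- which is why this term only makes sense frame-wise.
p = torch.randn(1, 10, 1)
print(p[:, :, 1:].shape)  # torch.Size([1, 10, 0])
```

Note that both slices have length `T - 1`, so the loss is only defined when there are at least two frames per sequence.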
Ah, I should've paid more attention to the loss function in the paper. Thanks for letting me know, it makes more sense now! I'm able to train with a steady decline in the loss value. Keep up the great work! :)
@yabufarha Thanks for your easy-to-understand code! I just have a question regarding this line in model.py:
```python
loss += 0.15 * torch.mean(
    torch.clamp(
        self.mse(F.log_softmax(p[:, :, 1:], dim=1),
                 F.log_softmax(p.detach()[:, :, :-1], dim=1)),
        min=0, max=16
    ) * mask[:, :, 1:]
)
```
I noticed you're using p[:, :, 1:], p[:, :, :-1], and mask[:, :, 1:] instead of the entire tensors. I am trying to train this model at the video level (not the frame level), using the EGTEA+ dataset, where I only have one [2048, 1] feature vector and one label per video. So if I do the above slicings, I end up with empty tensors (since taking 1: or :-1 removes the only available label). Although I do see the loss decreasing when I use the full tensors without slicing, I wanted to know: what's the significance of doing it this way?