Closed — cmhungsteve closed this issue 5 years ago
The mask tensor is needed if you have videos of variable lengths in your batch. It defines the valid outputs that are relevant for computing the loss and masks out the non-relevant outputs that are generated because of padding. Nevertheless, at the default settings the batch size is one, which means the mask is not needed in this case because no padding is required. I hope this helps.
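A minimal sketch of the idea (not the repository's actual loss code; numpy stands in for the framework tensors, and the values are made up): the mask zeroes out the loss terms that come from padded frames, so only real frames contribute.

```python
import numpy as np

# Batch of 2 videos, 1-d outputs, padded to T=5; video 2 has only 3 real frames.
outputs = np.array([[1.0, 2.0, 3.0, 4.0, 5.0],
                    [1.0, 2.0, 3.0, 0.0, 0.0]])  # trailing zeros are padding
targets = np.array([[1.0, 2.0, 3.0, 4.0, 5.0],
                    [2.0, 2.0, 2.0, 0.0, 0.0]])
mask = np.array([[1.0, 1.0, 1.0, 1.0, 1.0],
                 [1.0, 1.0, 1.0, 0.0, 0.0]])     # 1 = valid frame, 0 = padding

# Mean squared error averaged over valid frames only
sq_err = (outputs - targets) ** 2
loss = (sq_err * mask).sum() / mask.sum()        # padded frames contribute nothing
```

With batch size 1 there is no padding, the mask is all ones, and this reduces to an ordinary mean over all frames — which is why it "seems not to do anything" at the defaults.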
Best, Yazan
Thank you for your explanation, but I am still not very sure what "padding" means here. It would be great if you can explain more. Thank you.
The input of the model is a tensor of size (bz, d, T), where bz is the batch size, d is the dimension of the features, and T is the length (number of frames) of the longest video in the batch. So if your batch size is greater than one, then you have to pad short videos with zeros to make sure that all videos in the batch have the same length T.
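The padding step described above can be sketched like this (a hypothetical illustration with numpy; the actual repository may batch videos differently):

```python
import numpy as np

d = 4                                                  # feature dimension
videos = [np.random.rand(d, 7), np.random.rand(d, 5)]  # two videos, lengths 7 and 5
T = max(v.shape[1] for v in videos)                    # longest video in the batch

batch = np.zeros((len(videos), d, T))                  # (bz, d, T), zero-padded
mask = np.zeros((len(videos), 1, T))
for i, v in enumerate(videos):
    t = v.shape[1]
    batch[i, :, :t] = v                                # copy the real frames
    mask[i, :, :t] = 1.0                               # mark them as valid

# batch.shape == (2, 4, 7); the last 2 frames of the second video are padding,
# and mask records exactly which frames are real.
```

The mask built here is what later selects the relevant outputs when computing the loss.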
I see. Thank you for your detailed explanation. Just want to double-check. If I set the batch size larger than 1, the padding will happen and the mask will select the relevant outputs for evaluation. Is that correct?
Yes, that's correct ;)
Thank you so much. It's pretty clear to me now.
> The input of the model is a tensor of size (bz, d, T), where bz is the batch size, d is the dimension of the features, and T is the length (number of frames) of the longest video in the batch. So if your batch size is greater than one, then you have to pad short videos with zeros to make sure that all videos in the batch have the same length T.
May I ask you to be more specific about the variable lengths? Doesn't the model rely on Conv1d layers, which don't care about the input length? Thank you in advance.
I wonder what "mask" in the following code is used for. It seems not to do anything and is always one.