stillbetter closed this issue 4 years ago
Yes, there are two reasons for that. 1) Our motion subnet takes a pair of frames separated by a fixed gap as input to compute the temporal information, so the last few frames of the video are not processed. 2) Our LSTM is designed with 16 cells, so in our project any trailing frames that do not fill a complete group of 16 are also dropped. You can modify the code to handle this, or append some blank frames to the end of your video. More commonly, we simply reuse the saliency map of the last processed frame to fill in for the missing frames.
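As a rough illustration of the two points above, here is a minimal sketch, not code from the project. The names `expected_output_frames` and `pad_with_last` are hypothetical, and so is the `gap` parameter: the actual frame gap used by the motion subnet is not stated here. The sketch assumes the frames lacking a partner are dropped first and that only complete groups of 16 are kept afterwards.

```python
def expected_output_frames(n_input, gap, cells=16):
    """Estimate how many frames get a saliency map.

    Assumption: `gap` (hypothetical) is the distance between the two frames
    fed to the motion subnet, and `cells` is the LSTM length (16 here).
    """
    usable = max(n_input - gap, 0)      # frames that still have a partner `gap` frames later
    return (usable // cells) * cells    # only complete groups of `cells` frames are kept


def pad_with_last(saliency_maps, n_input):
    """Repeat the last saliency map until there is one map per input frame."""
    if not saliency_maps:
        raise ValueError("need at least one saliency map to pad from")
    missing = max(n_input - len(saliency_maps), 0)
    return list(saliency_maps) + [saliency_maps[-1]] * missing
```

Under these assumptions, a 192-frame input with any gap from 1 to 16 yields 176 output frames, and `pad_with_last(maps, 192)` brings the list back to 192 entries by repeating the final map.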
Great! Thanks!
I found that the output saliency map sequence is 16 frames shorter than the input. If my input is 192 frames, the output is 176 frames. I looked through the paper and didn't find any helpful information. So I want to ask: is that right?