remega / OMCNN_2CLSTM

The model of "DeepVS: A Deep Learning Based Video Saliency Prediction Approach" (ECCV 2018)

The output doesn't match with the input. #14

Closed stillbetter closed 4 years ago

stillbetter commented 4 years ago

I found that the output saliency map has 16 fewer frames than the input. If my input is 192 frames, then the output is 176 frames. I looked through the paper and didn't find any helpful information, so I want to ask: is that expected?

remega commented 4 years ago

Yeah, there are two reasons for that. 1) Our motion subnet takes as input a pair of frames with a certain gap to calculate the temporal information, so the last few frames of the video are not processed. 2) Our LSTM is designed with 16 cells, so in our project the last frames (if fewer than 16) are also neglected. You can modify the code to solve this, or add some blank frames at the end of your video. More commonly, we just use the saliency map of the last frame to compensate for the missing frames.
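For reference, a minimal sketch of that last compensation step, assuming the model's predictions are stacked in a NumPy array of shape `(frames, height, width)`; the helper name `pad_saliency_maps` is hypothetical and not part of this repo:

```python
import numpy as np

def pad_saliency_maps(saliency_maps, num_input_frames):
    """Repeat the last predicted saliency map so the output
    length matches the number of input video frames.

    saliency_maps: array of shape (T_out, H, W) from the model,
    where T_out may be shorter than num_input_frames (here, by 16).
    """
    missing = num_input_frames - saliency_maps.shape[0]
    if missing <= 0:
        return saliency_maps
    # Tile the final saliency map to cover the unprocessed tail frames.
    tail = np.repeat(saliency_maps[-1:], missing, axis=0)
    return np.concatenate([saliency_maps, tail], axis=0)

# Example: 176 predicted maps padded back up to 192 frames.
maps = np.zeros((176, 112, 112))
padded = pad_saliency_maps(maps, 192)
assert padded.shape[0] == 192
```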

stillbetter commented 4 years ago

Great! Thanks!