Question about function extract_late in AT.py

hyf015 / egocentric-gaze-prediction

Code for the paper "Predicting Gaze in Egocentric Video by Learning Task-dependent Attention Transition"

Hi, thank you for creating this repo! I'm a little confused about the code of extract_late.

In equation 2 of your paper, you obtain the weights w_t-1 by cropping and averaging the spatial latent representation F_t-1 at time t-1. This spatial latent representation seems to be feature_s from line 226, and the cropping is done in lines 235-241. Lines 242-252 then implement equations 3 and 4, depending on whether frame t-1 is a fixation or not. In equation 4 you apply the new weights w_t to the spatial latent representation F_t at time t, which makes a lot of sense. In the code, however, feat = get_weighted(chn_weight, feature_s) still applies the new chn_weight (w_t) to the same feature_s from time t-1. Maybe I missed something? Thanks in advance for your help!
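
To make the question concrete, here is a minimal NumPy sketch of how I read the two variants. It is not the code from AT.py: the helper names crop_channel_weights and weight_channels, the tensor shapes, and the crop window are all my own assumptions, and weight_channels is only a guess at what get_weighted does. The update from w_t-1 to w_t (equations 3 and 4) is skipped, since only the choice of feature map matters for the question.

```python
import numpy as np

def crop_channel_weights(feature, center, half=1):
    """Average a (C, H, W) feature map over a small window around `center`.

    Hypothetical stand-in for the crop-and-average step of equation 2.
    """
    _, h, w = feature.shape
    y, x = center
    y0, y1 = max(y - half, 0), min(y + half + 1, h)
    x0, x1 = max(x - half, 0), min(x + half + 1, w)
    return feature[:, y0:y1, x0:x1].mean(axis=(1, 2))

def weight_channels(chn_weight, feature):
    """Scale each channel of a (C, H, W) map by its weight.

    Assumed behaviour of get_weighted; the real function may differ.
    """
    return feature * chn_weight[:, None, None]

rng = np.random.default_rng(0)
feat_prev = rng.random((4, 8, 8))  # stands for F_t-1 (feature_s in the code)
feat_curr = rng.random((4, 8, 8))  # stands for F_t

# Weights come from the previous frame's features around the gaze point.
chn_weight = crop_channel_weights(feat_prev, center=(3, 5))

as_in_paper = weight_channels(chn_weight, feat_curr)  # weights applied to F_t
as_in_code = weight_channels(chn_weight, feat_prev)   # weights applied to F_t-1
```

The two results have the same shape but generally different values, which is the discrepancy I am asking about.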

Question about function extract_late in AT.py #25