hyf015 / egocentric-gaze-prediction

Code for the paper "Predicting Gaze in Egocentric Video by Learning Task-dependent Attention Transition"

Question about function extract_late in AT.py #25

Open mujn1461 opened 2 years ago

mujn1461 commented 2 years ago

Hi, thank you for creating this repo! I'm a little confused about the code of extract_late.

In equation 2 of your paper, you obtain the weights w_{t-1} by cropping and averaging the spatial latent representation F_{t-1} at time t-1. This representation appears to be feature_s from line 226, and the cropping is done in lines 235-241. Lines 242-252 then implement equations 3 and 4, depending on whether frame t-1 is a fixation. In equation 4 you apply the new weights w_t to the spatial latent representation F_t at time t, which makes a lot of sense. In the code, however, feat = get_weighted(chn_weight, feature_s) uses the new chn_weight (w_t) to weigh the same feature_s from time t-1. Maybe I missed something? Thanks in advance for your help!
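To make the question concrete, here is a minimal toy sketch of the two readings. It is not the code from AT.py: plain nested lists stand in for tensors, the bodies of crop_channel_means and get_weighted are my own assumptions about what cropping, averaging, and channel weighting do, and transition is a hypothetical identity placeholder for whatever maps w_{t-1} to w_t.

```python
# Toy illustration only: nested lists stand in for the tensors in AT.py.
# Shapes and helper bodies are assumptions, not the repository's implementation.

def crop_channel_means(feature, cy, cx, half):
    """Average each channel over a square window centred on (cy, cx).

    feature is a list of C channels, each an H x W list of floats.
    Returns one weight per channel (the role of w_{t-1} in equation 2).
    """
    weights = []
    for channel in feature:
        h, w = len(channel), len(channel[0])
        rows = range(max(0, cy - half), min(h, cy + half + 1))
        cols = range(max(0, cx - half), min(w, cx + half + 1))
        window = [channel[r][c] for r in rows for c in cols]
        weights.append(sum(window) / len(window))
    return weights


def get_weighted(chn_weight, feature):
    """Channel-weighted sum of a feature map, giving one H x W map."""
    h, w = len(feature[0]), len(feature[0][0])
    return [
        [sum(wt * ch[r][c] for wt, ch in zip(chn_weight, feature)) for c in range(w)]
        for r in range(h)
    ]


def transition(weights):
    """Hypothetical stand-in for the step from w_{t-1} to w_t (identity here)."""
    return list(weights)


# Two channels, 2 x 2 maps, for frames t-1 and t (made-up values).
feature_prev = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 2.0]]]
feature_curr = [[[0.0, 3.0], [0.0, 0.0]], [[0.0, 0.0], [4.0, 0.0]]]

w_prev = crop_channel_means(feature_prev, cy=0, cx=0, half=0)  # gaze at top-left
w_curr = transition(w_prev)

as_in_code = get_weighted(w_curr, feature_prev)   # weights applied to F_{t-1}
as_in_paper = get_weighted(w_curr, feature_curr)  # weights applied to F_t
```

With these made-up values the two maps peak at different locations, which is why I would expect the choice of feature map to matter for the predicted attention.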

hyf015 commented 1 year ago

Hi, sorry that I only just saw this. I wrote this code a long time ago and cannot remember the details now. I believe I intended the code to follow the equations in the paper. However, if you are confident in your reading of this snippet, it might be a bug. I would appreciate it if you could double-check, or possibly propose a fix for this part.