hyf015 / egocentric-gaze-prediction

Code for the paper "Predicting Gaze in Egocentric Video by Learning Task-dependent Attention Transition"

Is g_{t-1} found by ground truth during training? #4

Closed. kazucmpt closed this issue 5 years ago

kazucmpt commented 5 years ago

I have a question about the channel weight extractor in the Attention Transition Module. The channel weight extractor needs the coordinates of the predicted gaze point in the previous frame, g_{t-1}.

In Section 3.4 of your paper, you wrote that g_{t-1} is the PREDICTED gaze point. But in AT.py, you use the function "computeAAEAUC" to obtain g_{t-1}, and it seems that you compute g_{t-1} only from sample['gt'], which means GROUND TRUTH.

So my understanding is as follows: during training, you use the GROUND TRUTH gaze point of the previous frame. During testing, you use the PREDICTED gaze point of the previous frame.

Is it correct?

hyf015 commented 5 years ago

Yes. When training AT, we use the ground truth gaze position. When testing, we use the predicted gaze position.
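The distinction confirmed above can be sketched in a few lines. This is a minimal illustration, not the code in AT.py: the function names `argmax_2d` and `previous_gaze` are hypothetical, and it assumes the gaze point is taken as the location of the maximum of a 2-D heatmap.

```python
def argmax_2d(heatmap):
    """Return (row, col) of the largest value in a 2-D list of floats."""
    best = (0, 0)
    for r, row in enumerate(heatmap):
        for c, value in enumerate(row):
            if value > heatmap[best[0]][best[1]]:
                best = (r, c)
    return best


def previous_gaze(training, gt_heatmap_prev, pred_heatmap_prev):
    """Pick g_{t-1}: from ground truth when training, from the prediction when testing.

    Hypothetical helper; both arguments are heatmaps for frame t-1.
    """
    source = gt_heatmap_prev if training else pred_heatmap_prev
    return argmax_2d(source)


# Toy 2x2 heatmaps for frame t-1 (made-up values for illustration only).
gt = [[0.0, 0.1], [0.9, 0.0]]
pred = [[0.2, 0.7], [0.1, 0.0]]

train_point = previous_gaze(True, gt, pred)    # uses the ground truth map
test_point = previous_gaze(False, gt, pred)    # uses the predicted map
```

With these toy inputs the two modes select different points, which is the whole difference between the training and testing paths described in the answer.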

kazucmpt commented 5 years ago

Thank you.

How do you determine the predicted gaze point of the first frame, g_0, during testing?

hyf015 commented 5 years ago

We don't need g_0 when testing. g_0 is the same as the SP output.

kazucmpt commented 5 years ago

I am confused. In testing, to get g_1 we need g_0. But to get g_0 we need F_{-1}^s and F_{-1}^t, and we cannot define F_{-1}^s and F_{-1}^t. Do you mean that you define g_0 as the output of the SP module, because we do not need F_{-1}^s and F_{-1}^t to compute G_t^s?

hyf015 commented 5 years ago

Yes, that is actually what we expect.
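The test-time procedure agreed on in this exchange can be sketched as a simple loop. This is only an illustration under stated assumptions: `rollout`, `sp_predict`, and `at_predict` are hypothetical names, and `at_predict` stands in for whatever the model does to produce g_t from the previous predicted gaze point for t >= 1.

```python
def rollout(num_frames, sp_predict, at_predict):
    """Return predicted gaze points g_0 .. g_{num_frames-1} at test time.

    sp_predict(t)         -> gaze point from the SP module alone (hypothetical).
    at_predict(t, g_prev) -> gaze point for frame t given g_{t-1} (hypothetical).
    """
    gaze = []
    for t in range(num_frames):
        if t == 0:
            # There are no features for frame -1, so g_0 is the SP output.
            g = sp_predict(0)
        else:
            # Later frames use the previous PREDICTED gaze point.
            g = at_predict(t, gaze[-1])
        gaze.append(g)
    return gaze


# Toy stand-ins so the sketch runs; real modules would be neural networks.
def toy_sp(t):
    return (10, 10)


def toy_at(t, g_prev):
    return (g_prev[0] + 1, g_prev[1] + 2)


points = rollout(3, toy_sp, toy_at)
```

The point of the sketch is the initialization: the recursion over g_{t-1} starts from the SP output, so F_{-1}^s and F_{-1}^t are never required.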

kazucmpt commented 5 years ago

I understand now. Thank you.