xucong-zhang / ETH-XGaze

Official implementation of ETH-XGaze dataset baseline

Questions about the baseline structure #1

Closed Yijun88 closed 3 years ago

Yijun88 commented 3 years ago

Hi Xucong,

Excellent work on ETH-XGaze! It really provides a diverse dataset for gaze estimation. I have a question regarding the baseline structure: why don't we compress the FC outputs through a tanh/sigmoid activation function to normalize the output a bit? Is there an intuition behind using the raw outputs?

Additionally, I suggest adding a model.eval() call in the demo code before running the forward pass: https://github.com/xucong-zhang/ETH-XGaze/blob/ca2d991b8dea2b244f75dbb899c84afd15ed745c/demo.py#L158-L164
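Something like this (a minimal sketch with a stand-in network, not the actual `gaze_network` from the repo):

```python
import torch
import torch.nn as nn

# Stand-in for the repo's gaze network: any module containing BatchNorm or
# Dropout layers behaves differently in train vs. eval mode.
model = nn.Sequential(
    nn.Linear(10, 64),
    nn.BatchNorm1d(64),
    nn.ReLU(),
    nn.Linear(64, 2),  # outputs (pitch, yaw)
)

model.eval()  # the suggested fix: freezes BatchNorm statistics, disables Dropout

with torch.no_grad():  # also skip gradient bookkeeping during inference
    pred_gaze = model(torch.randn(1, 10))  # single-image forward pass
```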

Looking forward to your reply!

Best, Yijun

xucong-zhang commented 3 years ago

Hi Yijun,

Thank you for your interest in our work.

I did not use tanh/sigmoid to normalize the output gaze direction because I don't think it is necessary. The gaze labels of the training data are constrained to a certain range, and a model trained on such data will output values inside that range. I personally don't think tanh normalization would help, but you can give it a try.
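If you do give it a try, the two output heads would look roughly like this (a minimal sketch, not the repo's code; the [-pi/2, pi/2] scaling is an assumption about the label range):

```python
import math
import torch
import torch.nn as nn

feat = torch.randn(4, 512)  # stand-in for backbone features

# Baseline head: raw linear outputs, relying on the training labels being
# confined to a bounded range of pitch/yaw angles.
head_raw = nn.Linear(512, 2)
gaze_raw = head_raw(feat)  # (pitch, yaw) in radians, formally unbounded

# Tanh variant: squash the outputs into [-pi/2, pi/2].
head_tanh = nn.Linear(512, 2)
gaze_tanh = torch.tanh(head_tanh(feat)) * (math.pi / 2)
```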

Thank you for the suggestion of adding eval mode. I have just made that change to the file.

Best, Xucong

Yijun88 commented 3 years ago

Hi Xucong,

Thanks for the explanation of the network structure. Additionally, can you share more details about how the gaze vectors (pitch and yaw) are generated? Does it follow a pipeline similar to this:

1. Generate 3D key points of the participant's face
2. Connect the point-of-regard with the 3D eye center to formulate the gaze vector
3. Normalize/offset the gaze vector using the head pose

Thanks in advance.

Best, Yijun

xucong-zhang commented 3 years ago

Hi Yijun,

You are right about the data normalization pipeline. For step 2, we take the 3D face centre as the gaze origin, since the input image is a face patch. The face centre is defined as "mean( mean(4 eye corners), mean(2 nose corners) )". We will release the code for the data normalization pipeline soon.
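Concretely, steps 2 and 3 with this face-centre definition look roughly like this (a sketch with made-up landmark values; the pitch/yaw sign conventions are an assumption, not the released normalization code):

```python
import numpy as np

eye_corners = np.random.rand(4, 3)       # 3D positions of the 4 eye corners
nose_corners = np.random.rand(2, 3)      # 3D positions of the 2 nose corners
target_3d = np.array([0.1, -0.05, 0.6])  # 3D point-of-regard, e.g. on the screen

# Face centre = mean( mean(4 eye corners), mean(2 nose corners) )
face_center = (eye_corners.mean(axis=0) + nose_corners.mean(axis=0)) / 2.0

# Gaze vector: from the face centre to the point-of-regard, unit-normalized.
gaze = target_3d - face_center
gaze /= np.linalg.norm(gaze)

# Unit vector -> (pitch, yaw) angles; the sign/axis conventions here assume
# the camera looks along +z with y pointing down.
pitch = np.arcsin(-gaze[1])
yaw = np.arctan2(-gaze[0], -gaze[2])
```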

Best, Xucong

lucaskyle commented 3 years ago

> Hi Xucong,
>
> Thanks for the explanation of the network structure. Additionally, can you share more details about how the gaze vectors (pitch and yaw) are generated? Does it follow a pipeline similar to this:
>
> 1. Generate 3D key points of the participant's face
> 2. Connect the point-of-regard with the 3D eye center to formulate the gaze vector
> 3. Normalize/offset the gaze vector using the head pose
>
> Thanks in advance.
>
> Best, Yijun

Brother, he has already published several papers on this; how do you still not know how he normalizes the data? Isn't the usage in the demo written clearly enough?

1. The gaze origin is the face centre based on 6 points, the four eye corners and the two nose corners; that is how the earlier papers did it.
2. The 3D gaze vector is clearly formed at capture time: there is a 3D point for the subject to stare at, the line from the face centre to that point gives the vector, and the vector is then converted into 2D angles. The calculation is all laid out in their MPIIGaze method.
3. Why "normalize/offset the gaze vector using the head pose"? The data was captured from multiple camera angles, the subjects were allowed to rotate their heads, and the rotation-related ground truth is provided. How could it not be used?

You didn't read the papers or check the data contents; I think that is very unprofessional.

Also, the network output is an angle in [-pi/2, pi/2]. Just change the code and you will see whether tanh helps or not. And why not also normalize/tanh the input? You should try all of these.

xucong-zhang commented 3 years ago

> > Hi Xucong, Thanks for the explanation of the network structure. Additionally, can you share more details about how the gaze vectors (pitch and yaw) are generated? Does it follow a pipeline similar to this:
> >
> > 1. Generate 3D key points of the participant's face
> > 2. Connect the point-of-regard with the 3D eye center to formulate the gaze vector
> > 3. Normalize/offset the gaze vector using the head pose
> >
> > Thanks in advance. Best, Yijun
>
> Brother, he has already published several papers on this; how do you still not know how he normalizes the data? Isn't the usage in the demo written clearly enough?
>
> 1. The gaze origin is the face centre based on 6 points, the four eye corners and the two nose corners; that is how the earlier papers did it.
> 2. The 3D gaze vector is clearly formed at capture time: there is a 3D point for the subject to stare at, the line from the face centre to that point gives the vector, and the vector is then converted into 2D angles. The calculation is all laid out in their MPIIGaze method.
> 3. Why "normalize/offset the gaze vector using the head pose"? The data was captured from multiple camera angles, the subjects were allowed to rotate their heads, and the rotation-related ground truth is provided. How could it not be used?
>
> You didn't read the papers or check the data contents; I think that is very unprofessional.
>
> Also, the network output is an angle in [-pi/2, pi/2]. Just change the code and you will see whether tanh helps or not. And why not also normalize/tanh the input? You should try all of these.

Hi,

Thank you very much for the comment. This is valuable feedback.

I personally think it was fine for @Yijun88 to ask these questions, since he was confused. And he was actually asking for confirmation, not why we did the data normalization. I am happy to answer any question related to our project. I hope we can keep a friendly environment in this Q&A section so that no one will be afraid to ask questions or post issues. Thank you.

Best, Xucong

JohnsenJiang commented 3 years ago

Hello, I would like to ask about how the 3D gaze targets were collected. Your paper mentions using a projector to generate the gaze points, but it does not explain how these are converted into 3D point data. Were they obtained by measuring the physical dimensions? In "Appearance-Based Gaze Estimation in the Wild", the display screen is calibrated against the camera so the two are linked and the 3D data can be obtained, but this paper is not very clear on that point. Thanks for clarifying!

xucong-zhang commented 3 years ago

Hi, please open a new issue when asking a different question next time. I will reply in English so that everyone can understand.

To obtain the 3D gaze direction, we did something similar to what we did in "Appearance-Based Gaze Estimation in the Wild": we performed a camera-screen calibration so that we can convert any point in the 3D screen coordinate system to the 3D camera coordinate system.
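Concretely, the conversion is just a rigid transform (a minimal sketch with placeholder values; the real rotation and translation come out of the calibration):

```python
import numpy as np

# Placeholder rigid transform from the screen to the camera coordinate
# system; the actual R and t are produced by the camera-screen calibration.
R = np.eye(3)                   # 3x3 rotation, screen -> camera
t = np.array([0.0, 0.1, 0.02])  # translation in meters

# A gaze target on the screen plane (z = 0 in screen coordinates),
# re-expressed in the 3D camera coordinate system:
point_screen = np.array([0.25, 0.15, 0.0])
point_camera = R @ point_screen + t
```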

nalibjchn commented 3 years ago

> > > Hi Xucong, Thanks for the explanation of the network structure. Additionally, can you share more details about how the gaze vectors (pitch and yaw) are generated? Does it follow a pipeline similar to this:
> > >
> > > 1. Generate 3D key points of the participant's face
> > > 2. Connect the point-of-regard with the 3D eye center to formulate the gaze vector
> > > 3. Normalize/offset the gaze vector using the head pose
> > >
> > > Thanks in advance. Best, Yijun
> >
> > Brother, he has already published several papers on this; how do you still not know how he normalizes the data? Isn't the usage in the demo written clearly enough?
> >
> > 1. The gaze origin is the face centre based on 6 points, the four eye corners and the two nose corners; that is how the earlier papers did it.
> > 2. The 3D gaze vector is clearly formed at capture time: there is a 3D point for the subject to stare at, the line from the face centre to that point gives the vector, and the vector is then converted into 2D angles. The calculation is all laid out in their MPIIGaze method.
> > 3. Why "normalize/offset the gaze vector using the head pose"? The data was captured from multiple camera angles, the subjects were allowed to rotate their heads, and the rotation-related ground truth is provided. How could it not be used?
> >
> > You didn't read the papers or check the data contents; I think that is very unprofessional.
> >
> > Also, the network output is an angle in [-pi/2, pi/2]. Just change the code and you will see whether tanh helps or not. And why not also normalize/tanh the input? You should try all of these.
>
> Hi,
>
> Thank you very much for the comment. This is valuable feedback.
>
> I personally think it was fine for @Yijun88 to ask these questions, since he was confused. And he was actually asking for confirmation, not why we did the data normalization. I am happy to answer any question related to our project. I hope we can keep a friendly environment in this Q&A section so that no one will be afraid to ask questions or post issues. Thank you.
>
> Best, Xucong

I am happy with Xucong Zhang's reply. Sometimes the more we read or try, the more confused we get and we ask "stupid" questions, but when we get confirmation or positive feedback we may suddenly understand completely. My supervisor always tells me there is no such thing as a stupid question.

kevinsu628 commented 2 years ago

> > Hi Xucong, Thanks for the explanation of the network structure. Additionally, can you share more details about how the gaze vectors (pitch and yaw) are generated? Does it follow a pipeline similar to this:
> >
> > 1. Generate 3D key points of the participant's face
> > 2. Connect the point-of-regard with the 3D eye center to formulate the gaze vector
> > 3. Normalize/offset the gaze vector using the head pose
> >
> > Thanks in advance. Best, Yijun
>
> Brother, he has already published several papers on this; how do you still not know how he normalizes the data? Isn't the usage in the demo written clearly enough?
>
> 1. The gaze origin is the face centre based on 6 points, the four eye corners and the two nose corners; that is how the earlier papers did it.
> 2. The 3D gaze vector is clearly formed at capture time: there is a 3D point for the subject to stare at, the line from the face centre to that point gives the vector, and the vector is then converted into 2D angles. The calculation is all laid out in their MPIIGaze method.
> 3. Why "normalize/offset the gaze vector using the head pose"? The data was captured from multiple camera angles, the subjects were allowed to rotate their heads, and the rotation-related ground truth is provided. How could it not be used?
>
> You didn't read the papers or check the data contents; I think that is very unprofessional.
>
> Also, the network output is an angle in [-pi/2, pi/2]. Just change the code and you will see whether tanh helps or not. And why not also normalize/tanh the input? You should try all of these.

There is no need to be toxic, man. IMO, what he asked actually means a lot to whoever comes to this issue with questions. I am sure this question and the authors' confirmation have helped many people, including me. A Q&A board exists to speed up our progress, not to judge anyone's professionalism.