Hello, I am trying to understand the data structure of the ETH-XGaze dataset.
In `OnePersonDataset`, three values are returned per sample: image, pose, and gaze.
It seems that gaze is a combination of pitch and yaw, given in radians. (Please correct me if I am wrong.)
I am a little confused about what pose does during training. If the pose represents the direction the face is pointing, then how can it be defined with a single number, unlike gaze, which needs both pitch and yaw?
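For context on my current understanding: I believe a (pitch, yaw) pair in radians can be converted to a 3D gaze direction like this (a minimal sketch using one common sign/axis convention from gaze-estimation code; the exact convention used by ETH-XGaze may differ):

```python
import numpy as np

def pitchyaw_to_vector(pitch: float, yaw: float) -> np.ndarray:
    """Convert a (pitch, yaw) pair in radians to a 3D unit gaze vector.

    Assumption: camera looks along -z, pitch rotates the gaze up/down,
    yaw rotates it left/right. Other codebases may flip signs or axes.
    """
    x = -np.cos(pitch) * np.sin(yaw)
    y = -np.sin(pitch)
    z = -np.cos(pitch) * np.cos(yaw)
    return np.array([x, y, z])

# Looking straight ahead (pitch = yaw = 0) should give the vector (0, 0, -1).
print(pitchyaw_to_vector(0.0, 0.0))
```

Is this roughly how the gaze labels are meant to be interpreted, and does pose follow the same two-angle format?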