xucong-zhang / ETH-XGaze

Official implementation of ETH-XGaze dataset baseline

pitch and yaw (raw outputs of network) are not in HCS (head coordinates system) #20

Closed ffletcherr closed 1 year ago

ffletcherr commented 2 years ago

Hi, thanks for this great paper and dataset and also all of your previous valuable works in the field of appearance-based gaze estimation.

I recently tried to use the raw output of the network, which is trained on the ETH-XGaze dataset, to estimate the PoG (Point of Gaze) in the CCS (Camera Coordinate System). So I used your normalization method and found the normalizing rotation matrix to transform the normalized gaze vector, which is in HCS, into the 3D gaze vector in CCS.

But it seems that pitch and yaw are not in HCS, because when everything is held constant except the camera position, the network output changes. So if that is correct and pitch and yaw are indeed not in HCS, we need an extra step beyond the normalizing rotation matrix, one that compensates for head pose. But I can't find this step, and it remains unclear to me.

jasony93 commented 1 year ago

Shouldn't the gaze vector be different if the camera position is placed differently?

ffletcherr commented 1 year ago

You are right, that was my mistake. I thought the output of the network was [pitch, yaw] in the Head Coordinate System, but it is in the normalized Camera Coordinate System.
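For reference, a minimal sketch of turning the network's [pitch, yaw] output into a 3D unit gaze vector in the normalized camera coordinate system (this follows the sign convention of the `pitchyaw_to_vector` helper in this repo's utilities; if your fork uses a different convention, adjust the signs):

```python
import numpy as np

def pitchyaw_to_vector(pitch: float, yaw: float) -> np.ndarray:
    """Convert [pitch, yaw] in radians to a 3D unit gaze vector
    in the normalized camera coordinate system.
    Zero pitch and yaw means looking straight at the camera: [0, 0, -1]."""
    return np.array([
        -np.cos(pitch) * np.sin(yaw),
        -np.sin(pitch),
        -np.cos(pitch) * np.cos(yaw),
    ])

# Example: zero pitch and yaw -> gaze straight toward the camera
g = pitchyaw_to_vector(0.0, 0.0)  # -> [0., 0., -1.]
```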

So to get the gaze vector in the real CCS, the denormalization method can be used. After this step, we have face.center and face.gaze_vector (with origin at face.center) in the real CCS.
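A sketch of that denormalization step, under the assumption that `R` is the normalizing rotation matrix from the data-normalization procedure (i.e. g_normalized = R @ g_ccs, so its transpose, the inverse of a pure rotation, maps the network output back to the real CCS):

```python
import numpy as np

def denormalize_gaze(gaze_normalized: np.ndarray, R: np.ndarray) -> np.ndarray:
    """Map a gaze vector from the normalized camera coordinate system
    back to the real camera coordinate system.
    R is the rotation applied during normalization (g_n = R @ g),
    so R.T inverts it."""
    return R.T @ gaze_normalized
```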

Finally, to calculate the Point of Gaze (in CCS), we must find the intersection of the face.gaze_vector ray with the plane whose normal_vector = [0, 0, 1], i.e. the camera plane.
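That intersection is a simple ray/plane computation. A sketch, where `face_center` and `gaze_vector` stand in for the quantities above and the camera plane is taken as z = 0 with normal [0, 0, 1]:

```python
import numpy as np

def point_of_gaze(face_center: np.ndarray, gaze_vector: np.ndarray) -> np.ndarray:
    """Intersect the gaze ray p(t) = face_center + t * gaze_vector
    with the camera plane z = 0 (normal [0, 0, 1])."""
    if abs(gaze_vector[2]) < 1e-9:
        raise ValueError("Gaze is parallel to the camera plane; no intersection.")
    t = -face_center[2] / gaze_vector[2]
    return face_center + t * gaze_vector

# Example: a face 600 mm in front of the camera, looking straight at it
pog = point_of_gaze(np.array([0.0, 0.0, 600.0]), np.array([0.0, 0.0, -1.0]))
# -> [0., 0., 0.]
```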