younesmch opened this issue 2 years ago
Hi, @younesmch
I think both the gaze vector and the head pose are in the camera coordinate system, so we should be able to get the gaze point on the screen with the following:
point_on_screen = face.center - face.gaze_vector * face.center[2] / face.gaze_vector[2]
As I moved my head position in front of a camera while staring at a fixed point, the computed gaze points on the screen were consistent to some extent, though not very accurate, and the y-coordinate seemed lower than expected. I'm not sure what causes the y-coordinate issue; maybe I've misunderstood something about the training data, or there's a bug.
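For reference, here is a minimal sketch of that projection, assuming face.center and face.gaze_vector are 3D vectors in camera coordinates and the screen lies roughly in the camera's z = 0 plane (the function name and the example values are made up for illustration):

```python
import numpy as np

def gaze_point_on_camera_plane(face_center: np.ndarray, gaze_vector: np.ndarray) -> np.ndarray:
    """Intersect the gaze ray with the z = 0 plane of the camera coordinate system."""
    # Walk along the gaze ray from the face centre until z becomes 0.
    t = face_center[2] / gaze_vector[2]
    return face_center - gaze_vector * t

# Made-up example: face about 50 cm in front of the camera (units in metres),
# gaze pointing roughly back toward the camera and slightly downward.
face_center = np.array([0.02, 0.05, 0.50])
gaze_vector = np.array([-0.05, 0.10, -0.99])
print(gaze_point_on_camera_plane(face_center, gaze_vector))  # x, y on the z = 0 plane (z ~ 0)
```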
Hi @hysts, I calculated the gaze point as the intersection between the screen plane and the gaze ray, kept the gaze point in the camera coordinate system (CCS), and then drew a scatter plot of the X and Y values of the gaze points for my screen.
Here is what I get on the plot (Y data in red); the units are centimetres in the CCS.
The X values are acceptable, between (-15, 15) cm in the CCS, but the Y values fall between (5, 18) cm, when in fact they should be between (0, 20) cm.
@hysts I tried the line of code you mentioned, but when my 3D head position is fixed and I only change the orientation of my head while looking at the same point on my screen, the point_on_screen coordinates change depending on my head orientation.
So in the line
point_on_screen = face.center - face.gaze_vector * face.center[2] / face.gaze_vector[2]
face.center can remain the same while changing the orientation of my head produces a different face.gaze_vector, and hence a different point on screen, even though my focus point remained the same. If I change from a left-looking head orientation to a right-looking one while gazing at the same point, the X coordinate of the gaze vector changes accordingly, as expected; however, this results in a different point_on_screen coordinate.
What is not clear to me is: relative to what are the pitch and yaw computed? When are they 0? Do we have to apply some transformation to them, opposite to the head rotation relative to the camera?
I can send you a video/example to showcase the problem if needed.
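Regarding the pitch/yaw question above: in the MPIIGaze/ETH-XGaze style of normalization, pitch and yaw are usually defined in the normalized camera frame, with (0, 0) meaning the subject is looking straight back along the optical axis at the camera. A common conversion to a 3D unit vector (this is an assumption about the convention, consistent with the arcsin/arctan2 formulas that appear later in this thread, not a statement about this repo's internals) is:

```python
import numpy as np

def pitch_yaw_to_vector(pitch: float, yaw: float) -> np.ndarray:
    """Convert (pitch, yaw) in radians to a unit gaze vector.

    Assumed convention: (0, 0) means looking straight back along the
    camera's optical axis, i.e. directly at the (normalized) camera.
    """
    return np.array([
        -np.cos(pitch) * np.sin(yaw),
        -np.sin(pitch),
        -np.cos(pitch) * np.cos(yaw),
    ])

print(pitch_yaw_to_vector(0.0, 0.0))  # approximately [0, 0, -1], i.e. toward the camera
```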
@tomilles The gaze vector is calculated in the screen coordinate system and the units are metres. The transformation depends on the location of the camera on the screen; in my case I just apply a translation.
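A minimal sketch of the kind of transformation described here, assuming the screen is approximated by the camera's z = 0 plane and the camera's position relative to the screen's top-left corner is known (the offset below is hypothetical and would come from a screen-camera calibration):

```python
import numpy as np

# Hypothetical offset of the camera relative to the screen's top-left corner,
# in metres; in practice this comes from measuring or calibrating the setup.
CAMERA_OFFSET_ON_SCREEN = np.array([0.25, -0.01])

def to_screen_coords(face_center: np.ndarray, gaze_vector: np.ndarray) -> np.ndarray:
    # Intersection of the gaze ray with the z = 0 plane (camera coordinates).
    t = face_center[2] / gaze_vector[2]
    point_ccs = face_center - gaze_vector * t
    # Simple translation from camera coordinates to screen coordinates (metres).
    return point_ccs[:2] + CAMERA_OFFSET_ON_SCREEN
```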
@younesmch so how do you get consistent/unchanged point_on_screen coordinates while gazing at the same point on screen but moving your head around? I tried many ways but I cannot seem to figure it out. Could you walk me through your steps or show me?
@tomilles I think the head pose is injected during training, as mentioned in the paper, so the model can predict the correct point independently of the head pose. For my part, I just calculate the intersection between the gaze ray and the screen plane, as I mentioned, but the results are not meaningful, especially the Y coordinate of the gaze point.
@younesmch @tomilles
I experimented to see how the predicted gaze point on the screen shifts depending on the head pose, and the following are the results:
I took 200 frames of video for each condition, holding my head pose fixed and looking in the direction of the camera, and plotted the results. The plots in the first row are for moving the head position in the X, Y, and Z directions, and the plots in the second row are for rotating the head in the pitch, yaw, and roll directions. The axes of the graphs are flipped for visualization, and the units are centimetres. The distance from the camera is about 50 cm, with "near" being about 25 cm and "far" about 100 cm.
It seems that the predicted gaze vectors are off by about 20 degrees in the pitch direction, and that the predicted gaze point on the screen shifts in the X direction when the head rotates in the yaw direction.
I think something is wrong, but can't figure out what it is. This may take some time. Please let me know if you have any ideas.
@hysts I think the problem is with the datasets, which cover a limited range of gaze directions, so the model can't predict outside that range.
(Figure: distributions of head angle (h) and gaze angle (g) in degrees for MPIIGaze.)
@younesmch
I forgot to mention it, but in the experiment above I used a model pretrained on the ETH-XGaze dataset, which covers a much wider range of gaze and head directions. The distribution bias in the dataset could be the cause, but I'm not sure at this point.
@hysts I can't see where the problem is; the model predicts the vertical distance in a way that is totally insignificant, and only over a small range of gaze.
@hysts I recreated the pitch/yaw labels for the MPIIFaceGaze dataset using a MediaPipe-based head pose estimator, your well-developed tools, and this script, which I borrowed from the official ETH-XGaze GitHub:
import numpy as np

# `estimator` is the gaze estimator object, `im` the input image, `face` the
# detected face, and `line` one annotation line from MPIIFaceGaze (outer context).
estimator.face_model_3d.estimate_head_pose(face, estimator.camera)
estimator.face_model_3d.compute_3d_pose(face)
estimator.face_model_3d.compute_face_eye_centers(face, 'ETH-XGaze')
estimator.head_pose_normalizer.normalize(im, face)

hR = face.head_pose_rot.as_matrix()           # head rotation matrix
euler = face.head_pose_rot.as_euler('XYZ')    # head rotation as Euler angles
hRx = hR[:, 0]                                # x-axis of the head coordinate system

# Build the normalized-camera rotation matrix R from the face-centre direction.
forward = (face.center / face.distance).reshape(3)
down = np.cross(forward, hRx)
down /= np.linalg.norm(down)
right = np.cross(down, forward)
right /= np.linalg.norm(right)
R = np.c_[right, down, forward].T             # rotation matrix R

# 3D gaze target and face centre from the MPIIFaceGaze annotation line (mm).
gaze_point = np.array(line[24:27])
face_center = np.array(line[21:24])

# Gaze direction in camera coordinates (face.center appears to be in metres,
# hence the *1000 to match the mm annotations), rotated into the normalized
# camera frame and converted to (pitch, yaw).
gc = gaze_point - face.center * 1000  # instead of the annotated face_center
gc_normalized = np.dot(R, gc)
gc_normalized = gc_normalized / np.linalg.norm(gc_normalized)
gaze_theta = np.arcsin((-1) * gc_normalized[1])                     # pitch
gaze_phi = np.arctan2((-1) * gc_normalized[0], (-1) * gc_normalized[2])  # yaw
gaze_norm_2d = np.asarray([gaze_theta, gaze_phi])
Then I fine-tuned your eth-xgaze_resnet18.pth using these new labels, and the shift value decreased significantly. I uploaded the new model here (finetuned_eth-xgaze_resnet18.pth) so you can test it. So I think this problem came from a wrong normalization process (in the label creation) or a wrong head pose estimation in the original dataset.
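For anyone who wants to try the fine-tuned weights, a rough sketch for loading them might look like the following. It assumes the checkpoint is a plain state_dict for a ResNet-18 backbone with a 2-dimensional (pitch, yaw) output head; adjust the architecture and key names if the actual checkpoint differs.

```python
import torch
import torchvision

# Hypothetical local path to the fine-tuned checkpoint mentioned above.
ckpt_path = 'finetuned_eth-xgaze_resnet18.pth'

# Assumed architecture: ResNet-18 with a 2-output head predicting (pitch, yaw).
model = torchvision.models.resnet18(num_classes=2)
state_dict = torch.load(ckpt_path, map_location='cpu')
# Some checkpoints wrap the weights, e.g. under a 'model' key; unwrap if present.
if isinstance(state_dict, dict) and 'model' in state_dict:
    state_dict = state_dict['model']
model.load_state_dict(state_dict, strict=False)
model.eval()
```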
Hi, @ffletcherr
Oh, that's wonderful! Thank you very much for the information. Sorry for not updating anything on this issue; I've been busy recently and haven't had time for OSS work.
I had thought that the discrepancy in the pitch direction could be due to differences in the 3D models, but I hadn't checked it myself. But looking at your results, it seems more likely that it was indeed the case.
By the way, it's just a small detail, but I'm not sure if the original normalization was "wrong". I think it's simply a difference in the 3D models used. I mean, the process of head pose estimation is like rotating a rigid face mask in 3D space to get the best fit based on facial landmarks, but if a different mask is used, the best fit pose could be different.
Anyway, I will check differences in the 3D models and the model you trained soon. And thank you again, it's really helpful in narrowing down the problem.
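To illustrate the "rigid face mask" fitting described above (and why swapping the 3D face model can change the best-fit pose and hence the normalized labels), here is a rough, self-contained sketch using OpenCV's solvePnP; the 3D model points and camera intrinsics are placeholder values, not the ones used by this project:

```python
import cv2
import numpy as np

# A generic rigid 3D "face mask" (rough coordinates in metres): nose tip, chin,
# eye corners, mouth corners. A real pipeline would use its own 3D face model.
model_points_3d = np.array([
    [ 0.000,  0.000,  0.000],   # nose tip
    [ 0.000, -0.063, -0.012],   # chin
    [-0.043,  0.032, -0.026],   # left eye outer corner
    [ 0.043,  0.032, -0.026],   # right eye outer corner
    [-0.028, -0.028, -0.024],   # left mouth corner
    [ 0.028, -0.028, -0.024],   # right mouth corner
])
camera_matrix = np.array([[960.0,   0.0, 640.0],
                          [  0.0, 960.0, 360.0],
                          [  0.0,   0.0,   1.0]])
dist_coeffs = np.zeros(5)

# Synthesize 2D landmarks by projecting the model with a known pose, then
# recover that pose with solvePnP (the "best fit" of the rigid mask).
true_rvec = np.array([0.1, 0.3, 0.0])
true_tvec = np.array([0.0, 0.0, 0.6])
image_points_2d, _ = cv2.projectPoints(model_points_3d, true_rvec, true_tvec,
                                       camera_matrix, dist_coeffs)
ok, rvec, tvec = cv2.solvePnP(model_points_3d, image_points_2d,
                              camera_matrix, dist_coeffs)
print(rvec.ravel(), tvec.ravel())  # should be close to true_rvec / true_tvec
```

Fitting the same 2D landmarks with a different rigid 3D model would, in general, give a different best-fit rotation, which is the point being made above.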
Hi @ffletcherr, for some reason I can't access https://github.com/4-geeks/xgaze-js/releases/download/v0.0.1/finetuned_eth-xgaze_resnet18.pth. Any ideas?
Any updates on code to resolve the on-screen gaze location?
I would even be open to starting with something simpler, such as just determining whether the face in the video "keeps their eyes on the camera" throughout the video.
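For the simpler "keeps their eyes on the camera" check, one rough approach (a sketch only, assuming the estimator provides face.center and face.gaze_vector in camera coordinates as discussed above; the threshold is arbitrary) is to test whether the gaze ray points back toward the camera origin within some angular tolerance:

```python
import numpy as np

def looks_at_camera(face_center: np.ndarray, gaze_vector: np.ndarray,
                    threshold_deg: float = 10.0) -> bool:
    """Return True if the gaze ray points roughly back at the camera origin."""
    to_camera = -face_center / np.linalg.norm(face_center)   # direction face -> camera
    gaze = gaze_vector / np.linalg.norm(gaze_vector)
    angle = np.degrees(np.arccos(np.clip(np.dot(gaze, to_camera), -1.0, 1.0)))
    return angle < threshold_deg

# Over a video, one could then require e.g.
# all(looks_at_camera(face.center, face.gaze_vector) for face in faces_per_frame)
```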
Hi, thanks again for the great work. I just want to get the on-screen gaze point. I calculate the point of intersection between the screen plane and the gaze ray (as in the source code at https://git.hcics.simtech.uni-stuttgart.de/public-projects/opengaze/-/wikis/API-calls) and transform the intersection point into the screen coordinate system, but I get wrong results.
Can someone help me understand which coordinate system the gaze vector and eye point are in? Thanks in advance for any help.