yihuacheng / IVGaze

What Do You See in Vehicle? Comprehensive Vision Solution for In-Vehicle Gaze Estimation
Apache License 2.0

Question about dataset parsing #1

Open CrossEntropy opened 3 months ago

CrossEntropy commented 3 months ago

Hi, professor Cheng @yihuacheng , Thank you for your work! I did some visualizations of labels while parsing normal dataset, and I found that some of the lines of sight were going in different directions than I expected. For example, I think the person is looking down, but the label says the person is looking up. I did a simple test using the pre-trained ResNet50(ETH-XGaze) and labels for comparison, I found that the ResNet50 was similar to my visual judgment, i.e. my judgment and the label diverged a bit.
Here is a picture of my experiment, the data comes from the Norm/20221013/subject0050_yaw_out, green represents the label and red represents the prediction, a red source indicates a suspicious output. Can you give me some useful information? thanks! 20240522-133904

Shay2208 commented 2 months ago

Hi, I have encountered the same problem. I checked both the original and normalized data and visualized them. However, I found that neither of them matches the direction I expected. Have you figured out what caused this problem?

yihuacheng commented 2 months ago

Hi all, this is due to the definition of the coordinate system. Please use the following code to visualise it.

import cv2
import numpy as np

def gazeto3d(gaze):
    """
    Convert 2D gaze (yaw, pitch) to a 3D gaze vector.
        :param gaze: np.array with shape [2], (yaw, pitch) in radians
        :return: np.array with shape [3], unit gaze vector
    """
    assert gaze.size == 2, "The size of gaze must be 2"

    gaze_gt = np.zeros([3])

    gaze_gt[0] = -np.cos(gaze[1]) * np.sin(gaze[0])
    gaze_gt[1] = -np.sin(gaze[1])
    gaze_gt[2] = -np.cos(gaze[1]) * np.cos(gaze[0])

    return gaze_gt

def gazeVisual(img, gaze, start=None, scale=150, color=(0, 0, 255), thickness=4, tiplength=0.2):
    """Draw a gaze arrow on img; accepts 2D (yaw, pitch) or a 3D gaze vector."""
    if len(gaze) == 2:
        gaze = gazeto3d(gaze)

    # Project the 3D gaze onto the image plane (the image y axis points down).
    x = -gaze[0] * scale
    y = gaze[1] * scale

    if start is None:
        # Default to the image centre as the arrow origin.
        start = (img.shape[1] // 2, img.shape[0] // 2)
    else:
        start = (int(start[0]), int(start[1]))

    end = (int(start[0] + x), int(start[1] + y))

    cv2.arrowedLine(img, start, end, color, thickness, cv2.LINE_AA, 0, tiplength)
    return img
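
A minimal usage sketch for the functions above (the image path and label values here are hypothetical; the label is assumed to be a (yaw, pitch) pair in radians):

import cv2
import numpy as np

# Hypothetical inputs, for illustration only.
img = cv2.imread("face.jpg")                     # any face crop
label = np.array([0.1, -0.2])                    # (yaw, pitch) in radians
img = gazeVisual(img, label, color=(0, 255, 0))  # green arrow for the label
cv2.imwrite("face_gaze.jpg", img)
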
CrossEntropy commented 2 months ago

Thank you, Prof. @yihuacheng . Is it appropriate to use the interface below to transform the 3D gaze to 2D?

def gazeto2d(gaze):
    yaw = -np.arctan2(-gaze[0], -gaze[2])
    pitch = -np.arcsin(-gaze[1])
    return np.array([yaw, pitch])

Regards
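
For reference, inverting the gazeto3d formulas above exactly gives the same expressions but without the outer sign flips. A minimal sketch derived purely from that convention (not an official utility of the repo):

import numpy as np

def gazeto2d(gaze):
    """Exact inverse of gazeto3d above: recover (yaw, pitch) in radians.
    With g = (-cos(p)sin(y), -sin(p), -cos(p)cos(y)):
    """
    yaw = np.arctan2(-gaze[0], -gaze[2])  # arctan2(cos(p)sin(y), cos(p)cos(y)) = y
    pitch = np.arcsin(-gaze[1])           # arcsin(sin(p)) = p
    return np.array([yaw, pitch])
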

CrossEntropy commented 1 month ago

Hi @Shay2208 , did you solve your problem?

Shay2208 commented 1 month ago

No, I tried the given code but the issue persists.

CrossEntropy commented 1 month ago

Me too...

yihuacheng commented 1 month ago

Hi guys,

I previously misunderstood your question. Please ignore my earlier answer.

I have checked the data in terms of gaze target points. Although some images appear to show the driver looking up, he is actually looking at the bottom of the windshield; in that situation the driver is in fact looking down.

Additionally, the images do not show the facial appearance you would expect, due to the camera orientation. Cameras in most gaze datasets are roughly perpendicular to the face plane, while our camera captures the face from below. This difference changes the captured appearance and makes the computed gaze annotation appear to disagree with visual judgment.

In contrast, you used a model pre-trained on the ETH-XGaze dataset and obtained predictions that look reasonable, because that dataset's camera orientation is perpendicular to the face plane.

This is also an adaptation issue in gaze estimation: changes in camera pose can result in significant performance errors.
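
To make the camera-pose effect concrete, here is a minimal numeric sketch (the tilt angle and axis convention are hypothetical, not the dataset's actual extrinsics): the same gaze direction, re-expressed in a camera frame tilted about the x axis, reads as a very different pitch.

import numpy as np

def pitch_in_tilted_camera(gaze_pitch_deg, camera_tilt_deg):
    """Illustrative only: how one gaze direction reads from a tilted camera."""
    p = np.deg2rad(gaze_pitch_deg)
    g = np.array([0.0, -np.sin(p), -np.cos(p)])   # gaze in a frontal frame
    t = np.deg2rad(camera_tilt_deg)
    Rx = np.array([[1.0, 0.0, 0.0],
                   [0.0, np.cos(t), -np.sin(t)],
                   [0.0, np.sin(t), np.cos(t)]])  # camera tilt about the x axis
    g_cam = Rx @ g                                # same gaze, camera frame
    return np.rad2deg(np.arcsin(-g_cam[1]))       # pitch per the gazeto3d convention

# A gaze with pitch -10 deg in a frontal frame reads as about -35 deg
# from a camera tilted by 25 deg (i.e. p - t), purely from the pose change.
print(pitch_in_tilted_camera(-10, 25))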