ricbl / eyetracking

This code was used to collect, process, and validate the REFLACX (Reports and Eye-Tracking Data for Localization of Abnormalities in Chest X-rays) dataset
MIT License

The gaze point coordinate problem #3

Open ukaukaaaa opened 1 year ago

ukaukaaaa commented 1 year ago

Thank you for the great work!

I have two questions about the data.

  1. Both "fixation.csv" and "gaze.csv" contain columns named "xmin_shown_from_image / ymin_shown_from_image / xmax_shown_from_image / ymax_shown_from_image". In the released code, these columns are used together with "x_position, y_position" to generate the heatmaps. But the paper states that "x_position, y_position" are the true coordinates of the gaze point in image space. So, if we want to match the eye-gaze coordinates to the original image, do we also need to use "xmin_shown_from_image / ymin_shown_from_image / xmax_shown_from_image / ymax_shown_from_image"? Or do we just clip "x_position, y_position" to the image bounds and take them as the coordinates on the X-ray image?
  2. "fixation.csv" seems to contain the points from "gaze.csv" that were fixated on for a longer time. I was wondering what you did to generate "fixation.csv" from "gaze.csv".

Thanks!

ricbl commented 1 year ago

Hi, thank you for your interest in using our dataset! Here are the answers to your questions:

  1. The shown parts of the image are used in heatmap generation so that parts that were not shown during a specific fixation are not highlighted by the drawn Gaussian. This way, some Gaussians have a support that does not cover the whole image, depending on the zoom level used during that fixation (see the first sketch after this list). You do not need to use "xmin_shown_from_image / ymin_shown_from_image / xmax_shown_from_image / ymax_shown_from_image" to match the coordinates of the eye-gaze points with the original image: "x_position, y_position" are already pixel coordinates on the x-ray image.

  2. The transformation from one type of data to the other was done by a proprietary algorithm that came with the EyeLink eye-tracking device. This sentence from the paper lists the parameters that could be configured for that algorithm: "Parsing was done in real time by the EyeLink 1000 Host PC, using a saccade velocity threshold of 35°/s, a saccade motion threshold of 0.2°, and a saccade acceleration threshold of 9,500°/s²." A bit more detail about the algorithm is given in section "4.3.1 Parser operation" of the eye-tracker manual: http://sr-research.jp/support/EyeLink%201000%20User%20Manual%201.5.0.pdf (the second sketch below illustrates the general idea of velocity-based parsing).
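
To make the clipping in point 1 concrete, here is a minimal sketch (not the released code) of how a fixation heatmap can be accumulated while zeroing each Gaussian outside the region that was shown during that fixation. The column names are the ones discussed above; the fixed `sigma_px` and the uniform per-fixation weighting are simplifying assumptions, not necessarily what the released code uses.

```python
import numpy as np
import pandas as pd

def fixation_heatmap(fixations: pd.DataFrame, image_shape, sigma_px=150.0):
    """Accumulate one clipped Gaussian per fixation into a heatmap.

    `fixations` is assumed to contain the columns x_position, y_position,
    xmin_shown_from_image, ymin_shown_from_image, xmax_shown_from_image,
    ymax_shown_from_image (all pixel coordinates in image space).
    `sigma_px` is a placeholder spread; the released code may derive it
    differently. Each fixation contributes with equal weight here;
    weighting by fixation duration is another option.
    """
    h, w = image_shape
    ys, xs = np.mgrid[0:h, 0:w]
    heatmap = np.zeros((h, w), dtype=np.float64)
    for _, f in fixations.iterrows():
        g = np.exp(-((xs - f["x_position"]) ** 2 + (ys - f["y_position"]) ** 2)
                   / (2.0 * sigma_px ** 2))
        # Zero the Gaussian outside the part of the image that was actually
        # visible during this fixation (it may have been a zoomed-in crop).
        shown = ((xs >= f["xmin_shown_from_image"]) & (xs <= f["xmax_shown_from_image"]) &
                 (ys >= f["ymin_shown_from_image"]) & (ys <= f["ymax_shown_from_image"]))
        heatmap += g * shown
    return heatmap
```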
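For point 2, the sketch below shows the general idea of a velocity-threshold (I-VT) fixation detector. It is only an illustration: the dataset's fixations come from the proprietary EyeLink parser, which also applies motion and acceleration thresholds and a minimum fixation duration, and the column names `timestamp_sample`, `x_deg`, `y_deg` are placeholders rather than the dataset's actual columns.

```python
import numpy as np
import pandas as pd

def simple_ivt_fixations(gaze: pd.DataFrame, velocity_thresh_deg_s=35.0):
    """Toy velocity-threshold fixation detector (not the EyeLink parser).

    Assumes `gaze` has columns `timestamp_sample` (seconds) and
    `x_deg`, `y_deg` (gaze position already converted to degrees of
    visual angle). Samples whose instantaneous speed stays below the
    threshold are grouped into fixations.
    """
    t = gaze["timestamp_sample"].to_numpy()
    x = gaze["x_deg"].to_numpy()
    y = gaze["y_deg"].to_numpy()
    speed = np.hypot(np.diff(x), np.diff(y)) / np.diff(t)  # deg/s between samples
    is_fix = np.concatenate([[True], speed < velocity_thresh_deg_s])

    fixations, start = [], None
    for i, f in enumerate(is_fix):
        if f and start is None:
            start = i
        elif not f and start is not None:
            # Close the current fixation and record its centroid.
            fixations.append({"t_start": t[start], "t_end": t[i - 1],
                              "x": x[start:i].mean(), "y": y[start:i].mean()})
            start = None
    if start is not None:
        fixations.append({"t_start": t[start], "t_end": t[-1],
                          "x": x[start:].mean(), "y": y[start:].mean()})
    return pd.DataFrame(fixations)
```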

Let me know if you have any other questions. This time, I should be able to answer on the same day you ask.

ukaukaaaa commented 1 year ago

Thanks for the reply.

I have started doing some learning on this dataset, but I found that I cannot get good detection performance. I was wondering if you have tried training models on this dataset and obtained some baseline results. If you have, would you share them?

ricbl commented 1 year ago

I used the gaze data to improve localization of a classification network. That paper can be found here: https://arxiv.org/abs/2207.09771