openvinotoolkit / open_model_zoo

Pre-trained Deep Learning models and demos (high quality and extremely fast)
https://docs.openvino.ai/latest/model_zoo.html
Apache License 2.0
4.1k stars 1.37k forks

[Feature Request]: Gaze Estimation Demos questions #3856

Open adamswsk opened 1 year ago

adamswsk commented 1 year ago

Request Description

"Hello, I'm trying out the example on the official OpenVINO website: Open Model Zoo Demos -> Gaze Estimation Demos. I have a few questions and would like to ask your development team for assistance:

1. Could you describe the training dataset and the values of the feature points that determine the gaze for the 'gaze_estimation_adas_0002' model?
2. I would like to use this model to calculate the position on a computer screen from the generated gaze vector. Is there a ready-made demo for this?
3. When testing the model, I found that the coordinate system of the gaze vector appears to be that of the computer screen. It seems the screen needs to stand vertically on the desk, parallel to the face, and the camera should be at the same height as the eyes. Am I understanding this correctly?"

Feature Use Case

gaze_estimation_adas_0002

Issue submission checklist

Wovchena commented 1 year ago

Hi. Sorry, I can't describe the training dataset.

You can refer to https://github.com/openvinotoolkit/open_model_zoo/blob/78201d4bec2d8ab3ec2a04d8a5b13f507a998d9c/demos/gaze_estimation_demo/cpp/src/utils.cpp#L20 to see how to project predicted values on a screen plane.

mishakin commented 1 year ago

Hi @adamswsk,

I was involved in the development of the model. But since it was more than 5 years ago, and I am not with the OpenVINO team anymore, I only have a vague recollection of what the dataset looked like.

Basically the setup was as follows. The participants were seated at some fixed distance in front of a rectangular board (like a blackboard in school) with a grid of points. The center of the board was at the same height as the eyes of the participants. The camera was attached to the center of the board and directed at the center of each participant's head, so that the optical axis of the camera intersected (roughly) the line between the eyes. The ground truth for the gaze vector was determined by computing the direction of a vector between the participant's eyes and the point the participant was looking at. The head poses of the participants had some variety, but the gaze vector didn't depend on it: the gaze vector for a given point on the board was the same regardless of the head pose.

Hope it helps.