Closed · lyyiangang closed 3 years ago
Great question. So I don't immediately know how to do this but I bet @LCTyrell likely does. He's done a decent amount of work in this direction.
Edit, meant to link this: https://github.com/LCTyrell/Object-looked-at_Estimation
Hi,
The description of the model is here: https://docs.openvinotoolkit.org/latest/omz_models_intel_gaze_estimation_adas_0002_description_gaze_estimation_adas_0002.html
The NNet does not give you the pitch and yaw angles but : " The network outputs 3-D vector corresponding to the direction of a person’s gaze in a Cartesian coordinate system in which z-axis is directed from person’s eyes (mid-point between left and right eyes’ centers) to the camera center, y-axis is vertical, and x-axis is orthogonal to both z,y axes so that (x,y,z) constitute a right-handed coordinate system."
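If you do want pitch and yaw angles, they can be derived from the Cartesian vector. A minimal sketch, assuming the coordinate system quoted above (z toward the camera, y vertical, right-handed); the sign/angle conventions chosen here are one common choice, not something the model documentation prescribes:

```python
import math

def gaze_vector_to_angles(x, y, z):
    """Convert the model's 3-D gaze vector to (pitch, yaw) in degrees.

    Assumes z points from the eyes toward the camera, y is vertical (up),
    and x completes a right-handed system, as in the model description.
    """
    norm = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / norm, y / norm, z / norm
    pitch = math.degrees(math.asin(y))    # elevation above the x-z plane
    yaw = math.degrees(math.atan2(x, z))  # rotation about the vertical axis
    return pitch, yaw
```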
I hope this will help you ;-)
Thanks @LCTyrell !
@LCTyrell Thanks very much for your reply. I want to use this model to estimate what the subject is looking at. E.g. for a driver in a car, I want to know what the driver is looking at: the front windshield, the rear-view mirror, or the right wing mirror. I think if the gaze vector and these gaze targets are mapped to the same coordinate system, I can check the intersection of the gaze ray with these objects to solve this problem. Then the question is: how can we convert the gaze vector to the real-world / camera coordinate system? Is there any paper for your gaze model? Thanks very much.
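The intersection test described above can be sketched as a ray-plane intersection, assuming the gaze ray and the target planes (windshield, mirrors modeled as planar patches) have already been expressed in the same coordinate frame. The function name and parameters here are illustrative, not from any library:

```python
import numpy as np

def ray_hits_plane(origin, direction, plane_point, plane_normal, eps=1e-9):
    """Intersect a gaze ray with a planar target.

    All inputs are assumed to be in the same coordinate frame
    (e.g. the camera frame). Returns the 3-D intersection point,
    or None if the ray is parallel to the plane or the plane lies
    behind the gaze origin.
    """
    origin = np.asarray(origin, dtype=float)
    direction = np.asarray(direction, dtype=float)
    plane_point = np.asarray(plane_point, dtype=float)
    plane_normal = np.asarray(plane_normal, dtype=float)

    denom = direction.dot(plane_normal)
    if abs(denom) < eps:
        return None   # ray parallel to the plane
    t = (plane_point - origin).dot(plane_normal) / denom
    if t < 0:
        return None   # target is behind the gaze origin
    return origin + t * direction
```

After getting the hit point, you would still need to check that it falls inside the target's boundary (e.g. the mirror's rectangle) to decide which object is being looked at.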
OpenVino Gaze Estimate is a custom model from Intel. There is not much more information than in the link I provided above. To make the conversion, you will have to dust off your old (or not so old) mathematics courses ;-). I'm pretty sure there are a lot of resources for that on the web (ex: https://math.stackexchange.com/questions/1352632/change-from-one-cartesian-co-ordinate-system-to-another-by-translation-and-rotat ). To check whether the result is good, 3D visualization (such as the 3D pose estimation) may be necessary.
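The frame change discussed in the link above boils down to a rotation for direction vectors. A minimal sketch, assuming you have already obtained a 3x3 rotation matrix between the two frames yourself (e.g. from a head-pose estimate or camera extrinsics; the gaze model does not provide it):

```python
import numpy as np

def rotate_direction(direction, rotation):
    """Express a direction vector given in one Cartesian frame
    in another frame related to it by the 3x3 matrix `rotation`.

    Directions are free vectors, so only the rotation part of the
    frame change applies; a translation would only affect the ray's
    origin point, not its direction.
    """
    return np.asarray(rotation, dtype=float) @ np.asarray(direction, dtype=float)
```

For a full gaze ray you would rotate the direction as above and separately transform the origin (eye mid-point) with both the rotation and the translation between the frames.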
If you are not in a hurry, I will work on that and put it on Github ... (but it will take time).
Thanks very much for your reply. In fact, I am studying gaze vector estimation for a Driver Monitoring System and have done a lot of work on it by now. I have tried many gaze methods/datasets, e.g. MPII gaze and UnityEyes gaze, and also tried the opengaze code, but they are not very stable in real scenarios: zig-zagging often appears, and the accuracy cannot actually reach 4~5 degrees. Looking forward to more of your open-source code.
@lyyiangang, the openvino model is quoted at a mean absolute error of 6.95°, and it is not very stable. There may be solutions: making annotations on 3D models for training, using the MediaPipe Iris model, ...
You can join us to discuss it on the #eye-saccades channel of the DepthAI Discord: https://discord.gg/HBNhaFc4
Good to know on the accuracy. IIRC, it was trained on only 50 people. So training on more will likely improve the accuracy. The hard part on this is I think getting accurate truth data.
hi, the gaze net outputs pitch and yaw angles. I think they are in the head coordinate system; how can I convert them to the camera coordinate system?