This work is the official implementation of the GAZEL framework, published at PerCom 2021 (GAZEL: Runtime Gaze Tracking for Smartphones).
This work is heavily based on Google's Firebase ML Kit sample (June 2020 version):
https://github.com/googlesamples/mlkit/tree/master/android/vision-quickstart
Inspired by: Eye Tracking for Everyone (CVPR 2016)
Collaborators:
oleeyoung520 Email: 2015147520@yonsei.ac.kr
Yeeun55 Email: joyce9559@naver.com
yeokyeong46 Email: yeokyeong46@gmail.com
This work was developed on the Galaxy Tab S6.
We trained the model with data collected using MLKitGazeDataCollectingButton.
We also provide a way to use our tablet model on smartphones through calibration.
The red dot represents the "Raw Gaze Estimation Output".
The blue dot represents the "Moving Averaged Output" (if calibration is done, the moving average is taken over the calibrated output; otherwise over the raw output).
The green dot represents the "Calibrated Raw Output".
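The moving average behind the blue dot can be sketched as a fixed-size sliding window over recent gaze points. This is a minimal illustration, not the code used in this repository: the class name `GazeMovingAverage` and its methods are hypothetical, and the window capacity plays the role of a queue size such as 20.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical helper (not part of GAZEL): sliding-window mean of recent gaze points. */
final class GazeMovingAverage {
    private final int capacity;
    private final Deque<float[]> window = new ArrayDeque<>();
    private float sumX = 0f;
    private float sumY = 0f;

    GazeMovingAverage(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Adds a gaze point and returns the mean {x, y} over the current window. */
    float[] add(float x, float y) {
        window.addLast(new float[]{x, y});
        sumX += x;
        sumY += y;
        // Drop the oldest point once the window exceeds its capacity.
        if (window.size() > capacity) {
            float[] oldest = window.removeFirst();
            sumX -= oldest[0];
            sumY -= oldest[1];
        }
        int n = window.size();
        return new float[]{sumX / n, sumY / n};
    }
}
```

A larger window gives a steadier dot at the cost of more lag behind the actual gaze.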
Trained with Galaxy Tab S6 data, tested on Galaxy Tab S6
Trained with Galaxy Tab S6 data, tested and calibrated on Galaxy S9+
I mainly changed FaceDetectorProcessor.java, LivePreviewActivity.java, and FaceGraphic.java.
I also deleted most of the source code that is not needed.
I added a guide to loading and using a custom TensorFlow Lite model, which is used for gaze estimation.
The model should be stored in the assets folder. I recommend creating the model with Keras, then converting it to a TFLite model.
You can also check the output in Logcat. The TAG is "MOBED_GazePoint".
GAZEL uses a personalized model. This model depends heavily on my facial appearance (wearing glasses, in a lab environment), so it will not work well for other people (I have already tested this).
For that reason, I decided to exclude the .tflite model and instead provide the training source code and the data collecting application for you to follow.
The Keras model training and TensorFlow Lite conversion code is provided in MLKitGazeDataCollectingButton.
We used 5-point calibration with translation and rescaling.
The 5 points are TopLeft, TopRight, BottomLeft, BottomRight, and Center.
We also tried to provide SVR calibration. However, multi-output SVR is not available on Android, so we use 2 regressors (with Android libsvm), one each for the x and y coordinates. (The calibration experiments in the paper were conducted on a "server" with scikit-learn, not on "smartphones".) This does not work as well as the linear calibration, so we recommend using linear calibration.
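One way to picture translation-and-rescaling calibration from the 5 points is sketched below. The exact formulas used in this repository are not stated here, so everything in this example is an assumption: the class `LinearGazeCalibration` is hypothetical, the translation aligns the estimated center with the true center, and the per-axis scale is the ratio of the true corner spread to the estimated corner spread.

```java
/** Hypothetical sketch (not GAZEL's code): translate to the center, then rescale per axis. */
final class LinearGazeCalibration {
    private final float centerEstX, centerEstY;
    private final float centerTrueX, centerTrueY;
    private final float scaleX, scaleY;

    /**
     * Both arrays hold 5 rows of {x, y} in the order
     * TopLeft, TopRight, BottomLeft, BottomRight, Center.
     * est: gaze estimates recorded while looking at each target; truth: target positions.
     */
    LinearGazeCalibration(float[][] est, float[][] truth) {
        if (est.length != 5 || truth.length != 5) {
            throw new IllegalArgumentException("expected 5 calibration points");
        }
        centerEstX = est[4][0];
        centerEstY = est[4][1];
        centerTrueX = truth[4][0];
        centerTrueY = truth[4][1];
        float estWidth = spread(est, 0, 1, 3, 0, 2);
        float estHeight = spread(est, 1, 2, 3, 0, 1);
        if (estWidth == 0f || estHeight == 0f) {
            throw new IllegalArgumentException("estimated corners have zero spread");
        }
        scaleX = spread(truth, 0, 1, 3, 0, 2) / estWidth;
        scaleY = spread(truth, 1, 2, 3, 0, 1) / estHeight;
    }

    /** Mean of two "far" corners minus mean of two "near" corners along one axis. */
    private static float spread(float[][] p, int axis, int far1, int far2, int near1, int near2) {
        return ((p[far1][axis] + p[far2][axis]) - (p[near1][axis] + p[near2][axis])) / 2f;
    }

    /** Maps a raw estimate to a calibrated point. */
    float[] apply(float x, float y) {
        return new float[]{
            centerTrueX + (x - centerEstX) * scaleX,
            centerTrueY + (y - centerEstY) * scaleY
        };
    }
}
```

With only two parameters per axis, this kind of mapping cannot correct rotation or nonlinear distortion, which is the trade-off against a regressor such as SVR.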
If you want to use a custom TFLite model with our GAZEL framework, first check the configuration options below (in FaceDetectorProcessor.java). We provide the face bitmap, left/right eye grids, and face grid. We used 1-channel bitmaps to improve gaze estimation accuracy, but because other papers use 3-channel RGB images as input, we also provide a 3-channel image mode. You can change the mode with the THREE_CHANNEL flag. We also provide options for testing your model with different combinations of inputs, so try building gaze estimation models with various inputs!
private final boolean USE_EULER = true; // true: use euler x,y,z as input
private final boolean USE_FACE = false; // true: use face x,y,z as input
private final boolean USE_EYEGRID = false; // true: use eye_grid as input
private final boolean USE_FACEGRID = true; // true: use face_grid as input
private final boolean THREE_CHANNEL = false; // false for Black and White image, true for RGB image
private final boolean calibration_mode_SVR = false; // false for translation & rescale. true for SVR
private final boolean CORNER_CALIBRATION = false; // false for translation & rescale with center, true for only 4 corners
The configuration flags above switch between modes; the configuration values below are the specific values used to initialize those modes.
private final double SACCADE_THRESHOLD = 300; // distance for classifying FIXATION and SACCADE
private final int resolution = 64; // for eye and face
private final int grid_size = 50; // for eye_grids
private final int face_grid_size = 25; // for face_grid
private final int FPS = 30; // for calibration count
private final int SKIP_FRAME = 10; // for calibration count
private final int COST = 40; // for SVR
private final int GAMMA = 1; // for SVR
private final int QUEUE_SIZE = 20; // for moving average
private final float EYE_OPEN_PROB = 0.0f; //empirical value
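To illustrate how a distance threshold such as SACCADE_THRESHOLD can separate fixations from saccades, here is a minimal sketch. It assumes the distance is the Euclidean distance between two consecutive gaze points, in the same units as the threshold; the class `GazeEventClassifier` and its enum are hypothetical and not taken from this repository.

```java
/** Hypothetical sketch (not GAZEL's code): threshold-based fixation/saccade labeling. */
final class GazeEventClassifier {
    enum GazeEvent { FIXATION, SACCADE }

    private final double threshold;

    GazeEventClassifier(double threshold) {
        this.threshold = threshold;
    }

    /** Labels the movement from the previous gaze point to the current one. */
    GazeEvent classify(double prevX, double prevY, double x, double y) {
        // Euclidean distance between consecutive gaze points.
        double distance = Math.hypot(x - prevX, y - prevY);
        return distance > threshold ? GazeEvent.SACCADE : GazeEvent.FIXATION;
    }
}
```

For example, with a threshold of 300, a jump from (0, 0) to (300, 400) covers a distance of 500 and is labeled a saccade, while a move to (30, 40) covers 50 and is labeled a fixation.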
If you put your TFLite model in the "GAZEL/GazeTracker/app/src/main/assets/custom_models/eval/" directory, you must change the line below in LivePreviewActivity.java:
InputStream inputStream = getAssets().open("custom_models/eval/[your_model_name].tflite");
Then check the issues below.
TensorFlow Lite conversion: before you load your TFLite model, you must check the input details to make sure the input order is correct.
If you are using the Python interpreter:
import tensorflow as tf
tflite = tf.lite.Interpreter(model_path="path/to/model.tflite")
tflite.get_input_details()
Example output:
[{'name': 'left_eye',
'index': 4,
'shape': array([ 1, 64, 64, 1], dtype=int32),
'dtype': numpy.float32,
'quantization': (0.0, 0)},
{'name': 'right_eye',
'index': 56,
'shape': array([ 1, 64, 64, 1], dtype=int32),
'dtype': numpy.float32,
'quantization': (0.0, 0)},
{'name': 'euler',
'index': 1,
'shape': array([1, 1, 1, 3], dtype=int32),
'dtype': numpy.float32,
'quantization': (0.0, 0)},
{'name': 'facepos',
'index': 3,
'shape': array([1, 1, 1, 2], dtype=int32),
'dtype': numpy.float32,
'quantization': (0.0, 0)},
{'name': 'face_grid',
'index': 2,
'shape': array([ 1, 25, 25, 1], dtype=int32),
'dtype': numpy.float32,
'quantization': (0.0, 0)}]
Then reorder your inputs in FaceDetectorProcessor.java to match that order:
inputs = new float[][][][][]{left_4d, right_4d, euler, facepos, face_grid}; // make sure the order is correct
This work was developed on tablet devices, so if you want to use this framework on smartphones, you need to follow the instructions below.
private final boolean isCustomDevice = true;
//custom device
private final float customDeviceWidthPixel = 1440.0f;
private final float customDeviceWidthCm = 7.0f;
private final float customDeviceHeightPixel = 2960.0f;
private final float customDeviceHeightCm = 13.8f;
private final float customDeviceCameraXPos = 4.8f; // in cm | at Android coordinate system where use top left corner as (0,0)
private final float customDeviceCameraYPos = -0.3f; // in cm | at Android coordinate system where use top left corner as (0,0)
//original device
private final float originalDeviceWidthPixel = 1600.0f;
private final float originalDeviceWidthCm = 14.2f;
private final float originalDeviceHeightPixel = 2560.0f;
private final float originalDeviceHeightCm = 22.5f;
private final float originalDeviceCameraXPos = 7.1f; // in cm | at Android coordinate system where use top left corner as (0,0)
private final float originalDeviceCameraYPos = -0.5f; // in cm | at Android coordinate system where use top left corner as (0,0)
Set the isCustomDevice flag to true, then change all of the customDevice[option] values to match your device.
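The sketch below shows one way device values like these can be used: converting a gaze point expressed in centimeters relative to the front camera into screen pixels. It assumes the gaze point uses the same axis directions as the Android coordinate system (x to the right, y downward) and that the camera position is given relative to the top-left corner of the screen. The class `DeviceGeometry` is hypothetical, and this is not necessarily how the framework performs the conversion.

```java
/** Hypothetical sketch (not GAZEL's code): camera-relative centimeters to screen pixels. */
final class DeviceGeometry {
    private final float widthPixel, widthCm, heightPixel, heightCm;
    private final float cameraXCm, cameraYCm;

    DeviceGeometry(float widthPixel, float widthCm, float heightPixel, float heightCm,
                   float cameraXCm, float cameraYCm) {
        this.widthPixel = widthPixel;
        this.widthCm = widthCm;
        this.heightPixel = heightPixel;
        this.heightCm = heightCm;
        this.cameraXCm = cameraXCm;
        this.cameraYCm = cameraYCm;
    }

    /** Converts a gaze point in cm relative to the camera into screen pixels {x, y}. */
    float[] cameraCmToPixel(float gazeXCm, float gazeYCm) {
        // Shift the origin from the camera to the top-left corner of the screen.
        float screenXCm = cameraXCm + gazeXCm;
        float screenYCm = cameraYCm + gazeYCm;
        // Scale centimeters to pixels using the physical screen size.
        return new float[]{
            screenXCm * widthPixel / widthCm,
            screenYCm * heightPixel / heightCm
        };
    }
}
```

With the custom device values listed above (1440 x 2960 pixels, 7.0 x 13.8 cm, camera at (4.8, -0.3) cm), a gaze point at (-4.8, 0.3) cm relative to the camera maps to the top-left pixel (0, 0).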
Collect as much data as you can before training your model. We recommend varying head positions and lighting conditions.
A well-known approach is to train the feature extraction layers on a massive image dataset and then train only the fully connected layers for the target environment. As a mobile system developer, however, I wanted to try using the sensors embedded in mobile devices to improve gaze estimation accuracy. That is why I built a new application that uses collected sensor outputs as well as front camera frames (and why I did not use a massive image dataset for training). I also believe our work is the first open source smartphone gaze tracking framework. I hope this small proof-of-concept framework helps your research. Thank you.