The following procedure describes the training process for the RegNet-based models of iCatcher+. This should probably be written in a more official location and is kept here for documentation purposes only.
RegNet-Based Gaze Classifier Training & Evaluation Procedure
This document provides a step-by-step description of the training and evaluation procedure for a gaze classifier with a RegNet backbone using the Lookit dataset.
Step 1: Data Preprocessing
The model was trained on the Lookit dataset. We downloaded the available Lookit dataset with manual annotations (265 videos: 124 public and 141 scientific). We then used reproduce/preprocess.py to extract infant faces and corresponding annotations from the videos using the OpenCV-DNN face classifier. Using the publicly available data splits (column which.dataset), we split the dataset into training, validation, and testing subsets; the same pre-processing procedure was applied to all subsets.
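For illustration, the splits can be read off the annotation metadata with pandas. The file name below is a hypothetical placeholder, but which.dataset is the column shipped with the Lookit annotations:

```python
import pandas as pd

# "lookit_annotations.csv" is a placeholder for the metadata file that
# accompanies the Lookit release; the "which.dataset" column holds the
# published train/validation/test assignment for each video.
meta = pd.read_csv("lookit_annotations.csv")

# Group videos by their published split assignment.
for split, group in meta.groupby("which.dataset"):
    print(f"{split}: {len(group)} videos")
```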
Step 2: Fine-Tuning Pre-Trained RegNet Model
We propose replacing the default ResNet-18 backbone of the GazeCodingModel with a RegNetY-16GF visual backbone pre-trained on ImageNet (torchvision's IMAGENET1K_V2 weights) and fine-tuning it on the Lookit training subset. We used the regnet_y_16gf Torchvision implementation and weights.
Accordingly, we modified the GazeCodingModel architecture to use the pre-trained regnet_y_16gf model, replacing its final layer with a linear classification layer.
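As a minimal sketch (not the project's exact code), the backbone swap looks like this in torchvision; the three-class output (away/left/right) is an assumption here and should match the number of gaze classes used by GazeCodingModel:

```python
import torch.nn as nn
from torchvision.models import regnet_y_16gf, RegNet_Y_16GF_Weights

# Load RegNetY-16GF with torchvision's ImageNet pre-trained weights.
model = regnet_y_16gf(weights=RegNet_Y_16GF_Weights.IMAGENET1K_V2)

# Replace the final classification layer with a linear head sized for
# the gaze classes (3 = away/left/right is an assumption).
model.fc = nn.Linear(model.fc.in_features, 3)
```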
We train the classifier using a batch size of 64, with all other parameters set to their defaults, on a single V100 GPU. At the end of the training procedure, we identify the checkpoint corresponding to the epoch with the highest accuracy on the validation subset and use it for inference.
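A minimal sketch of such a loop is shown below, assuming train_dataset and val_dataset are torch Datasets of preprocessed face crops with gaze labels; the optimizer, epoch count, and checkpoint path are illustrative assumptions, not the project's exact settings:

```python
import torch
from torch.utils.data import DataLoader
from torchvision.models import regnet_y_16gf, RegNet_Y_16GF_Weights

device = torch.device("cuda")  # single GPU (a V100 in our runs)

# Backbone with replaced head, as in the previous sketch.
model = regnet_y_16gf(weights=RegNet_Y_16GF_Weights.IMAGENET1K_V2)
model.fc = torch.nn.Linear(model.fc.in_features, 3)  # class count assumed
model = model.to(device)

optimizer = torch.optim.Adam(model.parameters())  # illustrative choice
criterion = torch.nn.CrossEntropyLoss()
num_epochs = 100  # illustrative; not the project's documented value

# train_dataset / val_dataset: assumed Datasets of face crops + labels.
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=64)

best_acc = 0.0
for epoch in range(num_epochs):
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

    # Evaluate accuracy on the validation subset.
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in val_loader:
            images, labels = images.to(device), labels.to(device)
            preds = model(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.numel()
    acc = correct / total

    # Keep the checkpoint with the best validation accuracy.
    if acc > best_acc:
        best_acc = acc
        torch.save(model.state_dict(), "best_checkpoint.pt")
```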