icatcherplus / icatcher_plus

iCatcher+: Robust and automated annotation of infant gaze from videos collected in laboratory, field, and online studies
GNU General Public License v3.0

How was the RegNet-based gaze classifier trained? #64

Closed yoterel closed 1 year ago

yoterel commented 1 year ago

The following procedure describes the training process for the RegNet-based models of iCatcher+. This should probably be written up in a more official location; it is kept here for documentation purposes only.

RegNet-Based Gaze Classifier Training & Evaluation Procedure

This document provides a step-by-step description of the training and evaluation procedure of a gaze classifier with a RegNet backbone using the Lookit dataset.

Step 1: Data Preprocessing

The model was trained on the Lookit dataset after pre-processing with the OpenCV-DNN face classifier. We downloaded the available Lookit dataset with manual annotations (265 videos: 124 public and 141 scientific). We then used reproduce/preprocess.py to process the dataset into infant face crops and corresponding annotations using the OpenCV-DNN face classifier. Using the publicly available data splits (column which.dataset), we split the dataset into training, validation, and testing subsets. The same pre-processing procedure was applied to all subsets.
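For illustration only, a split driven by that column might look roughly like the sketch below; the metadata file name, the video-ID column, and the split labels are placeholders, not the exact values produced or consumed by reproduce/preprocess.py.

import pandas as pd

# Hypothetical sketch: "lookit_metadata.csv", "videoID", and the split labels
# are placeholders for illustration, not the exact preprocess.py conventions.
metadata = pd.read_csv("lookit_metadata.csv")
splits = {
    name: metadata.loc[metadata["which.dataset"] == name, "videoID"].tolist()
    for name in ("train", "validation", "test")
}
print({name: len(ids) for name, ids in splits.items()})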

Step 2: Fine-Tuning Pre-Trained RegNet Model

We propose replacing the default ResNet-18 visual backbone of the GazeCodingModel with a RegNetY-16GF backbone pre-trained on ImageNet-1K (the IMAGENET1K_V2 weights) and fine-tuning it on the Lookit training subset. We used the Torchvision regnet_y_16gf implementation and weights. Concretely, we modified the GazeCodingModel architecture to use the pre-trained regnet_y_16gf model, replacing its last layer with a linear layer:

import torch
import torchvision

encoder_init = torchvision.models.regnet_y_16gf
encoder_weights = torchvision.models.RegNet_Y_16GF_Weights.IMAGENET1K_V2

encoder_img = encoder_init(weights=encoder_weights).to(self.args.device)
encoder_img_modules = list(encoder_img.children())  # stem, trunk_output, avgpool, fc
self.encoder_img = torch.nn.Sequential(
        *encoder_img_modules[:-1],       # drop the original classification head (fc)
        torch.nn.Flatten(1),             # flatten pooled (N, 3024, 1, 1) features to (N, 3024)
        torch.nn.Linear(3024, 256),      # project to a 256-d embedding
        torch.nn.Dropout(0.2),
)
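
As a quick sanity check (not part of the training code), the 3024-dimensional feature width assumed by the new linear layer can be read directly off the Torchvision model:

import torchvision

# Confirm the feature width of the RegNetY-16GF classification head,
# which the 3024 -> 256 linear layer above assumes.
model = torchvision.models.regnet_y_16gf()
print(model.fc.in_features)  # 3024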

We trained the classifier using a batch size of 64, with all other parameters set to their defaults, on a single V100 GPU:

python3 train.py \
    regnet_train \
    datasets/processed/lookit_train \
    --batch_size 64 \
    --architecture icatcher+ \
    --gpu_id 0 \
    --log

At the end of the training procedure, we identified the checkpoint corresponding to the epoch with the highest accuracy on the validation subset and used it for inference.
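
For illustration, checkpoint selection might look roughly like the sketch below; the checkpoint directory layout ("checkpoints/regnet_train") and the "val_accuracy" key are assumptions and may differ from what train.py actually writes.

import torch
from pathlib import Path

# Hypothetical sketch: pick the checkpoint with the highest validation accuracy.
# Directory layout and the "val_accuracy" key are assumptions for illustration.
best_path, best_acc = None, float("-inf")
for ckpt_path in sorted(Path("checkpoints/regnet_train").glob("*.pt")):
    ckpt = torch.load(ckpt_path, map_location="cpu")
    acc = ckpt.get("val_accuracy", float("-inf"))
    if acc > best_acc:
        best_path, best_acc = ckpt_path, acc
print(f"best checkpoint: {best_path} (val accuracy {best_acc:.4f})")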