icatcherplus / icatcher_plus

iCatcher+: Robust and automated annotation of infant gaze from videos collected in laboratory, field, and online studies

How was the RegNet-based gaze classifier trained? #64

Closed yoterel closed 1 year ago

yoterel commented 1 year ago

The following procedure describes the training process for the RegNet-based models of iCatcher+. This should probably be written in a more official location and is kept here for documentation purposes only.

RegNet-Based Gaze Classifier Training & Evaluation Procedure

This document provides a step-by-step description of the training and evaluation procedure for a gaze classifier with a RegNet backbone using the Lookit dataset.

Step 1: Data Preprocessing

The model was trained on the Lookit dataset after pre-processing with the OpenCV-DNN face classifier. We downloaded the available Lookit dataset with manual annotations (265 videos: 124 public and 141 scientific). We then used reproduce/preprocess.py to process the dataset into infant face crops and corresponding annotations using the OpenCV-DNN face classifier. Using the publicly available data splits (the which.dataset column), we split the dataset into training, validation, and testing subsets; the same pre-processing procedure was applied to all subsets.
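A minimal sketch of the split step, assuming the annotations live in a single CSV; the file name, video-id column, and split labels below are illustrative assumptions, while which.dataset is the column referenced above:

import pandas as pd

metadata = pd.read_csv("lookit_annotations.csv")            # hypothetical file name
splits = {}
for name in ("train", "val", "test"):                        # hypothetical split labels
    mask = metadata["which.dataset"] == name                 # split column mentioned above
    splits[name] = metadata.loc[mask, "video_id"].tolist()   # hypothetical video-id column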

Step 2: Fine-Tuning Pre-Trained RegNet Model

We replaced the default ResNet-18 backbone of the GazeCodingModel with a RegNetY-16GF visual backbone pre-trained on ImageNet (the IMAGENET1K_V2 weights) and fine-tuned it on the Lookit training subset. We used the regnet_y_16gf Torchvision implementation and weights. Accordingly, we modified the GazeCodingModel architecture to use the pre-trained regnet_y_16gf model, replacing its final layer with a linear layer:

import torch
import torchvision

# Load a RegNetY-16GF backbone with ImageNet pre-trained weights.
encoder_init = torchvision.models.regnet_y_16gf
encoder_weights = torchvision.models.RegNet_Y_16GF_Weights.IMAGENET1K_V2
encoder_img = encoder_init(weights=encoder_weights).to(self.args.device)

# Drop the original classification layer and attach a 256-dimensional linear head.
encoder_img_modules = list(encoder_img.children())
self.encoder_img = torch.nn.Sequential(
        *encoder_img_modules[:-1],   # stem, trunk, and average pooling
        torch.nn.Flatten(1),         # flatten pooled features before the linear layer
        torch.nn.Linear(3024, 256),  # RegNetY-16GF produces 3024-dimensional features
        torch.nn.Dropout(0.2),
)
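As a stand-alone sanity check (a sketch, not code from the repository), the modified backbone can be built outside the model class and run on a dummy batch to confirm it yields 256-dimensional embeddings; the 100x100 input resolution is an assumption:

import torch
import torchvision

# Build the same encoder outside the model class; pre-trained weights omitted for brevity.
backbone = torchvision.models.regnet_y_16gf(weights=None)
encoder = torch.nn.Sequential(
    *list(backbone.children())[:-1],
    torch.nn.Flatten(1),
    torch.nn.Linear(3024, 256),
    torch.nn.Dropout(0.2),
)
with torch.no_grad():
    out = encoder(torch.randn(2, 3, 100, 100))  # crop size is an assumption
print(out.shape)  # torch.Size([2, 256])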

We trained the classifier with a batch size of 64, with all other parameters set to their defaults, on a single V100 GPU:

python3 train.py \
    regnet_train \
    datasets/processed/lookit_train \
    --batch_size 64 \
    --architecture icatcher+ \
    --gpu_id 0 \
    --log

At the end of the training procedure, we identified the checkpoint from the epoch with the highest accuracy on the validation subset and used it for inference.
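A minimal sketch of that selection step is shown below; the checkpoint directory layout and the val_accuracy key are assumptions for illustration, not the repository's actual logging format:

from pathlib import Path
import torch

# Pick the checkpoint with the highest validation accuracy (illustrative layout).
best_path, best_acc = None, -1.0
for ckpt_path in Path("runs/regnet_train").glob("epoch_*.pt"):  # hypothetical directory
    ckpt = torch.load(ckpt_path, map_location="cpu")
    if ckpt["val_accuracy"] > best_acc:                          # hypothetical metric key
        best_acc, best_path = ckpt["val_accuracy"], ckpt_path
print(f"Best checkpoint: {best_path} (val accuracy {best_acc:.3f})")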