Eye-movement event detection using random forest. Cite as:
@article{zemblys2018irf,
title={Using machine learning to detect events in eye-tracking data},
author={Zemblys, Raimondas and Niehorster, Diederick C and Komogortsev, Oleg and Holmqvist, Kenneth},
journal={Behavior Research Methods},
volume={50},
number={1},
pages={160--181},
year={2018},
}
Read also ./doc/IRF_replication_report.pdf for more information on the dataset and the changes made in the post-processing routine.
IRF was developed using the Python 2.7 programming language and a number of packages for data manipulation and for training machine learning algorithms. This section describes how to prepare the required software, how to use the IRF algorithm, and how to train your own classifier.
An easy way to prepare your Python environment is to use Anaconda, an open-source package and environment management system that runs on Windows, macOS and Linux. To install Anaconda, follow the instructions at https://www.anaconda.com/download/, then open your terminal and type:
conda create --name irf python=2.7
source activate irf
The next step is to install all required python libraries. Run the following commands in your terminal window:
pip install tqdm
pip install parse
pip install numpy
pip install scipy
pip install pandas
pip install matplotlib
pip install astropy
pip install scikit-learn
Note that if you want to use a pretrained classifier that comes with this code, install this exact version of scikit-learn instead:
pip install scikit-learn==0.17.1
To check if your environment is prepared correctly, run:
python run_irf.py --help
You should see the following output:
usage: run_irf.py [-h] [--ext EXT] [--output_dir OUTPUT_DIR]
                  [--workers WORKERS] [--save_csv]
                  clf root dataset

Eye-movement event detection using Random Forest.

positional arguments:
  clf                   Classifier
  root                  The path containing eye-movement data
  dataset               The directory containing experiment data

optional arguments:
  -h, --help            show this help message and exit
  --ext EXT             File type
  --output_dir OUTPUT_DIR
                        The directory to save output
  --workers WORKERS     Number of workers to use
  --save_csv            Save output as csv file
This package includes a hand-labeled eye-movement dataset called lookAtPoint_EL (see ./etdata/). To parse these data using IRF, download a pretrained model, unzip it, place it in the ./models/ directory and run:
python run_irf.py irf_2018-03-26_20-46-41 etdata lookAtPoint_EL
You can also use a custom --output_dir parameter if you like; otherwise the output folder will be set to ./etdata/lookAtPoint_EL_irf.
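For example, to collect the results in a custom folder (the folder name below is arbitrary):
python run_irf.py irf_2018-03-26_20-46-41 etdata lookAtPoint_EL --output_dir etdata/my_irf_output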
!!! Note that the pretrained models were trained using this exact dataset. Also note that only approximate saccade and blink events were manually coded for trial lookAtPoint_EL_S3, and that trial was not used for training or testing the classifier.
After running the above command you will get messages like:
etdata/lookAtPoint_EL_irf/i2mc/lookAtPoint_EL_S1_i2mc.mat does not exist. Run i2mc extractor first!
etdata/lookAtPoint_EL_irf/i2mc/lookAtPoint_EL_S2_i2mc.mat does not exist. Run i2mc extractor first!
...
One of the features (i2mc) requires third-party software. Running IRF for the first time converts the data into the format required by the I2MC algorithm. Open ./util_lib/I2MC-Dev/I2MC_rz.m in MATLAB, edit folders.data to point to your output directory and run the code. It will extract and save the i2mc features. Note that the I2MC code uses random initializations to calculate data clusters, so the i2mc feature will be slightly different each time you recalculate it. Therefore, if you care about reproducing your classification, reuse the already extracted i2mc data.
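If you are unsure which trials still lack extracted i2mc features, a small check like the one below lists them. It assumes one .npy file per trial in the dataset directory (as for the included dataset) and the <output_dir>/i2mc/<trial>_i2mc.mat layout shown in the messages above.

import glob
import os

dataset_dir = 'etdata/lookAtPoint_EL'     # source dataset (one file per trial)
output_dir = 'etdata/lookAtPoint_EL_irf'  # IRF output directory

for fpath in sorted(glob.glob(os.path.join(dataset_dir, '*.npy'))):
    trial = os.path.splitext(os.path.basename(fpath))[0]
    i2mc_mat = os.path.join(output_dir, 'i2mc', '%s_i2mc.mat' % trial)
    status = 'OK' if os.path.exists(i2mc_mat) else 'missing - run I2MC_rz.m first'
    print('%s: %s' % (trial, status))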
Now run python run_irf.py irf_2018-03-26_20-46-41 etdata lookAtPoint_EL again. IRF will parse your data and save it as structured numpy arrays. It also has an option to save the output in tab-delimited text format: just add the --save_csv parameter when running IRF.
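To inspect the results in Python, you can load the saved structured arrays directly. A minimal sketch (the output file name is assumed to mirror the input trial name; adjust it to whatever run_irf.py wrote into your output directory):

import numpy as np

# Hypothetical output file; check your --output_dir for the actual names.
data = np.load('etdata/lookAtPoint_EL_irf/lookAtPoint_EL_S1.npy')

print(data.dtype.names)   # ('t', 'x', 'y', 'status', 'evt')
print(data['evt'][:20])   # predicted event labels of the first 20 samples

# Count samples per event label (0: undefined, 1: fixation, ..., 5: blink)
labels, counts = np.unique(data['evt'], return_counts=True)
for label, count in zip(labels, counts):
    print(label, count)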
The internal data format used by IRF is a structured numpy array with the following layout:
dtype = np.dtype([
    ('t', np.float64),      #time in seconds
    ('x', np.float32),      #horizontal gaze direction in degrees
    ('y', np.float32),      #vertical gaze direction in degrees
    ('status', np.bool),    #status flag. False means trackloss
    ('evt', np.uint8)       #event label:
                            #0: Undefined
                            #1: Fixation
                            #2: Saccade
                            #3: Post-saccadic oscillation
                            #4: Smooth pursuit
                            #5: Blink
])
That means one first needs to convert the dataset to this format. Note that the dataset folder needs to contain a db_config.json file that describes the geometry of the setup: physical screen dimensions in mm, eye distance in mm and screen resolution in pixels, for example:
"geom": {
"screen_width": 533.0,
"screen_height": 301.0,
"eye_distance": 565.0,
"display_width_pix": 1920.0,
"display_height_pix": 1080.0
}
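If your raw gaze coordinates are in pixels, the same geometry gives you everything needed to express them in degrees. A rough sketch of such a conversion (per-axis angle measured from the screen centre; an illustration only, not IRF's internal routine):

import numpy as np

# Geometry from db_config.json: screen size and eye distance in mm,
# display resolution in pixels.
screen_width, screen_height = 533.0, 301.0
eye_distance = 565.0
display_width_pix, display_height_pix = 1920.0, 1080.0

def pix2deg(x_pix, y_pix):
    """Map pixel coordinates to degrees of visual angle relative to the
    screen centre, assuming the eye sits in front of the centre."""
    x_mm = (np.asarray(x_pix) - display_width_pix / 2.0) * screen_width / display_width_pix
    y_mm = (np.asarray(y_pix) - display_height_pix / 2.0) * screen_height / display_height_pix
    x_deg = np.degrees(np.arctan2(x_mm, eye_distance))
    y_deg = np.degrees(np.arctan2(y_mm, eye_distance))
    return x_deg, y_deg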
The geometry also needs to be defined in ./util_lib/I2MC-Dev/I2MC_rz.m. Note that the dimensions there are in cm! After preparing your data, run the IRF code in the same way as described above.
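For reference, a minimal sketch of what preparing a single trial might look like, assuming (as for the included dataset) one .npy file per trial; the values below are placeholders and the gaze coordinates must already be in degrees:

import numpy as np

dtype = np.dtype([
    ('t', np.float64),
    ('x', np.float32),
    ('y', np.float32),
    ('status', np.bool),
    ('evt', np.uint8),
])

# Placeholder recording: replace with your own data.
t = np.arange(0, 10, 0.002)             # 10 s sampled at 500 Hz
x = np.zeros(len(t), dtype=np.float32)  # horizontal gaze, degrees
y = np.zeros(len(t), dtype=np.float32)  # vertical gaze, degrees

data = np.zeros(len(t), dtype=dtype)
data['t'], data['x'], data['y'] = t, x, y
data['status'] = True                   # True = valid sample
data['evt'] = 0                         # 0 = undefined (no labels yet)

np.save('etdata/my_dataset/my_trial.npy', data)  # hypothetical path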
To train your own classifier, place your training data in the dataset/train directory and your validation data in the dataset/val directory. Note that the dataset directory needs to contain a db_config.json file that describes the geometry of the setup. Training and validation data need to be in the structured numpy array format described above.
You can use the ./utils_lib/data_prep/augment.py script to prepare the lookAtPoint_EL dataset for training the IRF. Just run the script: it will augment the data by resampling it to various sampling rates and adding noise, and it will split the data into training/validation and testing sets. Remember to copy db_config.json to lookAtPoint_EL/training/.
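If you prefer to prepare the split yourself rather than using augment.py, a minimal sketch of the expected layout (trial-level split; all paths below are placeholders):

import os
import shutil

import numpy as np

src_dir = 'etdata/my_dataset'                    # one .npy file per trial
dataset_dir = os.path.join(src_dir, 'training')  # passed to run_training.py

trials = sorted(f for f in os.listdir(src_dir) if f.endswith('.npy'))

np.random.seed(0)                       # fixed seed: reproducible split
np.random.shuffle(trials)
n_val = max(1, int(0.2 * len(trials)))  # hold out ~20% of trials for validation

for subset, files in (('val', trials[:n_val]), ('train', trials[n_val:])):
    out_dir = os.path.join(dataset_dir, subset)
    if not os.path.exists(out_dir):
        os.makedirs(out_dir)
    for f in files:
        shutil.copy(os.path.join(src_dir, f), out_dir)

# Do not forget to also copy db_config.json into dataset_dir.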
Note that augment.py was developed using an older version of numpy; therefore you might need to replace your numpy installation with version 1.11 by running:
pip install numpy==1.11
In config.json you can adjust the training parameters:
{
    "events": [1, 2, 3],    #event labels to use; only fixation (1), saccade (2) and pso (3) are tested
    "n_trees": 32,          #number of trees to use
    "extr_kwargs": {        #feature extraction parameters
        "w": 100,           #context size for calculating features; in ms
        "w_vel": 12,
        "w_dir": 22,
        "interp": false,    #not used
        "print_et": false   #not used
    },
    "features": [           #features to use
        "fs",
        "disp",
        "vel",
        "acc",
        "mean-diff",
        "med-diff",
        "rms",
        "std",
        "bcea",
        "rms-diff",
        "std-diff",
        "bcea-diff",
        "rayleightest",
        "i2mc"
    ]
}
Now run:
python run_training.py etdata/lookAtPoint_EL training
This will perform feature extraction, train the IRF classifier and save it to the ./models/irf_datetime directory. Note that the training script will stop if the i2mc feature is used, in which case you will need to run ./util_lib/I2MC-Dev/I2MC_rz.m before actually training the classifier. After the i2mc features are extracted, run the training script one more time.
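Once training finishes, use the new model with run_irf.py exactly as before, passing the name of the newly created model directory (irf_datetime below stands for the actual time-stamped name):
python run_irf.py irf_datetime etdata lookAtPoint_EL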