
UBC EECE571L 2022WT2 Driver Emotion Detection

Quick Start for DMD Dataset

After receiving the preprocessed DMD dataset, you should see the following directory structure:

Preprocessed_DMD
|-- annotations
    |-- DMD
        |-- driver_imgs_list.csv
        |-- train_list.csv
        |-- val_list.csv
|-- datasets
    |-- DMD
        |-- imgs
        |-- driver_imgs_list.csv
|-- pseudo_emo_label
    |-- DMD
        |-- imgs
        |-- emo_list.csv
    |-- imgs
    |-- emo_list.csv

Please preserve the directory structure and copy the contents directly into the corresponding directories of your local clone of this repo (e.g., copy everything in ./Preprocessed_DMD/annotations/ on the disk to ./annotations/ in your local repository cloned from this repo).
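If you prefer to script this step, the sketch below copies the three folders with Python. The source and destination paths are assumptions; adjust them to where you unpacked Preprocessed_DMD and where you cloned this repo.

# Minimal sketch (adjust the two paths below to your own locations).
import shutil
from pathlib import Path

src_root = Path("/path/to/Preprocessed_DMD")   # assumed location of the unpacked data
repo_root = Path("/path/to/local/clone")       # assumed location of your clone of this repo

for folder in ("annotations", "datasets", "pseudo_emo_label"):
    # dirs_exist_ok merges into directories that already exist in the repo
    shutil.copytree(src_root / folder, repo_root / folder, dirs_exist_ok=True)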

Then, go to the following links to download the pretrained checkpoints and place them into ./runs/vitdd/pretrained/.

Experiments             Accuracy  NLL     Checkpoints
AUCDD                   0.9359    0.2399  link
SFDDD split-by-driver   0.9251    0.3900  link
SFDDD split-by-image    0.9963    0.0171  link

Then, open ./configs/vitdd_dmd.yaml in your local clone of this repo and adjust lines 61-63 to select the appropriate model and checkpoint path.

To test the downloaded checkpoints, run the following command in the root of this repo.

python train.py test -c ./configs/vitdd_dmd.yaml

For how to train or retrain the checkpoints and test them, please read the subsequent sections carefully. We also suggest playing with this repo on the SFDDD dataset first.

Overview

This is the repository for the team project of EECE571L. The team includes Zhe Li, Christina Sun, Charles Guan, and Jonny Wang. This project is based on the ViT-DD project as cited here:

@article{Ma2022MultiTaskVT,
  title={Multi-Task Vision Transformer for Semi-Supervised Driver Distraction Detection},
  author={Yunsheng Ma and Ziran Wang},
  journal={arXiv},
  year={2022}
}

This project adds new features and aims to improve the performance of the model. This README is intended to inform users of the key points about how to use the code in this repository.

Prerequisites

The code in this repository requires the following packages:

You are advised to run the code in a virtual environment. The YAML file named "ViT-DD_Env" may be used to speed up constructing the virtual environment.
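For example, assuming you use conda and the environment file is named ViT-DD_Env.yaml (with the environment name set to ViT-DD_Env inside it), the environment can be created and activated with:

conda env create -f ViT-DD_Env.yaml
conda activate ViT-DD_Env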

Dataset

The original datasets used by ViT-DD were AUCDD and SFDDD. At this point (2023-03-20), the code is not adapted for AUCDD, and there may be errors when running this code on AUCDD. However, Zhe Li has debugged the original ViT-DD code and ensured that it is ready for SFDDD. We have also ensured that the model runs on the new DMD dataset.

For SFDDD, please put the dataset in the ./datasets folder with the following directory structure, unless you want to dive into the code yourself.

datasets
|-- SFDDD 
    |-- imgs
        |-- train
          |-- c0
          |-- ...
          |-- c9
            |-- img_19.jpg
            |-- ...

Also, the original ViT-DD used pseudo labels for emotion. The pseudo-label data should be placed in ./pseudo_emo_label with the following directory structure.

pseudo_label_path
|-- AUCDD
  |-- emo_list.csv
  |-- imgs
      |-- c0
      |-- ...
      |-- c9
          |-- 0_face.jpg
          |-- ...
|-- SFDDD
  |-- emo_list.csv
  |-- imgs
      |-- img_5_face.jpg
      |-- ...
|-- emo_list.csv
|-- imgs
    |-- img_5_face.jpg
    |-- ...

./pseudo_label_path/emo_list.csv and ./pseudo_label_path/imgs are both copies of the same files and folders in ./pseudo_label_path/SFDDD. This is a temporary measure, and we expect to improve on this or provide detailed instructions for how to change datasets.
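Until then, the duplicate copies can be created with something like the sketch below; the pseudo_label_path location is an assumption and should match the folder you actually use.

# Minimal sketch: mirror the SFDDD pseudo-labels to the top level of the pseudo-label folder.
import shutil
from pathlib import Path

pseudo_label_path = Path("pseudo_label_path")  # assumed root of the pseudo-label folder

shutil.copy2(pseudo_label_path / "SFDDD" / "emo_list.csv", pseudo_label_path / "emo_list.csv")
shutil.copytree(pseudo_label_path / "SFDDD" / "imgs", pseudo_label_path / "imgs", dirs_exist_ok=True)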

We preprocessed the DMD dataset by taking one frame per second from the videos in DMD, and structured the dataset using a similar pattern.
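As a rough illustration of this preprocessing step, the sketch below samples one frame per second from a video with OpenCV; the file names and output layout are assumptions and do not necessarily match the exact script we used.

# Minimal sketch (not the actual preprocessing script): sample one frame per
# second from a video and save the frames as JPEGs.
import cv2
from pathlib import Path

video_path = "dmd_clip.mp4"          # assumed input video
out_dir = Path("datasets/DMD/imgs")  # assumed output folder
out_dir.mkdir(parents=True, exist_ok=True)

cap = cv2.VideoCapture(video_path)
fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back to 30 if FPS is unavailable

frame_idx, saved = 0, 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if frame_idx % int(round(fps)) == 0:  # keep roughly one frame per second
        cv2.imwrite(str(out_dir / f"frame_{saved:05d}.jpg"), frame)
        saved += 1
    frame_idx += 1
cap.release()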

Configurations

NOTE: Logically, this section should come before the next one. In practice, you may need to skim this section first, read the next section, and then come back here.

The code uses PyTorch Lightning for the command line interface (CLI). To fit or test our deep learning model, we need a YAML file to provide configuration details to the Lightning CLI. The configuration files can be found in ./configs. vitdd_aucdd.yaml is for running the code on the AUCDD dataset, vitdd_sfddd_sbd.yaml is for SFDDD split-by-driver, and vitdd_sfddd_sbi.yaml is for SFDDD split-by-image. Please use the correct configuration file for your specific purpose.
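For context, a Lightning CLI entry point typically looks like the sketch below; the class names are placeholders, and the actual train.py in this repo may be organized differently.

# Minimal sketch of a PyTorch Lightning CLI entry point (placeholder class names).
from pytorch_lightning.cli import LightningCLI

from model import ViTDDLitModule   # hypothetical LightningModule
from data import DriverDataModule  # hypothetical LightningDataModule

def main():
    # Parses subcommands such as `fit` and `test` plus `-c <config>.yaml`,
    # then wires the parsed configuration into the module constructors.
    LightningCLI(ViTDDLitModule, DriverDataModule)

if __name__ == "__main__":
    main()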

Looking into a configuration file, you will see many variables and values. Pay attention to the following variables to avoid confusing errors. We use vitdd_sfddd_sbd.yaml as an example.

The configuration file for running the code on the DMD dataset is located at ./configs/vitdd_dmd.yaml. This YAML file is similar to the one for SFDDD. See the comments in the file itself for how to use it for testing and training.

Training and Testing

To fit your model, you may use the following command:

python train.py fit -c ./configs/<config-yaml-file-name>.yaml

You may substitute the path to your own configuration file. The output of the fitting process is a .ckpt file, which stores the weights of the fitted model. You can find it at the location determined by the settings in your configuration file (see the previous section for details).
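For reference, a saved Lightning checkpoint can also be loaded outside the CLI for inspection or custom evaluation; the module name below is the same placeholder used in the earlier sketch, and the checkpoint path is only an example.

# Minimal sketch: load a fitted checkpoint (placeholder module and example path).
from model import ViTDDLitModule  # hypothetical LightningModule

ckpt_path = "runs/vitdd/example/checkpoints/last.ckpt"  # example output location
model = ViTDDLitModule.load_from_checkpoint(ckpt_path)
model.eval()  # switch to inference mode before running predictions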

To test your model, you may use the following command:

python train.py test -c ./configs/<config-yaml-file-name>.yaml

Before executing this command, please make sure that the ckpt_path variable in the configuration file is not null (see the previous section for details). The output of the testing process is a PNG of the confusion matrix. By default, the confusion matrix is saved to ./runs/vis. If the diagonal of the confusion matrix has values close to 1, the fitting was successful.

Extra Notes

This is a working repository. We will update it frequently, so stay tuned. Also, the README of the original ViT-DD repository is attached below for your convenience, but the information in it may not be accurate for this repo.

ViT-DD: Multi-Task Vision Transformer for Semi-Supervised Driver Distraction Detection

paper

ViT-DD

Abstract

Driver distraction detection is an important computer vision problem that can play a crucial role in enhancing traffic safety and reducing traffic accidents. This paper proposes a novel semi-supervised method for detecting driver distractions based on Vision Transformer (ViT). Specifically, a multi-modal Vision Transformer (ViT-DD) is developed that makes use of inductive information contained in training signals of distraction detection as well as driver emotion recognition. Further, a self-learning algorithm is designed to include driver data without emotion labels into the multi-task training of ViT-DD. Extensive experiments conducted on the SFDDD and AUCDD datasets demonstrate that the proposed ViT-DD outperforms the best state-of-the-art approaches for driver distraction detection by 6.5% and 0.9%, respectively.

Results

Experiments             Accuracy  NLL     Checkpoints
AUCDD                   0.9359    0.2399  link
SFDDD split-by-driver   0.9251    0.3900  link
SFDDD split-by-image    0.9963    0.0171  link

Usage

Prerequisites

The code is built with the following libraries:

To make sure that your PyTorch installation is using the GPU, write and execute the following Python script:

import torch

print(torch.cuda.is_available())   # should print True if a GPU is available
print(torch.cuda.device_count())   # number of GPUs visible to PyTorch; 1 if you have a single GPU
print(torch.cuda.current_device()) # index of the GPU currently in use; 0 if you have a single GPU

Data Preparation

Please organize the data using the directory structures listed below:

data_root
|-- AUCDD
    |-- v2
        |-- cam1
            |-- test
            |-- train
              |-- c0
              |-- ...
              |-- c9
                |-- 188.jpg
                |-- ...
|-- SFDDD 
    |-- imgs
        |-- train
          |-- c0
          |-- ...
          |-- c9
            |-- img_19.jpg
            |-- ...
pseudo_label_path
|-- AUCDD
  |-- emo_list.csv
  |-- imgs
      |-- c0
      |-- ...
      |-- c9
          |-- 0_face.jpg
          |-- ...
|-- SFDDD
  |-- emo_list.csv
  |-- imgs
      |-- img_5_face.jpg
      |-- ...

We provide our generated pseudo emotion labels as well as cropped images of drivers' faces for the AUCDD and SFDDD datasets here.

Citation

If you find ViT-DD beneficial or relevant to your research, please kindly recognize our efforts by citing our paper:

@article{Ma2022MultiTaskVT,
  title={Multi-Task Vision Transformer for Semi-Supervised Driver Distraction Detection},
  author={Yunsheng Ma and Ziran Wang},
  journal={arXiv},
  year={2022}
}