lizhe918 / EECE571L_2022WT2_ViT-DD

The project for UBC EECE571L 2022WT2
MIT License

Crop Faces and Create Pseudo Emotion Labels for the DMD frames. #2

Open lizhe918 opened 1 year ago

lizhe918 commented 1 year ago

The frames extracted from the DMD videos do not have emotion labels. However, since this project applies machine learning to driver emotion detection, each frame needs an emotion label before we can train. We therefore take an alternative approach and create pseudo labels: a sophisticated emotion detection model generates "fake" emotion labels for these frames. Specifically, we will use PAZ (https://github.com/oarriaga/paz) to create these labels.

PAZ is a mature computer vision package that is ready to use, with detailed documentation available at https://github.com/oarriaga/paz. A quick pass through the tutorial section of the documentation will show you how to use it. For this task, we need the frames saved in ./datasets/DMD/FrontBody/. Assuming the frame extraction task was performed successfully, these frames are taken from the front of the driver and also include a significant portion of the body.
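As a starting point, the sketch below shows how PAZ's combined face-detection + facial-expression pipeline (`DetectMiniXceptionFER`, from the PAZ docs) could be called on a front frame. The frame filename is a placeholder, and the helper `pick_top_emotion` is our own addition for picking one label when PAZ returns several detections:

```python
def pick_top_emotion(boxes):
    """Return the class name of the highest-scoring detection, or None.

    Each box is expected to expose PAZ-style `score` and `class_name`
    attributes (e.g. "happy", "neutral").
    """
    if not boxes:
        return None
    return max(boxes, key=lambda b: b.score).class_name


def detect_front_frame(image_path):
    """Run PAZ's face-detection + emotion pipeline on one frame.

    Requires PAZ to be installed; the [0.1, 0.1] offsets enlarge the
    detected face box slightly, as in the PAZ examples.
    """
    from paz.backend.image import load_image
    from paz.pipelines import DetectMiniXceptionFER

    detect = DetectMiniXceptionFER([0.1, 0.1])
    return detect(load_image(image_path))["boxes2D"]
```

A frame with no detected face then yields `None` from `pick_top_emotion`, which ties into the edge-case handling discussed below.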

Create a directory ./pseudo_emotion_label/DMD/imgs/ to save the cropped face images. Create a CSV file emo_list.csv to save the pseudo emotion labels. Then, for every frame in ./datasets/DMD/SideBody/, we will need to do the following:

  1. Find the corresponding front frame in ./datasets/DMD/FrontBody/. NOTE: the front and side videos are not perfectly synced, so there may be an offset. You may find the offset information in the JSON file associated with each video.
  2. Use PAZ to get the emotion label for the front frame.
  3. Use PAZ to crop the face from the front frame (use RetinaFace instead if the performance of PAZ is bad).
  4. Resize the cropped face to 224 * 224, which is the input image size of ViT-DD.
  5. Save the image to ./pseudo_emotion_label/DMD/imgs/ using the name pattern ga1s1frontcropped_xxxxx.png, where xxxxx is the frame number.
  6. Add a row to the emo_list.csv with the first column as the original side-view body frame path, and the second column as the emotion. NOTE: We may revisit this step to use numbers representing each emotion, but for now, let's just save the string.
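The bookkeeping parts of the steps above (frame matching, naming, and the CSV row) can be sketched as plain helpers; the PAZ detection and cropping in steps 2-4 are omitted here. The exact key holding the offset in the per-video JSON is not specified in the issue, so that lookup is left to the caller; the `ga1s1` prefix is the pattern given in step 5:

```python
import csv


def front_frame_index(side_index, offset):
    """Step 1: map a side-view frame index to its front-view counterpart.

    `offset` is the sync offset read from the video's annotation JSON
    (the JSON key is not specified in the issue -- check your files).
    """
    return side_index + offset


def cropped_name(frame_index):
    """Step 5's naming pattern: ga1s1frontcropped_xxxxx.png."""
    return f"ga1s1frontcropped_{frame_index:05d}.png"


def append_label_row(csv_path, side_frame_path, emotion):
    """Step 6: one row per frame -- side-view frame path, then the emotion.

    The emotion is stored as a string for now, per the note in step 6.
    """
    with open(csv_path, "a", newline="") as f:
        csv.writer(f).writerow([str(side_frame_path), emotion])
```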

There are some edge cases you should think about. For example, PAZ is not a perfect model: it may mistake a non-face region for a face, or detect no face at all in a frame. Please document in detail how you handle these cases.
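One possible policy for those edge cases is sketched below. This is an assumption for illustration, not a project decision: frames with no detection are skipped (and can be logged), and when several faces are detected the largest box is kept on the assumption that the driver's face is the biggest one in the front camera view:

```python
def choose_face(boxes):
    """Pick one face detection, or None if the frame should be skipped.

    Edge-case policy (an assumption, not the project's decision):
    - no detection: return None so the caller can log and skip the frame;
    - several detections: keep the largest box, assuming the driver's
      face is the biggest one visible from the front camera.

    Each box is expected to expose PAZ-style `coordinates`
    (x_min, y_min, x_max, y_max).
    """
    if not boxes:
        return None

    def area(box):
        x0, y0, x1, y1 = box.coordinates
        return (x1 - x0) * (y1 - y0)

    return max(boxes, key=area)
```

Whatever policy you choose, log the skipped frame paths so the pseudo-label coverage can be audited later.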

Christina663 commented 1 year ago

According to the ViT-DD paper, the authors used AffectNet-7 to train an emotion detector for pseudo labelling. In AffectNet there are 8 different emotions: Neutral, Happy, Sad, Surprise, Fear, Anger, Disgust, and Contempt (excluding None, etc.). Please refer to AffectNet (AffectNet – Mohammad H. Mahoor, PhD (mohammadmahoor.com)), and also line 30 in ViT-DD/affectnet.py at main · PurdueDigitalTwin/ViT-DD (github.com); they simply use range(8) for the labelling.

Christina663 commented 1 year ago
[image: emotion label order]

Emotions are 0 - 7 following the order in this image.
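A mapping along these lines could make the index order explicit in code. The order below is the one commonly documented for AffectNet; confirm it against the image above and `affectnet.py` before relying on it, since the Anger/Disgust positions in particular vary between write-ups:

```python
# AffectNet emotion order as commonly documented (indices 0-7).
# Assumption: verify against the image above / affectnet.py before use.
AFFECTNET_EMOTIONS = [
    "Neutral", "Happy", "Sad", "Surprise",
    "Fear", "Disgust", "Anger", "Contempt",
]
EMOTION_TO_INDEX = {name: i for i, name in enumerate(AFFECTNET_EMOTIONS)}
```

With this in place, switching emo_list.csv from emotion strings to numeric labels later (the note in step 6) becomes a one-line lookup.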

Christina663 commented 1 year ago

The main difference from the last push is that the directory storing the SideBody frames has changed. The frame images are no longer stored in one folder, but in n different folders according to their distractions. So I need to modify the code to take the image paths from the new SideBody directories, and also store those new directories.
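Walking the new layout could look like the sketch below. The `SideBody/<distraction>/<frame>.png` structure and the `.png` extension are inferred from the comment above, not confirmed by it:

```python
from pathlib import Path


def side_frames(root="datasets/DMD/SideBody"):
    """Yield (distraction_name, frame_path) pairs from the new layout.

    Assumed structure: SideBody/<distraction folder>/<frame>.png, with
    one subfolder per distraction class. Sorting keeps frame order stable.
    """
    for png in sorted(Path(root).rglob("*.png")):
        yield png.parent.name, png
```

Recording the parent folder name alongside each path preserves the distraction class that the new directory structure encodes.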