lizhe918 / EECE571L_2022WT2_ViT-DD

The project for UBC EECE571L 2022WT2

Move the Frames to the Correct Directory Based on Distraction Info #3

lizhe918 commented 1 year ago

ViT-DD predicts not only the driver's emotion but also the distraction while driving. Luckily, in DMD the distractions are already labeled in the JSON file that accompanies each video. These are ground-truth labels, and we need to format them properly so they can be fed into ViT-DD.

Firstly, you need to find the JSON file that accompanies each video and make sense of what it contains. Near the bottom of each JSON file you can find the labels of driver actions, which serve as the "distractions" for ViT-DD. The actions are not identical to the distractions in the number of classes or the name choices. For details, please refer to https://github.com/Vicomtech/DMD-Driver-Monitoring-Dataset/wiki/DMD-distraction-related-action-annotation-criteria and the following list of ViT-DD distraction classes (the SFDDD-style c0–c9 classes, per the c8/c9 references below):

- c0: safe driving
- c1: texting (right hand)
- c2: talking on the phone (right hand)
- c3: texting (left hand)
- c4: talking on the phone (left hand)
- c5: operating the radio
- c6: drinking
- c7: reaching behind
- c8: hair and makeup
- c9: talking to passenger
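To make the first step concrete, here is a minimal Python sketch of how the per-frame action labels could be pulled out of one annotation file. The VCD-style layout ("vcd" → "actions" → "frame_intervals") and the DMD action names in the mapping are assumptions on my part; please verify both against the actual JSON files and the annotation criteria linked above before relying on this.

```python
import json

# Hypothetical mapping from DMD driver-action names to ViT-DD (SFDDD-style)
# distraction classes; adjust to match the real annotation criteria.
DMD_TO_VITDD = {
    "driver_actions/safe_drive": 0,
    "driver_actions/texting_right": 1,
    "driver_actions/phonecall_right": 2,
    "driver_actions/texting_left": 3,
    "driver_actions/phonecall_left": 4,
    "driver_actions/radio": 5,
    "driver_actions/drinking": 6,
    "driver_actions/reach_backseat": 7,
    "driver_actions/hair_and_makeup": 8,
    "driver_actions/talking_to_passenger": 9,
}

def load_frame_labels(json_path):
    """Return {frame_index: [class_id, ...]} from one DMD annotation file.

    Assumes a VCD-style layout where each action has a "type" string and a
    list of frame intervals; check this against the real files.
    """
    with open(json_path) as f:
        vcd = json.load(f)["vcd"]

    frame_labels = {}
    for action in vcd.get("actions", {}).values():
        class_id = DMD_TO_VITDD.get(action["type"])
        if class_id is None:  # action not used by ViT-DD
            continue
        for interval in action["frame_intervals"]:
            for frame in range(interval["frame_start"], interval["frame_end"] + 1):
                frame_labels.setdefault(frame, []).append(class_id)
    return frame_labels
```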

In this task, we need to do the following things:

NOTE: the third and fourth steps may happen in one iteration over the side body frames.

Please carefully document any edge cases you encounter and the solution you adopt.

lizhe918 commented 1 year ago

This issue may be worked on together with issue #8.

Christina663 commented 1 year ago

Question: every single video folder (gas) has a corresponding JSON file with distraction info. Does every video also have a corresponding driver_imgs_list.csv file, or is there one single .csv file for all the videos in the DMD dataset?

Christina663 commented 1 year ago

Question #2: how do we deal with the frame offset when matching single frames with distraction labels?

lizhe918 commented 1 year ago

The frames were extracted with a stride of 30 (1-based indexing on both sides), so map the index in a frame's filename back to the original video frame with:

true_frame_number = (name_frame_number - 1) * 30 + 1
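A tiny helper (a direct transcription of the formula above) makes the conversion and its intent explicit:

```python
def true_frame_number(name_frame_number: int, stride: int = 30) -> int:
    """Map the 1-based index in an extracted frame's filename back to the
    1-based index of the original video frame it was sampled from."""
    return (name_frame_number - 1) * stride + 1

# e.g. the 2nd extracted frame was sampled from video frame 31, so its
# distraction label should be looked up at frame 31 in the JSON annotations.
assert true_frame_number(2) == 31
```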

lizhe918 commented 1 year ago

If, for a particular frame, there are multiple distractions, such as both "hair and makeup" (c8) and "talking to passengers" (c9), we should always take the smaller class number, in this case c8, as the final distraction classification. So each frame has exactly one distraction associated with it in the final output.
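This rule is a one-line reduction over the per-frame label lists; the sketch below assumes the hypothetical load_frame_labels() output from earlier in this thread:

```python
def resolve_distraction(class_ids):
    """Collapse a frame's multiple distraction labels into one by keeping the
    smallest class number, e.g. [8, 9] ('hair and makeup' + 'talking to
    passenger') resolves to c8."""
    return min(class_ids)

# Applied to the hypothetical load_frame_labels() output sketched above:
# final_labels = {f: resolve_distraction(ids) for f, ids in frame_labels.items()}
```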