lizhe918 / EECE571L_2022WT2_ViT-DD

The project for UBC EECE571L 2022WT2

Move the Frames to the Correct Directory Based on Distraction Info #3

lizhe918 commented 1 year ago

ViT-DD predicts not only the driver's emotion but also the distraction while driving. Luckily, in DMD the distractions are already labeled in the JSON file that accompanies each video. These are ground-truth labels, and we need to format them properly so they can be fed into ViT-DD.

Firstly, you need to find the JSON file that accompanies each video and make sense of what it contains. Near the bottom of each JSON file you can find the labels of driver actions, which serve as the "distractions" for ViT-DD. The actions are not identical to the distractions in the number of classes or the name choices. For details, please refer to https://github.com/Vicomtech/DMD-Driver-Monitoring-Dataset/wiki/DMD-distraction-related-action-annotation-criteria and the following list of ViT-DD distraction classes (the SFDDD-style c0–c9 classes, per the c8/c9 references below):

- c0: safe driving
- c1: texting (right hand)
- c2: talking on the phone (right hand)
- c3: texting (left hand)
- c4: talking on the phone (left hand)
- c5: operating the radio
- c6: drinking
- c7: reaching behind
- c8: hair and makeup
- c9: talking to passenger
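To make the first step concrete, here is a minimal Python sketch of how the per-frame action labels could be pulled out of one annotation file. The VCD-style layout ("vcd" → "actions" → "frame_intervals") and the DMD action names in the mapping are assumptions on my part; please verify both against the actual JSON files and the annotation criteria linked above before relying on this.

```python
import json

# Hypothetical mapping from DMD driver-action names to ViT-DD (SFDDD-style)
# distraction classes; adjust to match the real annotation criteria.
DMD_TO_VITDD = {
    "driver_actions/safe_drive": 0,
    "driver_actions/texting_right": 1,
    "driver_actions/phonecall_right": 2,
    "driver_actions/texting_left": 3,
    "driver_actions/phonecall_left": 4,
    "driver_actions/radio": 5,
    "driver_actions/drinking": 6,
    "driver_actions/reach_backseat": 7,
    "driver_actions/hair_and_makeup": 8,
    "driver_actions/talking_to_passenger": 9,
}

def load_frame_labels(json_path):
    """Return {frame_index: [class_id, ...]} from one DMD annotation file.

    Assumes a VCD-style layout where each action has a "type" string and a
    list of frame intervals; check this against the real files.
    """
    with open(json_path) as f:
        vcd = json.load(f)["vcd"]

    frame_labels = {}
    for action in vcd.get("actions", {}).values():
        class_id = DMD_TO_VITDD.get(action["type"])
        if class_id is None:  # action not used by ViT-DD
            continue
        for interval in action["frame_intervals"]:
            for frame in range(interval["frame_start"], interval["frame_end"] + 1):
                frame_labels.setdefault(frame, []).append(class_id)
    return frame_labels
```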

In this task, we need to do the following things:

NOTE: the third and fourth steps may happen in one iteration over the side body frames.

Please carefully document any edge cases you encounter and the solution you adopt.

lizhe918 commented 1 year ago

This issue may be worked on together with issue #8.

Christina663 commented 1 year ago

Question: every single video folder (gas) has a corresponding JSON file with distraction info. Does every video also have a corresponding driver_imgs_list.csv file, or is there one single .csv file for all the videos in the DMD dataset?

Christina663 commented 1 year ago

Question #2: how do we deal with the frame offset when matching single frames with distraction labels?

lizhe918 commented 1 year ago

The frames were extracted with a stride of 30 (1-based indexing on both sides), so map the index in a frame's filename back to the original video frame with:

true_frame_number = (name_frame_number - 1) * 30 + 1
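A tiny helper (a direct transcription of the formula above) makes the conversion and its intent explicit:

```python
def true_frame_number(name_frame_number: int, stride: int = 30) -> int:
    """Map the 1-based index in an extracted frame's filename back to the
    1-based index of the original video frame it was sampled from."""
    return (name_frame_number - 1) * stride + 1

# e.g. the 2nd extracted frame was sampled from video frame 31, so its
# distraction label should be looked up at frame 31 in the JSON annotations.
assert true_frame_number(2) == 31
```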

lizhe918 commented 1 year ago

If, for a particular frame, there are multiple distractions, such as both "hair and makeup" (c8) and "talking to passengers" (c9), we should always take the smaller class number, in this case c8, as the final distraction classification. So each frame has exactly one distraction associated with it in the final output.
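This rule is a one-line reduction over the per-frame label lists; the sketch below assumes the hypothetical load_frame_labels() output from earlier in this thread:

```python
def resolve_distraction(class_ids):
    """Collapse a frame's multiple distraction labels into one by keeping the
    smallest class number, e.g. [8, 9] ('hair and makeup' + 'talking to
    passenger') resolves to c8."""
    return min(class_ids)

# Applied to the hypothetical load_frame_labels() output sketched above:
# final_labels = {f: resolve_distraction(ids) for f, ids in frame_labels.items()}
```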