Open lizhe918 opened 1 year ago
This issue may be worked on together with issue #8.
Question: every single video folder (gas) has a corresponding JSON file for distraction info. Does every video have a corresponding driver_imgs_list.csv file? Or is there one single .csv file for all the videos from the DMD dataset?
Question #2: how do we deal with the frame offset when matching single frames with distraction labels?
`true_frame_number = (name_frame_number - 1) * 30 + 1`
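The offset formula above can be wrapped in a small helper. This is only a sketch: the function name and the `step` parameter are illustrative, and it assumes that the 30 in the formula is the sampling interval between extracted frames and that frame names are numbered from 1.

```python
def true_frame_number(name_frame_number: int, step: int = 30) -> int:
    """Map the number in an extracted frame's name to the video frame number.

    Assumption: extracted frames are named 1, 2, 3, ... and one frame was
    kept every `step` video frames, starting from video frame 1.
    """
    if name_frame_number < 1:
        raise ValueError("frame names are assumed to be numbered from 1")
    return (name_frame_number - 1) * step + 1


# The first three extracted frames map to video frames 1, 31 and 61.
print([true_frame_number(n) for n in (1, 2, 3)])
```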
If a particular frame has multiple distractions, such as both "hair and makeup" (c8) and "talking to passengers" (c9), we should always take the smaller class number, which in this case is c8, as the final distraction classification. So, each frame has only one distraction associated with it in the final output.
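The tie-breaking rule can be sketched as follows. The function name is hypothetical, and it assumes the distractions active in a frame have already been mapped to their integer class indices (0 to 9).

```python
def resolve_distraction(class_indices):
    """Return the single class name for a frame with one or more distractions.

    Rule from this issue: when several distractions overlap, keep the one
    with the smallest class number.
    """
    indices = list(class_indices)
    if not indices:
        raise ValueError("a frame needs at least one distraction label")
    return f"c{min(indices)}"


# "hair and makeup" (c8) together with "talking to passengers" (c9)
print(resolve_distraction([9, 8]))
```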
ViT-DD predicts not only the emotion but also the distraction in driving. Luckily, in DMD, the distractions are already labeled in the JSON file with each video. These are ground-truth labels, and we need to format them properly so they can be fed into ViT-DD.
First, you need to find the JSON file for each video and make sense of its contents. Near the bottom of each JSON file you can find the labels of driver actions, which are the "distractions" for ViT-DD. The actions are not identical to the distractions in terms of the number of classes and the names used. For details, please refer to https://github.com/Vicomtech/DMD-Driver-Monitoring-Dataset/wiki/DMD-distraction-related-action-annotation-criteria and the following list for ViT-DD:
In this task, we need to do the following things:
1. Create the directories `./datasets/DMD/SideBody/test` and `./datasets/DMD/SideBody/train`.
2. Create the directories `./datasets/DMD/SideBody/train/cn`, where `n` ranges from 0 to 9, inclusive. In other words, there are going to be 10 new folders.
3. For each frame in `./datasets/DMD/SideBody`, move the frame into the correct `./datasets/DMD/SideBody/train/cn` directory, based on the information in the JSON file, which specifies the distraction that happens in each frame.
4. Create `./datasets/DMD/driver_imgs_list.csv`. This CSV file has three columns:
   - `subject`, whose value represents a driver. That is, for the side body frames of the same driver, the `subject` values of the frames should be the same. The value should look like `p001`.
   - `classname`, which is the distraction `cn` of the side body frame. That is, the values in this column are all in the format of `cn`, where `n` ranges from 0 to 9. This information must be consistent with the directory of the frame.
   - `img`, which is the name of the side body frame.

NOTE: the third and fourth steps may happen in one iteration over the side body frames.
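The steps above could be sketched in one pass as follows. This is a minimal sketch, not a finished script: `organize_frames` and the shape of `records` are assumptions made for illustration, and turning the DMD JSON annotations into `(frame_path, subject, class_index)` tuples is assumed to happen elsewhere.

```python
import csv
import shutil
from pathlib import Path


def organize_frames(records, root="./datasets/DMD"):
    """Move labeled frames into class folders and write driver_imgs_list.csv.

    `records` is an iterable of (frame_path, subject, class_index) tuples.
    Building it from the JSON annotations is outside this sketch.
    """
    root = Path(root)
    side_body = root / "SideBody"

    # Steps 1 and 2: test/train directories plus the ten class folders c0..c9.
    (side_body / "test").mkdir(parents=True, exist_ok=True)
    for n in range(10):
        (side_body / "train" / f"c{n}").mkdir(parents=True, exist_ok=True)

    rows = []
    # Steps 3 and 4 in a single iteration over the frames.
    for frame_path, subject, class_index in records:
        if not 0 <= class_index <= 9:
            raise ValueError(f"class index out of range: {class_index}")
        frame_path = Path(frame_path)
        classname = f"c{class_index}"
        target = side_body / "train" / classname / frame_path.name
        shutil.move(str(frame_path), str(target))
        # The CSV row must agree with the directory the frame was moved to.
        rows.append(
            {"subject": subject, "classname": classname, "img": frame_path.name}
        )

    csv_path = root / "driver_imgs_list.csv"
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["subject", "classname", "img"])
        writer.writeheader()
        writer.writerows(rows)
    return csv_path
```

Collecting the CSV rows in the same loop that moves the frames keeps the `classname` column consistent with the target directory by construction.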
Please carefully document any edge cases you encounter and the solution you adopt.