Open yoohj0416 opened 8 months ago
We ran tag classification with one image per video. We used the BDD100K validation images (test labels are not provided) and compared our accuracy with the numbers reported in the paper. DLA-34 denotes the results from the Berkeley DeepDrive team (we are not sure whether they used the validation set or the test set).
City street: correct case b1dac7f7-6b2e0382 (Image, Video); false case b1db7e22-cfa74dc3, classified as highway (Image, Video)
Tunnel: correct case b869965a-fa59f431 (Image, Video); false case b9b53753-91a5d5f8, classified as city street (Image, Video)
Highway: correct case b21ac8b3-9b9cb45a (Image, Video); false case b22a4d9f-48b2e986, classified as city street (Image, Video)
Residential: correct case b5c30297-f0b3279b (Image, Video); false case b5b71e8e-96a024a9, classified as city street (Image, Video)
Parking lot: correct case c24e5f86-035733db (Image, Video); false case c20db6e7-6613883a, classified as gas station (Image, Video)
Gas station: correct case bbce4e17-9ff136be (Image, Video); false case bda1719b-476378b3, classified as parking lot (Image, Video)
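For reference, here is a rough sketch of how overall accuracy and the kind of confusions listed above (for example, city street classified as highway) could be tallied. The function name `accuracy_and_confusions` and the toy labels are assumptions for illustration only. They are not our evaluation script and not real results.

```python
from collections import Counter

def accuracy_and_confusions(labels, predictions):
    """Return overall accuracy and a count of (true tag, predicted tag) errors.

    Hypothetical helper: `labels` and `predictions` are equal-length
    sequences of scene-tag strings, one entry per validation image.
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    if len(labels) == 0:
        raise ValueError("need at least one sample")
    pairs = list(zip(labels, predictions))
    correct = sum(true == pred for true, pred in pairs)
    # Keep only the misclassified pairs, e.g. ("tunnel", "city street").
    confusions = Counter((true, pred) for true, pred in pairs if true != pred)
    return correct / len(pairs), confusions

# Toy data for illustration only (not measured results).
toy_labels = ["city street", "city street", "tunnel", "tunnel"]
toy_predictions = ["city street", "highway", "tunnel", "city street"]
accuracy, confusions = accuracy_and_confusions(toy_labels, toy_predictions)
```

Sorting `confusions.most_common()` would show which tag pairs are mixed up most often, which is how cases like the ones above could be picked out.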
Just some quick notes on ideas for utilizing multimodal data.
To-Do
This is a set of steps for tag classification using images only.
In the single-image tag classification experiment, each video is classified from one image only. That image is the one selected by the Berkeley DeepDrive team and provided in the BDD100K images dataset.
In the multi-image tag classification experiment, multiple images are extracted from each video and an embedding vector is computed for each one. The vectors are then combined (e.g. by taking their mean) into a single vector, which is used for classification.
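A minimal sketch of the multi-image step, assuming the per-frame embeddings are already available as a NumPy array. The linear head (`weights`, `bias`), the helper names, and the toy numbers are hypothetical stand-ins for illustration. The actual encoder and classifier are not specified here.

```python
import numpy as np

def pool_frame_embeddings(frame_embeddings):
    """Average per-frame embeddings of shape (num_frames, dim) into one (dim,) vector."""
    frame_embeddings = np.asarray(frame_embeddings, dtype=np.float64)
    if frame_embeddings.ndim != 2 or frame_embeddings.shape[0] == 0:
        raise ValueError("expected a non-empty (num_frames, dim) array")
    return frame_embeddings.mean(axis=0)

def classify_video(frame_embeddings, weights, bias, class_names):
    """Classify a video from its mean-pooled embedding.

    Assumption: a linear head with `weights` of shape (num_classes, dim)
    and `bias` of shape (num_classes,) stands in for the real classifier.
    """
    pooled = pool_frame_embeddings(frame_embeddings)
    logits = weights @ pooled + bias
    return class_names[int(np.argmax(logits))]

# Toy example: three frames from one video, 2-dimensional embeddings.
class_names = ["city street", "highway"]
weights = np.eye(2)   # hypothetical classifier weights
bias = np.zeros(2)    # hypothetical classifier bias
frames = np.array([[0.9, 0.1], [0.2, 0.8], [0.8, 0.3]])
pooled = pool_frame_embeddings(frames)
predicted_tag = classify_video(frames, weights, bias, class_names)
```

Mean pooling treats every frame equally, so a single atypical frame (the second one in the toy data) has limited effect on the pooled vector. Other ways of combining the vectors could be swapped in at `pool_frame_embeddings` without touching the classifier.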
Updated To-Do