Tag Classification Only with Images

yoohj0416 commented 8 months ago

To-Do

[x] Tag classification test with one image per video (the image comes from the BDD100K images dataset)
[ ] Tag classification test with multiple images per video (the images are extracted by our project)

This issue tracks the steps for tag classification using only images.

In the one-image experiment, classification uses a single image per video. That image was selected by the Berkeley DeepDrive team and is provided in the BDD100K images dataset.

In the multiple-image experiment, several images are extracted from each video and an embedding vector is computed for each one. These vectors are then combined by taking their mean, and the combined vector is used for classification.
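A minimal sketch of the multiple-image step, assuming the per-frame image embeddings and the per-label text embeddings have already been computed (for example with ImageBind) as plain float vectors. The function names and the toy 3-D vectors are hypothetical. Scoring by cosine similarity followed by argmax is one common zero-shot setup and is an assumption here, not necessarily the project's exact rule.

```python
import math
from typing import Dict, List, Sequence


def mean_embedding(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of several equal-length embedding vectors."""
    if not vectors:
        raise ValueError("need at least one embedding")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all embeddings must have the same dimension")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def classify_video(
    frame_embeddings: Sequence[Sequence[float]],
    label_embeddings: Dict[str, Sequence[float]],
) -> str:
    """Average the frame embeddings, then return the most similar label."""
    video_vec = mean_embedding(frame_embeddings)
    return max(label_embeddings, key=lambda lbl: cosine(video_vec, label_embeddings[lbl]))


# Hypothetical toy embeddings; real ImageBind embeddings are much larger.
labels = {
    "highway": [1.0, 0.0, 0.0],
    "city street": [0.0, 1.0, 0.0],
    "residential": [0.0, 0.0, 1.0],
}
frames = [[0.2, 0.9, 0.1], [0.1, 0.7, 0.4], [0.3, 0.8, 0.2]]
print(classify_video(frames, labels))  # city street
```

Averaging before comparison gives every frame equal weight, so one unrepresentative frame is diluted instead of deciding the label on its own.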

Updated To-Do

[x] Statistically analyze which classes ImageBind assigns to residential images (from here)

yoohj0416 commented 8 months ago

Tag classification with one image

We ran tag classification with one image per video. We used the BDD100K validation images (test labels are not publicly provided) and compared our accuracy with the accuracy reported in the BDD100K paper. DLA-34 denotes the results reported by the Berkeley DeepDrive team (it is unclear whether they evaluated on the validation set or the test set).

Weather classification

Scene classification

Weather classification accuracy is lower than DLA-34 overall.
Scene classification accuracy is higher than or similar to DLA-34 for every category except residential (13 / 1253).
We need to analyze why accuracy on residential images is so low.
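Per-category accuracy is the number of correct predictions divided by the number of ground-truth images in that category. Reading 13 / 1253 as 13 correct out of 1253 residential images gives roughly 1.0%. A minimal sketch, assuming ground-truth and predicted scene labels are available as parallel lists; the helper name and toy labels are hypothetical and are not our results.

```python
from collections import Counter, defaultdict
from typing import Dict, Sequence


def per_class_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Accuracy per ground-truth class: correct / total images of that class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = Counter(y_true)          # images per ground-truth class
    correct = defaultdict(int)       # correct predictions per class
    for t, p in zip(y_true, y_pred):
        if t == p:
            correct[t] += 1
    return {cls: correct[cls] / n for cls, n in total.items()}


# The residential figure above, read as 13 correct out of 1253.
print(f"{13 / 1253:.1%}")  # 1.0%

# Hypothetical toy labels, only to show the call.
y_true = ["residential", "residential", "highway", "city street"]
y_pred = ["city street", "residential", "highway", "highway"]
print(per_class_accuracy(y_true, y_pred))
```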

yoohj0416 commented 7 months ago

Video and image examples of scene classification

City street: correct case b1dac7f7-6b2e0382 (Image, Video); incorrect case b1db7e22-cfa74dc3, classified as highway (Image, Video)
Tunnel: correct case b869965a-fa59f431 (Image, Video); incorrect case b9b53753-91a5d5f8, classified as city street (Image, Video)
Highway: correct case b21ac8b3-9b9cb45a (Image, Video); incorrect case b22a4d9f-48b2e986, classified as city street (Image, Video)
Residential: correct case b5c30297-f0b3279b (Image, Video); incorrect case b5b71e8e-96a024a9, classified as city street (Image, Video)
Parking lot: correct case c24e5f86-035733db (Image, Video); incorrect case c20db6e7-6613883a, classified as gas station (Image, Video)
Gas station: correct case bbce4e17-9ff136be (Image, Video); incorrect case bda1719b-476378b3, classified as parking lot (Image, Video)

Analyze which classes ImageBind assigns to residential images
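One way to run this analysis is to count the predicted labels over all images whose ground truth is residential, which is one row of a confusion matrix. A minimal sketch, again assuming parallel lists of ground-truth and predicted labels; the helper name and toy data are hypothetical and do not reflect actual results.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def predicted_distribution(
    y_true: Sequence[str], y_pred: Sequence[str], target: str
) -> List[Tuple[str, int, float]]:
    """For images whose ground truth is `target`, count each predicted class.

    Returns (predicted_class, count, fraction) tuples, most frequent first.
    """
    preds = [p for t, p in zip(y_true, y_pred) if t == target]
    counts = Counter(preds)
    n = len(preds)
    return [(cls, c, c / n) for cls, c in counts.most_common()]


# Hypothetical toy data, not actual results.
y_true = ["residential"] * 5 + ["highway"]
y_pred = ["city street", "city street", "residential", "city street", "parking lot", "highway"]
for cls, count, frac in predicted_distribution(y_true, y_pred, "residential"):
    print(f"{cls:12s} {count:3d} {frac:.0%}")
```

Sorting by count surfaces the dominant confusion first, which makes it easy to check whether residential images mostly end up as city street, as in the incorrect residential example above.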

yoohj0416 commented 7 months ago

Some quick notes on ideas for using multimodal data:

Check if vehicles within the drivable area are moving in the same direction (image and masked image).
Mitigate ambiguity in scene understanding (image and text).

yoohj0416 / NLSearch