imfing / ava_downloader

:arrow_double_down: Download AVA dataset (A Large-Scale Database for Aesthetic Visual Analysis)

Can't align image_id in AVA.txt and image file names #5

Closed: yunxiaoshi closed this issue 6 years ago

yunxiaoshi commented 6 years ago

I can't find any information on how the images and their annotations are aligned. For example, the first 5 rows of AVA.txt are:

1 953619 0 1 5 17 38 36 15 6 5 1 1 22 1396
2 953958 10 7 15 26 26 21 10 8 1 2 1 21 1396
3 954184 0 0 4 8 41 56 10 3 4 0 0 0 1396
4 954113 0 1 4 6 48 37 23 5 2 2 15 21 1396
5 953980 0 3 6 15 57 39 6 1 1 1 22 38 1396

Here the second column is the image_id and the 3rd through 12th columns are the annotations. The first 5 images in the dataset, however, are:

1000.jpg
10000.jpg
10002.jpg
10003.jpg
10005.jpg

Also, there are 255,530 entries in AVA.txt but only 255,510 files in images/. Is there anything I missed here?

imfing commented 6 years ago

@kentsyx Your file explorer lists the images sorted by filename (i.e. by number). In your example, 1000 < 10000 < 10002 < .... So you cannot expect them to appear in the same order as the rows of AVA.txt. You can still find each image in the dataset folder by its id; for example, 953619.jpg exists in the image directory.

The original paper did say there are 255,530 items. However, years have passed and some of them are no longer available on the DPChallenge site. That's OK, because 20 missing entries have little effect given the size of the dataset.
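
As a minimal sketch of the lookup described above: this assumes each AVA.txt row is whitespace-separated (row index, image_id, then ten annotation columns) and that images are stored as `<image_id>.jpg`. The names `parse_ava_row` and `image_path`, and the `images` directory, are illustrative assumptions, not part of this repository.

```python
from pathlib import Path


def parse_ava_row(line):
    """Split one AVA.txt row into (image_id, annotation_counts).

    Assumes whitespace-separated columns: row index, image_id,
    then ten annotation columns (columns 3-12). Later columns are ignored.
    """
    fields = line.split()
    image_id = fields[1]
    annotation_counts = [int(x) for x in fields[2:12]]
    return image_id, annotation_counts


def image_path(image_id, image_dir="images"):
    """Build the expected path for an image id (assumed to be <id>.jpg)."""
    return Path(image_dir) / f"{image_id}.jpg"


# First row quoted in this thread.
row = "1 953619 0 1 5 17 38 36 15 6 5 1 1 22 1396"
image_id, counts = parse_ava_row(row)
print(image_id, counts, image_path(image_id))
```

With the real dataset, `image_path(image_id).exists()` would tell you whether the image for a given row was actually downloaded.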

yunxiaoshi commented 6 years ago

@mtobeiyf I noticed the ordering issue you pointed out, but at first glance the magnitudes (say 953619 and 10000) seemed to be off by a large margin, so I assumed something was wrong. I will intersect the two sets of ids to see if things pan out. Thank you for your reply!
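
One way the id intersection might be sketched, assuming the image_id is the second whitespace-separated column of each AVA.txt row and that each image file is named `<image_id>.jpg`. The helper names below are hypothetical, and the example runs on a few in-memory rows and filenames rather than the real dataset.

```python
from pathlib import Path


def ids_from_annotations(lines):
    """Collect image ids (second column) from AVA.txt-style rows."""
    return {line.split()[1] for line in lines if line.strip()}


def ids_from_filenames(filenames):
    """Collect image ids from file names such as '953619.jpg'."""
    return {Path(name).stem for name in filenames
            if name.lower().endswith(".jpg")}


def compare_ids(annotation_ids, file_ids):
    """Report which ids match and which appear on only one side."""
    return {
        "matched": annotation_ids & file_ids,
        "missing_images": annotation_ids - file_ids,
        "unannotated_images": file_ids - annotation_ids,
    }


# Small in-memory example; the filenames here are made up for illustration.
rows = [
    "1 953619 0 1 5 17 38 36 15 6 5 1 1 22 1396",
    "2 953958 10 7 15 26 26 21 10 8 1 2 1 21 1396",
    "3 954184 0 0 4 8 41 56 10 3 4 0 0 0 1396",
]
files = ["953619.jpg", "954184.jpg", "1000.jpg"]

result = compare_ids(ids_from_annotations(rows), ids_from_filenames(files))
for key, ids in result.items():
    print(key, sorted(ids))
```

On the real data, `rows` could come from reading AVA.txt line by line and `files` from `os.listdir("images")`; both paths are assumptions about the local layout.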