Open · j-alex-hanson opened this issue 1 year ago
Hi Alex,
Thanks for your interest in the AI-TOD series. This observation is interesting, and we had not noticed it before. Given that the original version of this dataset has been used by the community to benchmark algorithms for more than two years, it is hard to adjust it now and filter out the overlapping parts if they do exist.
In this case, I recommend using v2, since the overlapping boxes make up only a small portion of the whole dataset. Besides, as long as methods are evaluated under the same experimental setting, I think this dataset remains a good candidate for benchmarking tiny object detection, since the state-of-the-art mAP is still extremely low. Moreover, if you would like to, you can filter out these overlapping boxes yourself and obtain a higher-quality dataset.
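For anyone attempting that filtering, below is a minimal sketch, not an official tool. It assumes COCO-style annotation files (`aitod_trainval.json` and `aitod_test.json` are placeholder paths) and a hypothetical `<xview_id>__<x_offset>__<y_offset>.png` naming pattern for the xView-derived crops; both assumptions would need to be verified against the actual release.

```python
import json

def parse_source(file_name):
    # Hypothetical pattern "<xview_id>__<x_offset>__<y_offset>.png"; adjust
    # to the actual AI-TOD naming scheme before relying on this.
    parts = file_name.rsplit(".", 1)[0].split("__")
    if len(parts) != 3:
        return None  # not an xView-derived crop (or a different naming scheme)
    xview_id, x_off, y_off = parts
    return xview_id, int(x_off), int(y_off)

def to_global(coco):
    """Yield (annotation, key); key is the bbox in source-image coordinates,
    or None when the file name does not look like an xView crop."""
    file_names = {img["id"]: img["file_name"] for img in coco["images"]}
    for ann in coco["annotations"]:
        src = parse_source(file_names[ann["image_id"]])
        if src is None:
            yield ann, None
            continue
        xview_id, x_off, y_off = src
        x, y, w, h = ann["bbox"]  # COCO format: [x, y, width, height]
        # Round to absorb sub-pixel jitter between overlapping crops.
        yield ann, (xview_id, round(x_off + x), round(y_off + y), round(w), round(h))

with open("aitod_trainval.json") as f:  # placeholder paths
    trainval = json.load(f)
with open("aitod_test.json") as f:
    test = json.load(f)

trainval_keys = {key for _, key in to_global(trainval) if key is not None}

# Keep a test annotation only if its box, mapped back to xView coordinates,
# does not also appear somewhere in trainval.
test["annotations"] = [ann for ann, key in to_global(test)
                       if key is None or key not in trainval_keys]

with open("aitod_test_filtered.json", "w") as f:
    json.dump(test, f)
```

Exact matching on rounded source-image coordinates will miss boxes that are clipped at crop borders, so an IoU-based match in source-image coordinates would be more thorough.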
Hi there!
I was hoping to use the AI-TOD dataset, but noticed that in both v1 and v2 the test set is contaminated with trainval objects and parts of trainval images.
This happens because the overlapping crops taken from each xView image are split across the train, val, and test sets. It also inflates the total object count, since a box that falls in the overlap region of two crops is counted once per crop.
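To make the mechanism concrete, here is a toy illustration with made-up cropping parameters (the window size and stride are hypothetical, not AI-TOD's actual values): whenever the stride is smaller than the window, neighbouring crops share a band of pixels, and any object inside that band is annotated once per crop.

```python
# Toy numbers, not AI-TOD's real cropping parameters: 800x800 windows
# slid with a 600 px stride across a 2000 px wide source image.
WINDOW, STRIDE, WIDTH = 800, 600, 2000

def crops_containing(x):
    """Window columns whose horizontal span [start, start + WINDOW) covers x."""
    starts = range(0, WIDTH - WINDOW + STRIDE, STRIDE)
    return [s for s in starts if s <= x < s + WINDOW]

# An object centred at x = 650 sits in the 200 px overlap band, so it is
# annotated in both the crop starting at 0 and the crop starting at 600.
# If those two crops land in different splits, the object leaks from
# trainval into test, and it is counted twice in the dataset totals.
print(crops_containing(650))  # -> [0, 600]
```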
Here are the stats I’ve found:
| | v1 | v2 |
| --- | --- | --- |
| Contaminating xView bboxes | 84,704 | 44,801 |
| Unique xView bboxes | 308,286 | 419,747 |
| Total xView bboxes | 443,943 | 475,857 |
| Total bboxes (all images) | 700,621 | 752,746 |
Given this overlap in the xView-derived images, I'm concerned there may be additional contamination among the non-xView images. But without access to the script that created the non-xView crops, I'm not able to check this.
I sent an email about this two months ago but haven't heard back yet. Let me know if I'm mistaken!
Here's the script I created to measure these numbers: