orrzohar / FOMO

Official Pytorch code for Open World Object Detection in the Era of Foundation Models
Apache License 2.0

Questions about Dataset #12

Open leonnil opened 3 months ago

leonnil commented 3 months ago

Thanks for your great work!

I found that in the Surgery dataset there is a class, ‘Bipolar Forceps Down’, that appears in both the known and unknown splits. Is that correct?

I noticed that you use ‘BipolarForcepsDown’ as known and ‘BipolarForcepsDOwn’ as unknown in the codebase to distinguish them. However, it seems ‘BipolarForcepsDOwn’ does not occur in the test images of the Surgery dataset, resulting in no ground truth for this class in the evaluation. Could you please tell me how to handle this?

Best, Leon

orrzohar commented 3 months ago

Hi Leon,

Thank you for highlighting this for us! Upon studying the source of this bug, we found that a misspelled label in the original NeuroSurgicalTools dataset led to it. Specifically, some instances were labeled BipolarForcepsDOwn and others BipolarForcepsDown -- which introduced the bug both in our codebase:

https://github.com/orrzohar/FOMO/blob/6df2dca8e9f051261dc7be89d5a8ece4f2642515/data/RWD/ImageSets/Surgical/unknown_classnames_ground_truth.txt#L5

https://github.com/orrzohar/FOMO/blob/6df2dca8e9f051261dc7be89d5a8ece4f2642515/data/RWD/ImageSets/Surgical/known_classnames.txt#L4

and in the MS. I will fix this and update the MS accordingly (BipolarForcepsDOwn should not be in the unknown classnames). I will also need to re-run FOMO on this benchmark.
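For anyone hitting this before the fix lands, here is a minimal sketch of how such case-only collisions between the split files can be detected (the spellings are taken from this issue; the helper names and the assumption that each file lists one class name per line are mine):

```python
# Sketch: detect class names that collide case-insensitively across the
# known/unknown split files (one class name per line is assumed).
from collections import defaultdict

def load_classnames(path):
    # e.g. "data/RWD/ImageSets/Surgical/known_classnames.txt"
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

def find_case_collisions(*name_lists):
    """Group names from all lists by their lowercased form and return any
    lowercase key that maps to more than one distinct spelling."""
    groups = defaultdict(set)
    for names in name_lists:
        for name in names:
            groups[name.lower()].add(name)
    return {key: spellings for key, spellings in groups.items() if len(spellings) > 1}

# The two spellings reported in this issue:
collisions = find_case_collisions(
    ["BipolarForcepsDown"],   # known split
    ["BipolarForcepsDOwn"],   # unknown split
)
print(sorted(collisions["bipolarforcepsdown"]))  # ['BipolarForcepsDOwn', 'BipolarForcepsDown']
```

Running this over the two classname files linked above would flag the pair immediately; merging everything to a single canonical spelling before building the splits avoids the empty-ground-truth evaluation Leon describes.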

Best, Orr

leonnil commented 3 months ago

Hi @orrzohar,

I noticed that in your few-shot data, some images appear in the test set (i.e., test.txt), while others appear in neither the train nor the test set. For instance, in the Aerial dataset, the image 15750.jpg for the class ship is included in both few_shot_data.json and test.txt. Does this constitute a data leak, since the model is trained on images from the test set?

Additionally, I observed that there are only 1,999 images in the Aerial training dataset in the codebase, whereas your paper reports a total of 5,000 images. Could this be a bug?
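Here is the kind of check I ran, as a sketch (the schema of few_shot_data.json is assumed to map class names to lists of image filenames -- please correct me if it differs):

```python
# Sketch: find few-shot training images that leak into the test split,
# or that belong to neither split.

def parse_test_split(lines):
    """Image ids, one per line, as in test.txt."""
    return {line.strip() for line in lines if line.strip()}

def parse_few_shot(data):
    """Assumed schema: {class_name: [image_filename, ...]}."""
    ids = set()
    for images in data.values():
        for img in images:
            ids.add(img.rsplit(".", 1)[0])  # "15750.jpg" -> "15750"
    return ids

# Toy example mirroring the report above (real data would be loaded with
# json.load / open on few_shot_data.json and test.txt):
test_ids = parse_test_split(["15750\n", "20001\n"])
few_shot_ids = parse_few_shot({"ship": ["15750.jpg"], "plane": ["99999.jpg"]})

leaked = few_shot_ids & test_ids    # few-shot images that are also test images
orphans = few_shot_ids - test_ids   # would also need checking against train.txt
print(sorted(leaked))   # ['15750']
```

On the Aerial dataset this kind of intersection is non-empty, which is what prompted my question.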

Best, Leon