Open abhudev opened 6 years ago
http://cnnlocalization.csail.mit.edu/Zhou_Learning_Deep_Features_CVPR_2016_paper.pdf has more details. Please follow their methodology. It's object detection and localization for CUB in this case.
Thank you sir! Is the task the same for the SVHN dataset? There, multiple digits need to be localized in each image, which is different from CUB, where there is only one object to localize. Since both CUB and SVHN are in the same table, I thought the two tasks should be the same. Also, how do we combine the attention outputs of conv4 and conv5? The paper describes 2-3 ways: concatenating the outputs, adding them, etc.
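For reference, here is a minimal sketch of the two combination options mentioned above (concatenating vs. adding). It is not the paper's implementation. The function names, the toy shapes, and the random compatibility scores are all assumptions made for illustration. Each layer's attention output is taken to be a softmax-weighted sum of that layer's local feature vectors.

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax over a 1-D array of scores."""
    shifted = scores - scores.max()
    exp = np.exp(shifted)
    return exp / exp.sum()

def attend(local_feats, scores):
    """Attention-weighted sum of local features.

    local_feats: (n_locations, channels) array for one layer.
    scores:      (n_locations,) compatibility scores (hypothetical stand-in
                 for whatever scoring function the model actually uses).
    Returns a (channels,) descriptor.
    """
    weights = softmax(scores)
    return weights @ local_feats

def combine(g_a, g_b, mode="concat"):
    """Combine two per-layer descriptors before the classifier."""
    if mode == "concat":
        # Works for any channel counts; classifier input size is the sum.
        return np.concatenate([g_a, g_b])
    if mode == "add":
        # Requires both descriptors to have the same number of channels.
        if g_a.shape != g_b.shape:
            raise ValueError("adding requires descriptors of equal shape")
        return g_a + g_b
    raise ValueError(f"unknown mode: {mode}")

# Toy data (assumed shapes): 16 and 4 spatial locations, 8 channels each.
rng = np.random.default_rng(0)
feats4 = rng.normal(size=(16, 8))
feats5 = rng.normal(size=(4, 8))
g4 = attend(feats4, rng.normal(size=16))
g5 = attend(feats5, rng.normal(size=4))

print(combine(g4, g5, "concat").shape)  # (16,)
print(combine(g4, g5, "add").shape)     # (8,)
```

Concatenation keeps the two layers' information separate but doubles the classifier's input size. Addition keeps the size fixed but only works when both layers have the same channel count.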
Dear Sir, I have read Section A.1 "Datasets", in which the authors say that "For CUB-200-2011, the images are cropped using ground-truth bounding box annotations and resized". This means that they are in fact doing image recognition, not object detection/localization. The paper is still not clear about what they do for SVHN, but since SVHN also has a part with cropped images of digits, it suggests that there, too, they are doing image recognition, not object localization/detection. So I think weakly supervised semantic segmentation is the only task where some kind of localization is involved.
Their comparison is with VGG-GAP, which is a method that does classification and, as a by-product, localization. These are attention mechanisms, so I assume that they will do classification and localization...
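To illustrate how a GAP-based classifier yields localization as a by-product, here is a rough sketch of the class activation map idea from the linked Zhou et al. paper: the map for class c is the sum of the last conv layer's feature maps, weighted by that class's weights in the final linear layer. The function names, shapes, and random inputs are assumptions for illustration, and the linear layer is assumed to have no bias.

```python
import numpy as np

def gap_logits(feature_maps, fc_weights):
    """Class scores from global average pooling plus a linear layer (no bias).

    feature_maps: (K, H, W) activations of the last conv layer.
    fc_weights:   (num_classes, K) weights of the final linear layer.
    """
    pooled = feature_maps.mean(axis=(1, 2))  # (K,)
    return fc_weights @ pooled               # (num_classes,)

def class_activation_map(feature_maps, fc_weights, class_idx):
    """Weighted sum of feature maps using the chosen class's weights.

    Returns an (H, W) map; high values mark regions that support the class.
    """
    return np.tensordot(fc_weights[class_idx], feature_maps, axes=1)

# Toy example (assumed sizes): 8 feature maps of 7x7, 5 classes.
rng = np.random.default_rng(0)
fmaps = rng.normal(size=(8, 7, 7))
weights = rng.normal(size=(5, 8))

logits = gap_logits(fmaps, weights)
predicted = int(np.argmax(logits))
cam = class_activation_map(fmaps, weights, predicted)
print(cam.shape)  # (7, 7)
```

In this bias-free setup the spatial mean of the map equals the class score, which is why the same network can be evaluated for recognition alone or for localization, for example by thresholding the map.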
@raghavsi can you please specify what they are actually doing in Table 2? Is it image localization or recognition? From the description in the paper, it seems to be image recognition only. Can you provide the datasets on the intranet, say on HPC? We have download limits.
Yes, even in the cited papers the tables from which the numbers have been taken are for recognition only.
Let's do recognition. I have asked the TAs to download the data and make it available.
Thank you Sir!
Has the dataset been made available?
Where are the datasets?
What exactly is the task of 'fine-grained recognition' in Table 2 of the paper (project 2)? Is it object localization/detection? The CUB birds dataset also has a lot of attributes and the locations of parts like beaks, feet, etc. for each image. Should we consider these as well, or only object detection/localization?