Fine-grained recognition?

vineetm / ell-881-2018-deep-learning

Course Materials for ELL 881 2018: Fundamentals of Deep Learning

Fine-grained recognition? #42

Open abhudev opened 6 years ago

abhudev commented 6 years ago

What exactly is the task of 'fine-grained recognition' in Table 2 of the paper (project 2)? Is it object localization/detection? The CUB birds dataset also has a lot of attributes, and the locations of parts like beaks, feet, etc. per image. Should we consider these too, or only object detection/localization?

raghavsi commented 6 years ago

http://cnnlocalization.csail.mit.edu/Zhou_Learning_Deep_Features_CVPR_2016_paper.pdf has more details. Please follow their methodology. It's object detection and localization for CUB in this case.

abhudev commented 6 years ago

Thank you sir! Is the task the same for the SVHN dataset? There, multiple digits need to be localized in each image, which is different from CUB, where there is only one object to be localized. Since both CUB and SVHN are in the same table, I thought the two tasks should be the same. Also, how do we combine the attention outputs of conv4 and conv5? The paper describes 2-3 ways: concatenating the outputs, adding them, etc.
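For reference, the two combination options mentioned above (concatenating vs. adding) can be sketched as follows. This is only a toy illustration under stated assumptions, not the paper's implementation: the function names, the softmax normalization of the attention scores, and the tiny feature maps are all hypothetical, and the code uses plain Python lists instead of a deep learning framework.

```python
import math

def softmax(scores):
    """Normalize raw attention scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attended_descriptor(local_features, scores):
    """Attention-weighted sum of per-location feature vectors.

    local_features: one feature vector per spatial location.
    scores: one raw attention score per spatial location (assumed given).
    """
    weights = softmax(scores)
    dim = len(local_features[0])
    return [sum(w * f[d] for w, f in zip(weights, local_features))
            for d in range(dim)]

def combine_concat(g4, g5):
    """Option 1: concatenate the two descriptors (dimensions may differ)."""
    return g4 + g5

def combine_add(g4, g5):
    """Option 2: add element-wise (requires equal dimensions)."""
    if len(g4) != len(g5):
        raise ValueError("descriptors must have the same length to be added")
    return [a + b for a, b in zip(g4, g5)]

# Hypothetical toy inputs: 2 spatial locations, 2 channels per layer.
conv4_feats = [[1.0, 0.0], [0.0, 1.0]]
conv5_feats = [[2.0, 2.0], [4.0, 4.0]]
uniform_scores = [0.0, 0.0]  # equal scores give equal attention weights

g4 = attended_descriptor(conv4_feats, uniform_scores)
g5 = attended_descriptor(conv5_feats, uniform_scores)
concat_descriptor = combine_concat(g4, g5)  # length 4, fed to the classifier
added_descriptor = combine_add(g4, g5)      # length 2, fed to the classifier
```

Concatenation keeps the two layers' information separate and works even when the channel counts differ; addition keeps the descriptor small but needs both layers projected to the same dimension first.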

abhudev commented 6 years ago

Dear Sir, I have read Section A.1 "Datasets", in which the authors say that "For CUB-200-2011, the images are cropped using ground-truth bounding box annotations and resized". So this means that they are in fact doing image recognition, and not object detection/localization. It is still not clear from the paper what they do for SVHN, but since SVHN also has a part with cropped images of digits, it seems they are doing image recognition there too, not object localization/detection. So I think weakly supervised semantic segmentation is the only task where some kind of localization is involved.
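To make the quoted preprocessing step concrete, here is a minimal sketch of "crop using the ground-truth bounding box, then resize". It is an assumption-laden toy, not the authors' pipeline: the image is a plain list of pixel rows, the box is assumed to be given as (x, y, width, height) in pixels, the helper names are hypothetical, and nearest-neighbour resizing stands in for whatever interpolation the paper actually used.

```python
def crop_to_bbox(image, bbox):
    """Crop an image (list of pixel rows) to bbox = (x, y, width, height)."""
    x, y, w, h = bbox
    return [row[x:x + w] for row in image[y:y + h]]

def resize_nearest(image, out_h, out_w):
    """Resize with nearest-neighbour sampling (assumed interpolation)."""
    in_h, in_w = len(image), len(image[0])
    return [[image[r * in_h // out_h][c * in_w // out_w]
             for c in range(out_w)]
            for r in range(out_h)]

# Hypothetical 4x4 single-channel image with pixel value = row * 4 + col.
toy_image = [[r * 4 + c for c in range(4)] for r in range(4)]
toy_bbox = (1, 1, 2, 2)  # x, y, width, height

cropped = crop_to_bbox(toy_image, toy_bbox)   # the 2x2 centre patch
network_input = resize_nearest(cropped, 4, 4) # scaled back up to 4x4
```

After this step the classifier only ever sees the object region, which is why the task reduces to recognition rather than localization.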

raghavsi commented 6 years ago

Their comparison is with VGG-GAP, which is a method for classification and resultant localization. These are attention mechanisms, so I assume that they will do classification and localization.

anshumitts commented 6 years ago

@raghavsi, can you please specify what they are actually doing in Table 2? Is it image localization or recognition? From the description in the paper, it seems to be image recognition only. Also, can you provide the datasets on the intranet, say on HPC, as we have download limits?

abhudev commented 6 years ago

Yes, even in the cited papers the tables from which the numbers have been taken are for recognition only.

raghavsi commented 6 years ago

Let's do recognition. I have asked the TAs to download the data and make it available.

abhudev commented 6 years ago

Thank you Sir!

mehak126 commented 6 years ago

> Let's do recognition. I have asked TAs to download data and make it available.

Has the dataset been made available?

ankursharma-iitd commented 6 years ago

> > Let's do recognition. I have asked TAs to download data and make it available.
>
> Has the dataset been made available?

Where are the datasets?