Open chenhao10807 opened 6 years ago
I know this issue is really old, but I guess some random guy reading this might be interested as well ...
For each image, the 5 best guesses for classification are considered. For each of those 5 guesses (referred to as top 1, top 2, etc.) a bounding box is generated. The ILSVRC challenge allows 5 guesses because an image often contains several objects but carries only one ground-truth label, and labelling every object by hand is impractical across that many images (see the paper "ImageNet Large Scale Visual Recognition Challenge", Section 4: "Evaluation at large scale" for more information). So you might not hit the "right" object with your first guess.
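To make the "one box per guess" idea concrete, here is a rough sketch and not the repository's actual code. It assumes you already have a score vector and one non-negative activation map per class as NumPy arrays; the names `cam_to_bbox` and `top5_boxes` and the 20% threshold are my own illustrative choices. As a simplification, the box encloses all pixels above the threshold rather than only the largest connected region.

```python
import numpy as np

def cam_to_bbox(cam, rel_threshold=0.2):
    """Box (x_min, y_min, x_max, y_max) around all pixels >= rel_threshold * max.

    Assumes `cam` is a 2-D, non-negative activation map.
    Simplification: no connected-component analysis is done.
    """
    mask = cam >= rel_threshold * cam.max()
    ys, xs = np.nonzero(mask)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

def top5_boxes(scores, cams):
    """Return [(class_index, bbox), ...] for the 5 highest-scoring classes.

    scores: array of shape (num_classes,)
    cams:   array of shape (num_classes, H, W), one map per class
    """
    top5 = np.argsort(scores)[::-1][:5]  # indices sorted by descending score
    return [(int(c), cam_to_bbox(cams[c])) for c in top5]
```

For evaluation you would then count an image as correct if any of the 5 (class, box) pairs matches a ground-truth annotation.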
Well, you could try to run the CAM algorithm on an image with multiple classes/objects. But what then? Take the first guess for the first object, the guess with the second-highest confidence for the second object, and so on, hoping for good results? What about images with no objects to localize/classify? Those cases exist as well. The main problem is that you have no clue how many objects there are in the image. I think this algorithm is not meant to be used for detection.
You could try to combine it with selective search: get region proposals from selective search, crop those regions from the image, run this algorithm on each crop, create bounding boxes, and map them back onto the original image. Something like that might work...
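A minimal sketch of that pipeline, untested on real data. It assumes the proposals are already computed as `(x, y, w, h)` tuples (from any selective search implementation) and that `localize` is a hypothetical callable you supply, which runs the classifier plus CAM on a crop and returns `(label, score, box)` with the box in crop coordinates. `remap_box` and `detect` are names I made up for illustration.

```python
import numpy as np

def remap_box(box, region, crop_size):
    """Map a box from (possibly resized) crop coordinates to the original image.

    box:       (x0, y0, x1, y1) in the crop's coordinates
    region:    (x, y, w, h) of the proposal in the original image
    crop_size: (width, height) of the crop the box was computed on
    """
    x0, y0, x1, y1 = box
    rx, ry, rw, rh = region
    cw, ch = crop_size
    sx, sy = rw / cw, rh / ch  # undo any resizing applied to the crop
    return (rx + x0 * sx, ry + y0 * sy, rx + x1 * sx, ry + y1 * sy)

def detect(image, proposals, localize):
    """Run `localize` on every proposal and return boxes in image coordinates."""
    detections = []
    for rx, ry, rw, rh in proposals:
        crop = image[ry:ry + rh, rx:rx + rw]
        label, score, box = localize(crop)  # hypothetical classifier + CAM step
        ch, cw = crop.shape[:2]
        mapped = remap_box(box, (rx, ry, rw, rh), (cw, ch))
        detections.append((label, score, mapped))
    return detections
```

You would still need to discard low-confidence crops and merge overlapping boxes (e.g. with non-maximum suppression), since many proposals will cover the same object or no object at all.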
Thank you for sharing the code, but can it be used for a detection task?