Closed jiqiujia closed 6 years ago
@jiqiujia
Thank you for the question!
We only applied thresholding on the MS-COCO dataset, following "CNN-RNN: A Unified Framework for Multi-label Image Classification" (CVPR 2016).
Please refer to paragraph 2 of section 4.3 of that paper: "Since the number of the objects per image varies considerably in this dataset, we do not set the minimum length of the prediction path during beam search".
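For readers following the thread, here is a minimal sketch of what "thresholding before top-3 precision and recall" could look like. It is not the evaluation code from either paper: the function names, the `threshold` value, and the choice of overall (micro-averaged) precision and recall are assumptions made for illustration only.

```python
import numpy as np

def topk_predictions(scores, k=3, threshold=None):
    """Boolean mask of predicted labels per image.

    scores: array of shape (num_images, num_labels).
    threshold: if given, top-k labels scoring below it are dropped,
    so an image may end up with fewer than k predictions.
    """
    scores = np.asarray(scores, dtype=float)
    # Indices of the k highest-scoring labels for each image.
    topk_idx = np.argsort(-scores, axis=1)[:, :k]
    pred = np.zeros(scores.shape, dtype=bool)
    np.put_along_axis(pred, topk_idx, True, axis=1)
    if threshold is not None:
        # Hypothetical thresholding step: keep only confident top-k labels.
        pred &= scores >= threshold
    return pred

def overall_precision_recall(pred, target):
    """Precision and recall pooled over all images and labels."""
    target = np.asarray(target, dtype=bool)
    correct = np.logical_and(pred, target).sum()
    precision = correct / max(pred.sum(), 1)
    recall = correct / max(target.sum(), 1)
    return precision, recall

# One image with two true labels out of four.
scores = [[0.9, 0.8, 0.1, 0.05]]
target = [[1, 1, 0, 0]]
print(overall_precision_recall(topk_predictions(scores), target))
print(overall_precision_recall(topk_predictions(scores, threshold=0.5), target))
```

Without the threshold, an image with fewer than three true labels is forced to emit three predictions, which lowers precision; with the threshold, the low-scoring third label is dropped. That difference is why the two settings are not directly comparable.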
Thanks for your quick reply. I understand now.
Besides, I would like to say this is a really awesome design. Could you share some of your experience in designing neural network architectures?
@jiqiujia
Thank you for your comments! For this work, the crucial parts are the "gating operation (element-wise multiplication and sum pooling)" and the "spatial softmax". These ideas are borrowed from the widely used attention mechanism, with some adaptation to the multi-label problem. The rest was a lot of experiments.
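As a rough illustration of the two pieces mentioned above, here is a minimal NumPy sketch. It is not the code from the paper: the tensor shapes, the per-label layout `(num_labels, H, W)`, and the function names `spatial_softmax` and `gated_sum_pool` are assumptions chosen to show the general idea.

```python
import numpy as np

def spatial_softmax(logits):
    """Softmax over the spatial positions of each label's map.

    logits: array of shape (num_labels, H, W).
    Returns attention maps of the same shape, each summing to 1.
    """
    num_labels, height, width = logits.shape
    flat = logits.reshape(num_labels, height * width)
    # Subtract the per-map maximum for numerical stability.
    flat = flat - flat.max(axis=1, keepdims=True)
    exp = np.exp(flat)
    return (exp / exp.sum(axis=1, keepdims=True)).reshape(num_labels, height, width)

def gated_sum_pool(attention, confidence):
    """Gating: element-wise multiplication followed by sum pooling.

    attention, confidence: arrays of shape (num_labels, H, W).
    Returns one score per label, shape (num_labels,).
    """
    return (attention * confidence).sum(axis=(1, 2))

# Toy example: 2 labels on a 3x3 grid (random values, for shape checking only).
rng = np.random.default_rng(0)
attention = spatial_softmax(rng.normal(size=(2, 3, 3)))
confidence = rng.normal(size=(2, 3, 3))
print(gated_sum_pool(attention, confidence).shape)
```

Because each attention map sums to 1, the pooled score is a weighted average of that label's confidence map, weighted toward the regions the attention focuses on.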
Thanks! Your answer helps me understand this work more deeply!
From your paper, it seems that you apply thresholding before calculating top-3 precision and recall. However, the compared methods do not seem to do that. I am not sure it is acceptable to apply thresholding before calculating top-3 precision and recall. Please clarify this, thanks.