Thanks for checking out our work @Haoqing-Wang!
I found an interesting phenomenon: the accuracy of ProtoNet's linear-layer form (w_k = 2c_k, b_k = -||c_k||^2) is significantly lower than that of ProtoNet itself, even though the two should theoretically be identical. Have you observed this as well? What could be the reason?
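For reference, here is a minimal sketch (not taken from the repository) checking that a linear layer initialized with W_k = 2c_k and b_k = -||c_k||^2 ranks classes exactly like ProtoNet's negative squared Euclidean distance; the variable names are illustrative assumptions. The expansion -||z - c||^2 = 2 c^T z - ||c||^2 - ||z||^2 only differs from the linear form by a per-query constant -||z||^2, which does not affect the argmax.

```python
import torch

torch.manual_seed(0)
n_query, n_way, dim = 8, 5, 64
z = torch.randn(n_query, dim)          # query embeddings
prototypes = torch.randn(n_way, dim)   # class prototypes c_k

# ProtoNet logits: -||z - c_k||^2
proto_logits = -torch.cdist(z, prototypes).pow(2)

# Linear-layer form: w_k = 2 c_k, b_k = -||c_k||^2
W = 2 * prototypes                       # (n_way, dim)
b = -prototypes.pow(2).sum(dim=1)        # (n_way,)
linear_logits = z @ W.t() + b

# The predicted classes should coincide (logits differ only by -||z||^2 per query).
print("Predictions match:", torch.equal(proto_logits.argmax(1), linear_logits.argmax(1)))
```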
Yes, when not finetuned, the original ProtoNet and its linear-layer form make identical predictions. However, in our approach (ProtoTransfer) we not only initialize the final linear layer with the prototypical form (= ProtoNet), but also finetune it and, optionally, the deep backbone (= ProtoTune).
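To make the distinction concrete, here is a minimal sketch of that idea, assuming a frozen backbone and already-embedded support examples; the function name, training loop, and hyperparameters are illustrative assumptions, not the repository's exact implementation.

```python
import torch
import torch.nn as nn

def prototune_head(support_emb, support_lbl, n_way, steps=20, lr=1e-3):
    """Finetune a linear classifier initialized from class prototypes."""
    dim = support_emb.size(1)
    # Prototypes: per-class mean of the support embeddings.
    prototypes = torch.stack(
        [support_emb[support_lbl == k].mean(0) for k in range(n_way)]
    )

    head = nn.Linear(dim, n_way)
    with torch.no_grad():
        head.weight.copy_(2 * prototypes)               # w_k = 2 c_k
        head.bias.copy_(-prototypes.pow(2).sum(dim=1))  # b_k = -||c_k||^2

    # Finetune the prototypically initialized head on the support set.
    opt = torch.optim.Adam(head.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(head(support_emb), support_lbl)
        loss.backward()
        opt.step()
    return head
```

Without the finetuning loop this reduces to plain ProtoNet; with it, the decision boundaries can move away from the pure prototypical solution, which is where the accuracy difference comes from.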
In our paper (arXiv, PDF), Table 2 (UMTRA-ProtoNet vs UMTRA-ProtoTune) and Table 3 (ProtoCLR + ProtoNet vs ProtoTune) show that this strategy is particularly beneficial when a relatively high number of shots per class (>5) is available. The same tables also show that in the very-low-shot regime, finetuning can slightly degrade performance. This is a trade-off we accepted when designing our approach.
Hope it helps!
Closing this soon if there are no further comments.
Good job!