kjunelee / MetaOptNet

Meta-Learning with Differentiable Convex Optimization (CVPR 2019 Oral)
Apache License 2.0

Protonet re-implementation details #50

Closed: ars22 closed this issue 3 years ago

ars22 commented 3 years ago

Hi,

Thanks for the detailed documentation -- it was very helpful! I have a question regarding the re-implementation of ProtoNet with the ResNet-12 backbone. How many ways were used for training in both the 5-shot and 1-shot settings, and was the same number used for all datasets? Also, was label smoothing applied for the ProtoNet experiments too (and again, was it applied for all datasets)?

Thanks, Amrith

kjunelee commented 3 years ago

(a) For prototypical networks, we set the meta-training shot equal to the meta-test shot for all datasets. In other words, if you want to test your model in the 1-shot scenario, you need to set the meta-training shot to 1. (b) From what I remember, label smoothing was only applied for miniImageNet.
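For concreteness, here is a minimal sketch of what "meta-training shot == meta-test shot" and label smoothing look like for a prototypical-network head. This is an illustrative PyTorch snippet, not the exact code in this repo; the function name and the grouping convention for the support set are assumptions.

```python
import torch
import torch.nn.functional as F

def protonet_episode_loss(support_emb, query_emb, query_labels,
                          way, shot, label_smoothing=0.0):
    """One episodic loss for a prototypical-network head (illustrative sketch).

    support_emb:  (way * shot, d) support embeddings, grouped class-major.
    query_emb:    (num_query, d) query embeddings.
    query_labels: (num_query,) class indices in [0, way).
    """
    d = support_emb.size(-1)
    # Class prototypes: mean embedding of each class's support samples.
    prototypes = support_emb.view(way, shot, d).mean(dim=1)      # (way, d)
    # Logits: negative squared Euclidean distance to each prototype.
    logits = -torch.cdist(query_emb, prototypes).pow(2)          # (num_query, way)
    # Optional label smoothing (label_smoothing kwarg needs PyTorch >= 1.10).
    return F.cross_entropy(logits, query_labels, label_smoothing=label_smoothing)

# Matching train and test settings: for 1-shot evaluation, sample training
# episodes with shot=1; for 5-shot evaluation, sample them with shot=5.
```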

ars22 commented 3 years ago

Thanks for your response @kjunelee

I understand that the shots are matched; my question was more about the number of ways. The original ProtoNet paper found that increasing the number of ways improved performance, so I was wondering whether you do the same in your re-implementation with ResNet-12, or whether you train only with 5-way. Also, have you tried experiments with increasing ways for the SVM and RR models?

kjunelee commented 3 years ago

Sorry I misread your question.

From what I remember, increasing the number of ways did not help for prototypical networks in our implementation, so we always trained 5-way for the experiments in our paper. Note that we use different hyperparameters (e.g., the number of query samples) from the original ProtoNet paper, and this may be why we did not observe a gain from increasing the way. To be honest, we did not experiment much with increasing the number of ways because it increases memory consumption; it might still be worth trying. One caveat: due to the way we implemented the multi-class SVM, the running time will blow up if you go beyond 10-way. Our ridge regression implementation does not have this problem.
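For reference, a minimal sketch of why a closed-form ridge regression head scales gently with the number of ways: the dual (Woodbury) solve inverts an (n_support x n_support) matrix, so adding ways only widens the label matrix and never requires an iterative QP solver. This is an illustrative snippet, not the exact implementation in this repo; the function name and the `lam` value are placeholders.

```python
import torch

def ridge_regression_head(support_emb, support_onehot, query_emb, lam=50.0):
    """Closed-form ridge regression base learner (illustrative sketch).

    support_emb:    (n_support, d) support embeddings.
    support_onehot: (n_support, way) one-hot support labels.
    query_emb:      (n_query, d) query embeddings.
    """
    n_support = support_emb.size(0)
    X, Y = support_emb, support_onehot
    # Dual-form solution of min ||X W - Y||^2 + lam ||W||^2:
    #   W = X^T (X X^T + lam I)^{-1} Y
    # The inverted matrix is (n_support x n_support), independent of the
    # number of ways, which only changes the width of Y.
    K = X @ X.t() + lam * torch.eye(n_support, device=X.device)
    W = X.t() @ torch.linalg.solve(K, Y)     # (d, way)
    return query_emb @ W                     # query logits, (n_query, way)
```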