yaoyao-liu / meta-transfer-learning

TensorFlow and PyTorch implementation of "Meta-Transfer Learning for Few-Shot Learning" (CVPR2019)
https://lyy.mpi-inf.mpg.de/mtl/
MIT License

a question about meta-training strategy #45

Open Sword-keeper opened 3 years ago

Sword-keeper commented 3 years ago

Hi, when I read your code, I noticed that your meta-training strategy has some differences from MAML. Could you tell me which meta-learning paper designed this strategy, or is it your own design? Also, what is the reason you chose this strategy?

yaoyao-liu commented 3 years ago

Hi,

What do you mean by “training strategy”? Do you mean that we introduce a “pre-training” phase?

Best, Yaoyao

Sword-keeper commented 3 years ago

I mean the meta-training phase. In MAML's outer loop, the loss that updates the model's parameters is the sum of all tasks' losses (e.g., 100 training tasks), so in each outer-loop epoch the model's parameters are updated only once. However, in your PyTorch version, the model's parameters are updated by each task's loss separately in the outer loop, so in each outer-loop epoch they are updated 100 times (the number of training tasks). This picture may explain it more clearly.

[image: diagram of the outer-loop update strategy]

yaoyao-liu commented 3 years ago

I think you misunderstand MAML.

MAML doesn't use all tasks' losses to update the model in the outer loop. Our MTL uses a meta-training strategy similar to MAML's. Your figure doesn't show the strategy actually applied in MAML.

In MAML, they use the "meta-batch" strategy, i.e., the average loss of 4 tasks is used for one outer-loop update. In our method, we just set the meta-batch size to 1.
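A minimal sketch of this meta-batch strategy (not the repository's actual code; `sample_task`, `adapted_query_loss`, and the optimizer setup are hypothetical placeholders):

```python
import torch

def outer_loop_step(model, meta_optimizer, sample_task, adapted_query_loss,
                    meta_batch_size=4):
    """One MAML-style outer-loop iteration: average the query losses of
    `meta_batch_size` tasks and take a single meta-update step."""
    meta_optimizer.zero_grad()
    meta_loss = 0.0
    for _ in range(meta_batch_size):
        task = sample_task()  # draw one few-shot task (support + query set)
        meta_loss = meta_loss + adapted_query_loss(model, task)  # query loss after inner-loop adaptation
    meta_loss = meta_loss / meta_batch_size  # average over the meta-batch
    meta_loss.backward()
    meta_optimizer.step()  # one update of the meta-parameters
```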

Sword-keeper commented 3 years ago

Oh, I see. Thank you very much. Could you also tell me why you set the meta-batch size to 1? What is the meaning of the meta-batch?

yaoyao-liu commented 3 years ago

If the meta-batch size is 4, then in one outer-loop iteration the model is updated by the average loss of 4 different tasks. I set the meta-batch size to 1 because it is easier to implement...
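With `meta_batch_size = 1`, the sketch above degenerates to one meta-update per sampled task, which matches the behaviour observed in the PyTorch version (again a sketch with the same hypothetical helpers; `num_train_tasks` is an assumed name):

```python
# meta_batch_size = 1: one meta-update per sampled task
for _ in range(num_train_tasks):  # e.g. the 100 training tasks per epoch mentioned above
    task = sample_task()
    loss = adapted_query_loss(model, task)  # query loss of a single task
    meta_optimizer.zero_grad()
    loss.backward()
    meta_optimizer.step()
```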

Sword-keeper commented 3 years ago

well... thank you~

yaoyao-liu commented 3 years ago

No problem.

yaoyao-liu commented 3 years ago

I think your figure is correct, but n is not 100. It is, e.g., 4, depending on the setting in MAML.

Besides, n is not the number of all tasks. In MAML, we can sample, e.g., 10,000 tasks; the four tasks in one meta-batch are drawn from those 10,000 tasks.

Sword-keeper commented 3 years ago

Oh, you are right. I misunderstood this figure.

LavieLuo commented 3 years ago

@Sword-keeper Hello, I agree with you, and I also thank the authors for their helpful replies. I guess the main difference between MTL and MAML w.r.t. the “training strategy” is the setting of meta_batch_size, which is 4 for MAML and 1 for MTL. Besides, I guess "update 100 times" refers to the parameter update_batch_size ($k$ in your figure) in the MAML code, which is set to 5, while in MTL it is 100? I'm actually also puzzled about this (e.g., line 101 in meta-transfer-learning/pytorch/trainer/pre.py: `for _ in range(1, self.update_step):`).

yaoyao-liu commented 3 years ago

Hi @LavieLuo,

Thanks for your interest in our work. In MAML, all network parameters are updated 5 times during base-learning. In our MTL, we update only the FC layer during base-learning, 100 times. As we update far fewer parameters than MAML, we can afford to update them more times.
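A hedged sketch of such a base-learning (inner) loop, only to illustrate that a small FC head is updated many times on the support set while the backbone stays fixed. `backbone`, `fc_layer`, `support_x`, `support_y`, and `update_step` are assumed names, and the real implementation keeps these updates differentiable so the meta-parameters can be trained through them, which is omitted here:

```python
import torch
import torch.nn.functional as F

def base_learning(backbone, fc_layer, support_x, support_y,
                  update_step=100, inner_lr=0.01):
    """MTL-style inner loop: adapt only the FC classifier on the support set."""
    inner_optimizer = torch.optim.SGD(fc_layer.parameters(), lr=inner_lr)
    with torch.no_grad():
        embeddings = backbone(support_x)  # frozen feature extractor
    for _ in range(update_step):  # e.g. 100 steps here, vs. ~5 full-network steps in MAML
        logits = fc_layer(embeddings)
        loss = F.cross_entropy(logits, support_y)
        inner_optimizer.zero_grad()
        loss.backward()
        inner_optimizer.step()
    return fc_layer
```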

If you have any further questions, please send me an email or add comments on this issue.

Best, Yaoyao

LavieLuo commented 3 years ago

@yaoyao-liu Wow, thank you for the prompt reply. Now I completely understand the motivation for this strategy. That's cool! :)

yaoyao-liu commented 3 years ago

@LavieLuo In my experience, if the base-learner overfits the training samples of the target task, the performance won't drop. So I just update the FC layer as many times as I can to let it overfit.

LavieLuo commented 3 years ago

@yaoyao-liu Yes, I agree! I remember some recent works showing that the overfitting of DNNs manifests as over-confidence in the predicted probabilities, which somehow doesn't degrade accuracy. Also, I had forgotten that MTL only trains part of the parameters; now I've figured it out. Thanks again!

yaoyao-liu commented 3 years ago

@LavieLuo No problem.