yaoyao-liu / meta-transfer-learning

TensorFlow and PyTorch implementation of "Meta-Transfer Learning for Few-Shot Learning" (CVPR2019)
https://lyy.mpi-inf.mpg.de/mtl/
MIT License

multi-gpu training & maml baseline #42

Open DanqingZ opened 3 years ago

DanqingZ commented 3 years ago

Hi, thank you so much for the codebase! I am looking for a multi-GPU PyTorch MAML implementation, and I am wondering if I can use your codebase for this.

For the multi-GPU training, can I simply use DataParallel to parallelize the model? Will the existing data loader work with the DataParallel model?

self.model = torch.nn.DataParallel(self.model) 
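Roughly what I have in mind is the sketch below (the model here is just a placeholder, not your actual code). My understanding is that DataParallel splits each batch along the first dimension and gathers the outputs, so the existing data loader itself would not need changes:

```python
import torch
import torch.nn as nn

# Rough sketch with a placeholder model (not the actual MTL code): wrap the model
# in DataParallel when more than one GPU is visible. DataParallel scatters each
# batch along dim 0 and gathers the outputs, so the DataLoader stays unchanged.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 5))

if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device)
```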

Also, I am wondering: if I skip the pre-training step and run meta-learning directly (after making some changes so the pre-trained model is not loaded), is that MAML? Many thanks, and I look forward to your reply!

yaoyao-liu commented 3 years ago

Hi Danqing,

Thanks for your interest in our project. For (1): I have never tried to run this project on multiple GPUs, but you may try that; you are welcome to report your results here. For (2): It is different from the original MAML. In our method, during base-learning we only update the FC classifier weights, and during meta-learning we update the scaling and shifting weights. In MAML, all the network parameters are updated during both base-learning and meta-learning.
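As a rough illustration (the module and parameter names below are placeholders, not the exact ones in this codebase), the difference in the base-learning step comes down to which parameter group the inner-loop optimizer sees:

```python
import torch
import torch.nn as nn

# Rough illustration only; names are placeholders, not this repository's code.
class Learner(nn.Module):
    def __init__(self, feat_dim=64, n_way=5):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(32, feat_dim), nn.ReLU())  # frozen in MTL
        self.fc = nn.Linear(feat_dim, n_way)                              # episode classifier

    def forward(self, x):
        return self.fc(self.encoder(x))

model = Learner()

# MTL base-learning: only the FC classifier weights are updated in the inner loop.
# (In meta-learning, the scaling-and-shifting weights on the frozen encoder are
# updated instead of the encoder weights themselves.)
mtl_inner_opt = torch.optim.SGD(model.fc.parameters(), lr=1e-2)

# MAML base-learning: all network parameters are updated in the inner loop,
# and again in the outer (meta) loop.
maml_inner_opt = torch.optim.SGD(model.parameters(), lr=1e-2)
```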

If you have any further questions, feel free to leave additional comments.

Best, Yaoyao

DanqingZ commented 3 years ago

Hi Yaoyao, thanks for the reply! I see; I will report the numbers here once I finish the experiments.

For (2), so what you mentioned are the FT and SS meta-training operations in your paper. I actually have a question about Table 2 of your paper. For the row "MAML deep, HT", did you combine the pre-training step with the MAML algorithm? Do you have the experiment "MAML deep, HT" without the fine-tuning? Then we could see how much performance improvement fine-tuning contributes. The differences between your proposed MTL algorithm and the MAML-ResNet algorithm are: 1) fine-tuning; 2) HT; and 3) the FT->SS meta-training operations. I am curious how much performance improvement each component contributes. Thanks!

yaoyao-liu commented 3 years ago

Hi Danqing,

For "MAML deep, HT" in Table 2, we used the pre-trained model (ResNet-12 (pre)). For different ablative fine-tuning settings, you may see the results in Table 1. As the model is pre-trained on 64 classes (miniImageNet), we are not able to directly apply it to 5-class tasks without any fine-tuning steps. At least, we need to fine-tune the FC classifiers.

Best, Yaoyao

DanqingZ commented 3 years ago

@yaoyao-liu, then for "SS[Θ;θ], HT meta-batch" in Table 2, is that also the pre-trained model without the first fine-tuning step? I mean, which experiments in Table 2 have the "(a) large-scale DNN training" step?

DanqingZ commented 3 years ago

The differences between your proposed MTL algorithm and the MAML-ResNet algorithm are: 1) fine-tuning; 2) HT; and 3) the FT->SS meta-training operations. If we want to claim that the "SS meta-training operations" work, then we need to make sure the comparison experiments also have 1) fine-tuning and 2) HT. I am trying to understand your work better, so please correct me if I am wrong. Thanks.

yaoyao-liu commented 3 years ago

I am not sure what you mean by the "first fine-tuning" step.

In Table 2, if the feature extractor is labeled with "(pre)" (e.g., ResNet-12 (pre)), then the pre-trained model is applied. The model is pre-trained on all base class samples.

The results in Table 1 show that the "SS meta-training operation" works. Comparing the 3rd block with the 1st and the 2nd blocks, you can observe that our "SS" performs better than "FT" and "update". "HT meta-batch" is not applied in Table 1.

DanqingZ commented 3 years ago

Oh, I see. I thought "ResNet-12 (pre)" meant ResNet-12 without any fine-tuning.

By 'first fine-tuning" step I mean "(a) large-scale DNN training" step.

DanqingZ commented 3 years ago

For table 1, did you first conduct the "(a) large-scale DNN training" step?

yaoyao-liu commented 3 years ago

For table 1, did you first conduct the "(a) large-scale DNN training" step?

Yes. In the caption, you can see "ResNet-12 (pre)" is applied.

DanqingZ commented 3 years ago

Yeah, I understand that when loading the pre-trained models we have to drop the classifier parameters and only use the encoder parameters. This is like a domain fine-tuning step, adapting the pre-trained model weights to the target domain.
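Something like the sketch below is what I had in mind, with placeholder module names rather than your actual code:

```python
import torch.nn as nn

# Sketch with placeholder module/key names: reuse the pre-trained encoder weights
# and drop the pre-training classifier ("fc.*"), since its 64-way head does not
# match the 5-way episodes.
class Net(nn.Module):
    def __init__(self, n_classes):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
        self.fc = nn.Linear(64, n_classes)

pretrained = Net(n_classes=64)   # large-scale DNN training on the 64 base classes
fewshot = Net(n_classes=5)       # episodic model for 5-way tasks

encoder_only = {k: v for k, v in pretrained.state_dict().items()
                if k.startswith('encoder.')}
fewshot.load_state_dict(encoder_only, strict=False)  # classifier stays re-initialized
```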

DanqingZ commented 3 years ago

For table 1, did you first conduct the "(a) large-scale DNN training" step?

Yes. In the caption, you can see "ResNet-12 (pre)" is applied.

I see, thanks for the clarification! I misunderstood "ResNet-12 (pre)".

yaoyao-liu commented 3 years ago

You're welcome.

DanqingZ commented 3 years ago

Hi @yaoyao-liu, I have an additional question: if we don't run the large-scale DNN training step and just run the experiment with "SS[Θ;θ], HT meta-batch", will the performance still be better than "MAML, HT meta-batch"?