Question about the pre-train phase: why use the meta-val set to select the pre-trained model?

Errorfinder commented 4 years ago

Hello, thanks for your very enlightening work. However, I found a spot in your code that is quite confusing to me. I describe it as follows:

The paper clearly says that in the pre-train phase you train a feature extractor with a ResNet backbone on the 64 training classes of Mini-ImageNet. I think that is basically a standard training procedure: we sample part of the training set to train on and use the rest for validation, so both the training and validation sets are drawn from the 64 training classes. As you said in the paper, for each class 550 images are used for training and the remaining 50 for validation.

However, in the PreTrainer class in your code, validation is done on N-way K-shot tasks. This confuses me, since I can't find any description of this part in the paper. In my opinion, the pre-train phase is actually independent of the later meta-training part. So what is the reason for doing so? Or have I missed some important info? Please let me know, and thanks in advance!
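
For readers unfamiliar with the episodic validation tasks mentioned above, here is a minimal sketch of how one N-way K-shot episode could be sampled. It is not taken from the repository: `sample_episode`, `items_by_class`, and the parameter names are hypothetical and chosen only for illustration.

```python
import random


def sample_episode(items_by_class, n_way, k_shot, n_query, rng):
    """Sample one N-way K-shot episode (illustrative sketch, not the repo's sampler).

    items_by_class: dict mapping a class name to a list of items (e.g. image paths).
    Returns (support, query), two lists of (item, episode_label) pairs, where
    episode_label is an index in 0..n_way-1 that is local to this episode.
    """
    # Pick N distinct classes; sorting first makes the draw reproducible for a seeded rng.
    classes = rng.sample(sorted(items_by_class), n_way)
    support, query = [], []
    for episode_label, cls in enumerate(classes):
        # Draw K support items and n_query query items without overlap.
        picked = rng.sample(items_by_class[cls], k_shot + n_query)
        support += [(item, episode_label) for item in picked[:k_shot]]
        query += [(item, episode_label) for item in picked[k_shot:]]
    return support, query
```

For example, `sample_episode(data, n_way=5, k_shot=1, n_query=15, rng=random.Random(0))` would yield 5 support items and 75 query items, assuming every class in `data` has at least 16 items.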

yaoyao-liu commented 4 years ago

Hi @Errorfinder,

Thanks for your interest in our work. In the pre-train phase, we aim to find the best model to initialize the backbone for the meta-train phase. Intuitively, the best model is the one with the highest meta-validation accuracy (we cannot access the meta-test set during the pre-train phase, so we use the meta-validation set). In this way, we don't need to split the 64-class images into train and validation sets. We can use all the images of the 64 classes for pre-training.

Besides, please note that the PyTorch version is not the code we used in the published paper. If you want to reproduce the results in the paper, you can use the TensorFlow version.
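
The selection idea can be sketched as follows: score each pre-train checkpoint by its average accuracy over a fixed set of meta-validation episodes, and keep the best one. This is only an illustration under assumptions. The nearest-centroid classifier is a simple stand-in for whatever few-shot evaluation the PreTrainer class actually runs, and `episode_accuracy`, `select_checkpoint`, and the `embed` callables are hypothetical names.

```python
def episode_accuracy(embed, support, query):
    """Accuracy of a nearest-centroid classifier on one episode.

    embed: callable mapping an item to a list of floats (stand-in for a backbone).
    support, query: lists of (item, label) pairs.
    """
    sums, counts = {}, {}
    for item, label in support:
        vec = embed(item)
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    # One centroid per class: the mean of its support embeddings.
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}

    correct = 0
    for item, label in query:
        vec = embed(item)
        # Predict the class whose centroid is closest in squared Euclidean distance.
        pred = min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(vec, centroids[c])))
        correct += int(pred == label)
    return correct / len(query)


def select_checkpoint(checkpoints, episodes):
    """Return (best_name, scores) by mean meta-validation episode accuracy.

    checkpoints: dict mapping a checkpoint name to its embed callable (assumed).
    episodes: list of (support, query) pairs sampled from the meta-validation classes.
    """
    scores = {
        name: sum(episode_accuracy(embed, s, q) for s, q in episodes) / len(episodes)
        for name, embed in checkpoints.items()
    }
    best = max(scores, key=scores.get)
    return best, scores
```

Because the meta-validation classes are disjoint from the 64 pre-training classes, this kind of selection does not require holding out any of the 64-class images.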

Errorfinder commented 4 years ago

@yaoyao-liu Oh I see, that makes sense then. Thanks for your quick reply.

I am really interested in this work, but I am not familiar with TensorFlow. Since you mentioned that the results in the paper were generated with the TensorFlow version, could you please point out the major differences between the two? I mean, do they have the same content, just built on different platforms?

yaoyao-liu commented 4 years ago

Hi @Errorfinder,

The PyTorch version is built on FEAT. For some details (e.g. backbone, learning rate, optimizer, dataloader), we directly follow FEAT.

If you have any further questions, feel free to add more comments.