jakesnell / prototypical-networks

Code for the NeurIPS 2017 Paper "Prototypical Networks for Few-shot Learning"
MIT License

question of a detail #9

Open Interesting6 opened 5 years ago

Interesting6 commented 5 years ago

Thanks for your work, sir. I have a question about the following instruction:

Re-run in trainval mode python scripts/train/few_shot/run_trainval.py. This will save your model into results/trainval by default.

What does it mean? Does it restart training from scratch, or does it continue from the first run's parameters to search for a better solution?

And why are the results (embedding parameters) identical when I run your code twice? Is that due to the random seed? Without it, can you still guarantee the embedding results are equal?

schatty commented 5 years ago

Hi, as I'm reading this repo right now, maybe I can help a bit. run_trainval.py continues training from an existing model: note the --model.model_path parameter in run_trainval.py, i.e. the model with the trained embedding is loaded and optimised further from that point. The equality of results is likely due to the torch random seed being fixed in train.py.
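A minimal sketch (not the repo's exact code) of what "continuing training from an existing model" amounts to: the first run saves the embedding network's weights, and the second run reloads them and keeps optimising. The fixed seed is also why two runs produce identical results. The checkpoint path and model here are illustrative stand-ins; the repo passes the real checkpoint via --model.model_path.

```python
import tempfile

import torch
import torch.nn as nn

# A fixed seed like the one in train.py makes repeated runs identical
torch.manual_seed(1234)

model = nn.Linear(8, 4)  # stand-in for the trained embedding network
ckpt = tempfile.NamedTemporaryFile(suffix=".pt", delete=False).name
torch.save(model.state_dict(), ckpt)  # end of the first training run

# trainval stage: load the saved weights into a fresh model and resume
resumed = nn.Linear(8, 4)
resumed.load_state_dict(torch.load(ckpt))
optimizer = torch.optim.Adam(resumed.parameters(), lr=1e-3)
# ...training would continue here from the loaded parameters
```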

Interesting6 commented 5 years ago

Thank you schatty, I understand that. Meanwhile, some questions arise: why is that re-run needed, and how does it ensure the model achieves better performance? Is it because of a different training dataset? Besides, if the random seed were removed, I think the accuracy could hardly reach this performance again.

schatty commented 5 years ago

I think the logic in run_trainval.py can be used to train the model in several stages, for example when you lack continuous access to computational resources. It could also be used to perform additional training on some other dataset, but I haven't seen that mentioned so far. Why do you think removing the random seed would hurt the accuracy so badly?

Interesting6 commented 5 years ago

Sorry schatty, I've been busy recently, so I had no time to look at it.

I'm quite confused about what you mean by "the lack of continuous time access to computation resources". Can you explain it in more detail?

Just now, I reviewed the source code for loading data. I see that the file split/vinyals/train.txt is used to load all training classes:

```
Angelic/character01/rot000
Angelic/character01/rot090
```

As shown above, for Angelic/character01, rotation by 0 degrees and rotation by 90 degrees are two different training classes? So I'm here to confirm that the rotations are used to increase the number of classes, rather than for data augmentation (i.e. increasing the number of samples per class)?
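The distinction above can be sketched in a few lines. This is an assumed illustration, not the repo's loader: each (character, rotation) pair becomes its own class, so the rotations multiply the class count instead of adding samples to an existing class.

```python
# Illustrative sketch: rotations expand the CLASS list, they do not
# add extra samples to an existing class
base_classes = ["Angelic/character01", "Angelic/character02"]
rotations = ["rot000", "rot090", "rot180", "rot270"]

classes = [f"{c}/{r}" for c in base_classes for r in rotations]
# 2 base characters x 4 rotations -> 8 distinct training classes
```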

And lastly, I have a question about the training & test strategy. Many papers cite this paper and say that "the consistency between training and test environments alleviates the distribution gap and improves generalization". I think the "environment" here is the n-way, k-shot episode strategy. So why does that alleviate the distribution gap and improve generalization?
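For concreteness, the "matching environments" idea is that every training episode is sampled with the same n-way, k-shot structure the model will face at test time. A hedged sketch of such episodic sampling (names and data layout are illustrative, not the repo's actual sampler):

```python
import random

def sample_episode(data, n_way, k_shot, q_queries):
    """Draw one n-way, k-shot episode.

    data: dict mapping class name -> list of examples.
    Returns per-class support and query sets with no overlap.
    """
    classes = random.sample(sorted(data), n_way)
    support, query = {}, {}
    for c in classes:
        examples = random.sample(data[c], k_shot + q_queries)
        support[c] = examples[:k_shot]   # used to build the prototype
        query[c] = examples[k_shot:]     # used to compute the loss
    return support, query

# toy dataset: 10 classes with 20 examples each
data = {f"class{i}": list(range(20)) for i in range(10)}
support, query = sample_episode(data, n_way=5, k_shot=1, q_queries=5)
```

Training on thousands of such episodes means the model is optimised for exactly the kind of classification task it is evaluated on, rather than for ordinary whole-dataset classification.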

schatty commented 5 years ago

I just mean that it is often necessary to continue working with a model trained earlier when the full training procedure cannot be performed in one go. And yes, those are two different training classes, i.e. the rotations are not used for augmentation.

The last one is hard for me to answer, but having the same environment for training and evaluation seems generally good for keeping the relevant distributions consistent; however, I don't remember explicit statements about distribution gaps and generalisation in the original paper.