openai / supervised-reptile

Code for the paper "On First-Order Meta-Learning Algorithms"
https://arxiv.org/abs/1803.02999
MIT License

About hyper-parameter #10

Closed. csyanbin closed this issue 6 years ago


csyanbin commented 6 years ago

Hi, nice paper and thanks for sharing the code.

I noticed that the commands given in this repo are different from those in Appendix A of the paper.
In the paper, the hyper-parameters are shared across experiments and set to the same values in Tables 3 and 4.

Is there much performance difference between the different hyper-parameter settings?
What should I do if I want to run your algorithm on a new dataset?

Thanks so much!
unixpickle commented 6 years ago

Ah, thanks for reminding me to fix this. At one point, I updated the HPs (and results) in the paper to be simpler, because I found that they didn't make much of a difference. I never changed them in this repo.

csyanbin commented 6 years ago

Thanks for the reply.

Very insightful paper for saving GPU memory and computation cost. I hope to see more results on larger network architectures such as ResNet!

jaegerstar commented 6 years ago

@csyanbin I doubt it would work better with a larger network, because more weights could increase the risk of overfitting. That is why the earlier MAML work reduced the number of convolutional filters.

bkj commented 6 years ago

So -- just to make sure I'm translating the parameters from the paper to the code correctly, the parameters for the Omniglot experiments would look like this?

# 1-shot
python -u run_omniglot.py \
    --train-shots 10 \
    --inner-batch 10 \
    --inner-iters 5 \
    --shots 1 \
    --eval-batch 5 \
    --eval-iters 50 \
    --meta-batch 5 \
    --meta-iters 100000 \
    --learning-rate 0.001 \
    --meta-step 1 \
    --meta-step-final 0    

# 5-shot
python -u run_omniglot.py \
    --train-shots 10 \
    --inner-batch 20 \
    --inner-iters 10 \
    --shots 5 \
    --eval-batch 10 \
    --eval-iters 50 \
    --meta-batch 5 \
    --meta-iters 200000 \
    --learning-rate 0.0005 \
    --meta-step 1 \
    --meta-step-final 0   
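
For what it's worth, my reading of --meta-step and --meta-step-final is that the outer (Reptile) step size is linearly annealed from the first value to the second over training. A rough sketch of that schedule (simplified, not the repo's actual code; the variable names just mirror the CLI flags):

    # Illustrative sketch: linearly anneal the Reptile outer step size.
    # Names mirror the CLI flags above but are not taken from the repo's source.
    meta_step, meta_step_final, meta_iters = 1.0, 0.0, 100000

    for i in range(meta_iters):
        frac_done = i / meta_iters
        cur_step = frac_done * meta_step_final + (1 - frac_done) * meta_step
        # ...sample a meta-batch of tasks, take `inner_iters` inner-loop steps on each,
        # then move the initialization toward the (averaged) adapted weights:
        # new_init = old_init + cur_step * (avg_adapted_weights - old_init)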
csyanbin commented 6 years ago

Hi @jaegerstar, thanks for the discussion.

In the paper "Deep Meta-Learning: Learning to Learn in the Concept Space", they used a MAML-style update with a ResNet as the CNN backbone, and they obtained improvements over the original MAML paper.

The difference is that they only meta-update the last few layers of the ResNet.
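
Not their actual code, but the idea of meta-updating only a subset of the layers could be sketched like this for a Reptile-style outer step (everything below is illustrative: the layer names, the helper function, and the update rule are my assumptions, not code from that paper or this repo):

    import numpy as np

    # Layers whose initialization is meta-learned; all others keep a fixed/pre-trained init.
    META_LEARNED_LAYERS = {"block4_conv", "fc"}

    def partial_meta_update(init_weights, adapted_weights, meta_step):
        """Reptile-style interpolation toward the adapted weights, applied only to chosen layers."""
        new_init = {}
        for name, w in init_weights.items():
            if name in META_LEARNED_LAYERS:
                new_init[name] = w + meta_step * (adapted_weights[name] - w)
            else:
                new_init[name] = w  # non-meta-learned layers are left untouched
        return new_init

    # Tiny usage example with dummy weights:
    init = {"block1_conv": np.zeros(3), "block4_conv": np.zeros(3), "fc": np.zeros(2)}
    adapted = {k: v + 1.0 for k, v in init.items()}
    print(partial_meta_update(init, adapted, meta_step=0.5))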

jaegerstar commented 6 years ago

@csyanbin thanks, I will take a look.

unixpickle commented 6 years ago

@bkj I see one problem -- your 5-shot arguments should have --shots 5. Also, you may want to pass --transductive, depending on which experiment you want to reproduce.

bkj commented 6 years ago

Ah yeah -- good catch. Updated above.