Open Jugaris89 opened 4 years ago
I think this is because, for every single update of the architecture parameters alpha, you would have to train the model weights to convergence, and only then perform one architecture update. Instead, to reduce the cost, you can alternate the two: one weight update, one architecture update, one weight update, one architecture update, and so on until convergence. This greatly reduces the time needed. Think of it as something like stochastic gradient descent, where you do not always step in the direction of the exact gradient, but after training for long enough you still end up near a minimum.
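A rough sketch of that alternating, single-step scheme (not the authors' code; a toy illustration with made-up loss functions, tensors and hyperparameters, using the first-order variant where the current w is plugged in directly rather than the one-step unrolled w'):

```python
import torch

# Toy stand-ins: in DARTS, w would be the network weights and alpha the
# architecture (operation-mixing) parameters. Names, losses and learning
# rates here are invented purely for illustration.
w = torch.randn(10, requires_grad=True)
alpha = torch.zeros(3, requires_grad=True)

w_opt = torch.optim.SGD([w], lr=0.025, momentum=0.9)
alpha_opt = torch.optim.Adam([alpha], lr=3e-4)

def mixed_output(w, alpha):
    # Softmax-weighted mixture of candidate "operations", loosely mimicking a DARTS cell.
    mix = torch.softmax(alpha, dim=0)                    # (3,)
    candidates = torch.stack([w, w ** 2, torch.sin(w)])  # (3, 10)
    return (mix[:, None] * candidates).sum(dim=0)        # (10,)

def train_loss(w, alpha):
    # Placeholder for L_train(w, alpha).
    return ((mixed_output(w, alpha) - 0.5) ** 2).mean()

def val_loss(w, alpha):
    # Placeholder for L_val(w, alpha); in practice computed on held-out data.
    return ((mixed_output(w, alpha) - 0.5) ** 2).mean()

for step in range(200):
    # 1) Architecture step: one update of alpha on the validation loss, using the
    #    current w instead of the fully trained w*(alpha) (first-order approximation).
    alpha_opt.zero_grad()
    val_loss(w.detach(), alpha).backward()
    alpha_opt.step()

    # 2) Weight step: one update of w on the training loss with alpha held fixed.
    w_opt.zero_grad()
    train_loss(w, alpha.detach()).backward()
    w_opt.step()
```

The second-order version described in the paper instead evaluates the validation loss at w' = w - xi * grad_w L_train(w, alpha), i.e. one virtual training step ahead of the current weights, which is what the quoted sentence about a "single training step" refers to.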
Hi,
great work!! I have a short question about a couple of sentences in the paper, where it says: "The inner optimization argmin_w L_train(w, alpha) can be expensive" and "The idea is to approximate w*(alpha) by adapting w using only a single training step, without solving the inner optimization completely by training until convergence".
Why is a single training step enough? And do you have an estimate of how costly it would be to solve the inner optimization by training until convergence, which motivates the approximation proposed in the paper?
Thanks in advance, and congratulations again on this work!