quark0 / darts

Differentiable architecture search for convolutional and recurrent networks
https://arxiv.org/abs/1806.09055
Apache License 2.0

In your paper, why is the inner optimization of the loss function expensive? #139

Open Jugaris89 opened 4 years ago

Jugaris89 commented 4 years ago

Hi,

great work!! I have a short question about a couple of sentences in the paper: "The inner optimization w*(α) = argmin_w L_train(w, α) can be expensive" and "The idea is to approximate w*(α) by adapting w using only a single training step, without solving the inner optimization completely by training until convergence."
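For reference, the bi-level problem those sentences refer to (Eqs. 3-4 of the paper) is:

```latex
\min_{\alpha}\; \mathcal{L}_{val}\bigl(w^{*}(\alpha),\, \alpha\bigr)
\quad \text{s.t.} \quad
w^{*}(\alpha) = \arg\min_{w}\; \mathcal{L}_{train}(w, \alpha)
```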

Why is a single training step enough? And do you have an estimate of how costly it would be to solve the inner optimization by training until convergence, i.e. what motivates the approximation proposed in the paper?

Thanks in advance, and congratulations again on this work!

meghbhalerao commented 4 years ago

I think this is because, for every single update of the architecture parameters α, you would have to train the model weights to convergence and only then perform one architecture update. To reduce the cost, you can instead alternate: one weight update, one architecture update, one weight update, one architecture update, and so on until convergence. This greatly reduces the time needed. Think of it as something like stochastic gradient descent, where you do not always move in the direction of steepest descent, but after training for a sufficient amount of time you still end up at the minimum. A sketch of the alternation is below.
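To make the alternation concrete, here is a minimal first-order sketch in a PyTorch style. All names (`model`, `w_optimizer`, `alpha_optimizer`, `criterion`, the batches) are hypothetical placeholders, not this repository's actual API, and the second-order correction term from the paper is omitted:

```python
def search_step(model, w_optimizer, alpha_optimizer,
                train_batch, val_batch, criterion):
    """One weight update on L_train, then one architecture update on L_val."""
    x_train, y_train = train_batch
    x_val, y_val = val_batch

    # (1) Single training step on the weights w: a one-step
    # approximation of w*(alpha) = argmin_w L_train(w, alpha).
    w_optimizer.zero_grad()
    criterion(model(x_train), y_train).backward()
    w_optimizer.step()

    # (2) Single step on the architecture parameters alpha, driven by
    # the validation loss; alpha_optimizer holds only the alpha params.
    alpha_optimizer.zero_grad()
    criterion(model(x_val), y_val).backward()
    alpha_optimizer.step()
```

In a real setup, `w_optimizer` would be built over the network weights only and `alpha_optimizer` over the architecture parameters only, so each step touches just its own set of variables even though both losses are backpropagated through the whole model.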