melodyguan / enas

TensorFlow Code for paper "Efficient Neural Architecture Search via Parameter Sharing"
https://arxiv.org/abs/1802.03268
Apache License 2.0
1.58k stars 390 forks

Question on the "Deriving Architectures" part of Section 2.2 of the paper #71

Open philtomson opened 5 years ago

philtomson commented 5 years ago

It says: "We first sample several models from the trained policy π(m, θ). For each sampled model, we compute its reward on a single minibatch sampled from the validation set. We then take only the model with the highest reward to re-train from scratch. "

Does this happen at the end of every epoch, or only after the final epoch? (That is, do you wait until training is finished to pick the sampled model with the highest reward and then re-train it from scratch?)

e-271 commented 5 years ago

I'm not sure if this answers your question, but it sounds to me like they train the controller first and then sample a few architectures from the trained controller to find the best one. They seem to discard all the architectures produced during training, so I believe they wait until after the final epoch (of controller training) to pick the architecture. To derive an architecture after training, they sample several candidates, run a single validation minibatch through each of them, and fully train only the best one.

Training:

The training procedure of ENAS consists of two interleaving phases. The first phase trains ω, the shared parameters of the child models, on a whole pass through the training data set ... The second phase trains θ, the parameters of the controller LSTM, for a fixed number of steps, typically set to 2000 in our experiments. These two phases are alternated during the training of ENAS.
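In loop form, I read that alternation roughly like this. This is just a toy, self-contained sketch: every name here (`sample_architecture`, `train_shared_weights`, `controller_reinforce_step`, etc.) is a made-up placeholder rather than this repo's actual API, and the epoch/batch counts are illustrative.

```python
import random

CONTROLLER_STEPS = 2000      # "typically set to 2000 in our experiments"

def sample_architecture():
    # Stand-in for sampling a child model m from the controller policy pi(m, theta).
    return [random.randrange(4) for _ in range(12)]  # toy sequence of op choices

def train_shared_weights(arch, batch):
    # Phase 1 step: one SGD update of the shared parameters omega,
    # using only the sub-graph selected by `arch`.
    pass

def controller_reinforce_step(arch, reward):
    # Phase 2 step: one REINFORCE update of the controller parameters theta.
    pass

def reward_on_validation_minibatch(arch):
    # Accuracy of `arch` (with the shared weights) on one validation minibatch.
    return random.random()

train_batches = range(100)   # stand-in for one pass over the training set

for epoch in range(310):     # illustrative epoch count
    # Phase 1: train omega on a whole pass through the training data.
    for batch in train_batches:
        train_shared_weights(sample_architecture(), batch)
    # Phase 2: train theta for a fixed number of steps.
    for _ in range(CONTROLLER_STEPS):
        arch = sample_architecture()
        controller_reinforce_step(arch, reward_on_validation_minibatch(arch))
```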

After training:

We discuss how to derive novel architectures from a trained ENAS model. We first sample several models from the trained policy π(m, θ). For each sampled model, we compute its reward on a single minibatch sampled from the validation set. We then take only the model with the highest reward to re-train from scratch.
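So the derivation step at the end amounts to something like the sketch below. Again, these are made-up placeholder names, not the repo's functions, and `NUM_SAMPLES` is my own illustrative choice; the paper only says "several".

```python
import random

def sample_architecture():
    # Stand-in for sampling a model m from the trained policy pi(m, theta).
    return [random.randrange(4) for _ in range(12)]

def reward_on_validation_minibatch(arch):
    # Stand-in for the reward of `arch` (using the shared weights)
    # on a single minibatch drawn from the validation set.
    return random.random()

NUM_SAMPLES = 10  # "several models"; the exact count here is made up

candidates = [sample_architecture() for _ in range(NUM_SAMPLES)]
best = max(candidates, key=reward_on_validation_minibatch)

# Only `best` is re-trained from scratch; all other samples, including
# every architecture seen during the search itself, are discarded.
```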