melodyguan / enas

TensorFlow Code for paper "Efficient Neural Architecture Search via Parameter Sharing"
https://arxiv.org/abs/1802.03268
Apache License 2.0
1.58k stars 390 forks

Question on the "Deriving Architectures" part of Section 2.2 of the paper #71

Open philtomson opened 5 years ago

philtomson commented 5 years ago

It says: "We first sample several models from the trained policy π(m, θ). For each sampled model, we compute its reward on a single minibatch sampled from the validation set. We then take only the model with the highest reward to re-train from scratch. "

Does this happen at the end of every epoch, or only after the final epoch? (That is, do you wait until training is finished to pick the sampled model with the highest reward and then re-train it from scratch?)

e-271 commented 5 years ago

I'm not sure if this answers your question, but it sounds to me like they train the controller first and then sample a few architectures from the trained controller to find the best one. They seem to discard all the architectures produced during training, so I believe they wait until after the final epoch (of controller training) to pick the architecture. To derive an architecture after training, they sample several candidates, run a single validation minibatch through each of them, and fully train only the best one.

Training:

The training procedure of ENAS consists of two interleaving phases. The first phase trains ω, the shared parameters of the child models, on a whole pass through the training data set ... The second phase trains θ, the parameters of the controller LSTM, for a fixed number of steps, typically set to 2000 in our experiments. These two phases are alternated during the training of ENAS.
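In loop form, I read that alternation roughly like this. This is just a toy, self-contained sketch: every name here (`sample_architecture`, `train_shared_weights`, `controller_reinforce_step`, etc.) is a made-up placeholder rather than this repo's actual API, and the epoch/batch counts are illustrative.

```python
import random

CONTROLLER_STEPS = 2000      # "typically set to 2000 in our experiments"

def sample_architecture():
    # Stand-in for sampling a child model m from the controller policy pi(m, theta).
    return [random.randrange(4) for _ in range(12)]  # toy sequence of op choices

def train_shared_weights(arch, batch):
    # Phase 1 step: one SGD update of the shared parameters omega,
    # using only the sub-graph selected by `arch`.
    pass

def controller_reinforce_step(arch, reward):
    # Phase 2 step: one REINFORCE update of the controller parameters theta.
    pass

def reward_on_validation_minibatch(arch):
    # Accuracy of `arch` (with the shared weights) on one validation minibatch.
    return random.random()

train_batches = range(100)   # stand-in for one pass over the training set

for epoch in range(310):     # illustrative epoch count
    # Phase 1: train omega on a whole pass through the training data.
    for batch in train_batches:
        train_shared_weights(sample_architecture(), batch)
    # Phase 2: train theta for a fixed number of steps.
    for _ in range(CONTROLLER_STEPS):
        arch = sample_architecture()
        controller_reinforce_step(arch, reward_on_validation_minibatch(arch))
```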

After training:

We discuss how to derive novel architectures from a trained ENAS model. We first sample several models from the trained policy π(m, θ). For each sampled model, we compute its reward on a single minibatch sampled from the validation set. We then take only the model with the highest reward to re-train from scratch.
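So the derivation step at the end amounts to something like the sketch below. Again, these are made-up placeholder names, not the repo's functions, and `NUM_SAMPLES` is my own illustrative choice; the paper only says "several".

```python
import random

def sample_architecture():
    # Stand-in for sampling a model m from the trained policy pi(m, theta).
    return [random.randrange(4) for _ in range(12)]

def reward_on_validation_minibatch(arch):
    # Stand-in for the reward of `arch` (using the shared weights)
    # on a single minibatch drawn from the validation set.
    return random.random()

NUM_SAMPLES = 10  # "several models"; the exact count here is made up

candidates = [sample_architecture() for _ in range(NUM_SAMPLES)]
best = max(candidates, key=reward_on_validation_minibatch)

# Only `best` is re-trained from scratch; all other samples, including
# every architecture seen during the search itself, are discarded.
```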