melodyguan / enas

TensorFlow Code for paper "Efficient Neural Architecture Search via Parameter Sharing"
https://arxiv.org/abs/1802.03268
Apache License 2.0

Questions about your paper #23

Open neighthan opened 6 years ago

neighthan commented 6 years ago

These aren't issues with the code here as much as questions I had after reading your paper; feel free to close this issue if you don't want to discuss such things here.

  1. In section 2.1 you say

    for each pair of nodes $j < \ell$, there is an independent parameter matrix $W^{(h)}_{\ell, j}$.

    but then in section 2.3 it says

    As for recurrent cells, each operation at each layer in our ENAS convolutional network has a distinct set of parameters.

    (emphasis added). Just to be clear - from the RNN section, it seemed that there was only one weight matrix per pair of nodes, but there's actually one per activation function per pair of nodes?

  2. In the RNNs, is node 1 the only node that can access $x_t, h_{t-1}$? Or could later nodes output a certain index that corresponds to either of these? (This doesn't happen in any of the shown examples, but I wanted to make sure that it isn't possible.)

    • EDIT - just looked through the appendix, which affirms that only the first node in an RNN cell has access to $x_t, h_{t-1}$.
  3. In the CNN section, from Figure 3, node 4 (the conv 5x5 layer) should take as input the outputs of nodes 1 and 3. However, it seems to take the concatenation of nodes 1, 2, and 3. Am I misunderstanding how selecting the nodes works? For ease, the relevant figure is Figure 3 of the paper.

  4. A clarification about the training procedure to see if I understand correctly: you sample just one model from the controller (e.g. one RNN), which you then train on one pass through the training data, and finally you do some number (e.g. 2000) of update steps on the controller using REINFORCE? You're just updating based on the performance of the single model trained, so wouldn't each of these steps be the same (so taking 2000 of them is like taking one step that's 2000 times as large)? Am I missing something about the controller update part?
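
To make question 4 concrete, here is a toy sketch of a REINFORCE loop over a categorical policy. It assumes the controller samples a fresh architecture at every update step, which is an assumption about the procedure and not something confirmed in this thread. Under that assumption the steps are not identical, because each one uses a different sample. All names (`reinforce_step`, `train_controller`, `toy_reward`) are hypothetical and unrelated to the code in this repository, and the fixed reward table merely stands in for the validation performance of a sampled model.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(logits, reward_fn, baseline, rng, lr=0.1):
    """One REINFORCE update; returns (new_logits, action, reward)."""
    probs = softmax(logits)
    # Assumption: a new "architecture" (here, a single choice) is sampled per step.
    action = rng.choices(range(len(logits)), weights=probs)[0]
    reward = reward_fn(action)
    advantage = reward - baseline
    # Gradient of log p(action) w.r.t. logit k is 1[k == action] - probs[k].
    new_logits = [
        v + lr * advantage * ((1.0 if k == action else 0.0) - probs[k])
        for k, v in enumerate(logits)
    ]
    return new_logits, action, reward


def train_controller(num_steps, reward_fn, num_choices, seed=0):
    """Run num_steps REINFORCE updates with a moving-average baseline."""
    rng = random.Random(seed)
    logits = [0.0] * num_choices
    baseline = 0.0
    actions = []
    for _ in range(num_steps):
        logits, action, reward = reinforce_step(logits, reward_fn, baseline, rng)
        baseline = 0.9 * baseline + 0.1 * reward
        actions.append(action)
    return softmax(logits), actions


def toy_reward(action):
    """Hypothetical fixed reward per choice (stands in for validation accuracy)."""
    return [0.2, 0.9, 0.5][action]


if __name__ == "__main__":
    probs, actions = train_controller(200, toy_reward, num_choices=3)
    print("final policy:", probs)
    print("distinct choices sampled:", sorted(set(actions)))
```

If instead every step reused one fixed sample, the updates would all push in the same direction, which is the situation the question describes.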

If you could help clear some confusion about any or all of these points, I'd appreciate it! Overall, certainly a good paper, and thanks for providing the code as well!

HanDarkholme commented 6 years ago

As for the 3rd question, node 4 is correct: it refers to the model drawn with the red lines in Figure 3, so the black line from node 2 to node 4 is not included.
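
The distinction between possible and sampled connections can be sketched as below. This is only an illustration under the reading given in this thread: every earlier node is a candidate input (the black lines), and the sampled model keeps a subset of them (the red lines). The helper names are hypothetical, and the only connection set taken from the discussion is that node 4 reads from nodes 1 and 3.

```python
def candidate_inputs(node):
    """All earlier nodes that a node could read from (1-indexed)."""
    return list(range(1, node))


def unused_inputs(node, sampled):
    """Candidate inputs that the sampled model does not use."""
    invalid = [p for p in sampled if p not in candidate_inputs(node)]
    if invalid:
        raise ValueError(f"node {node} cannot read from {invalid}")
    return [p for p in candidate_inputs(node) if p not in sampled]


if __name__ == "__main__":
    sampled_for_node_4 = [1, 3]  # from the thread: node 4 takes nodes 1 and 3
    print("candidates:", candidate_inputs(4))
    print("not used:", unused_inputs(4, sampled_for_node_4))
```

With node 4 reading from nodes 1 and 3, the only unused candidate is node 2, matching the black line that is not part of the sampled model.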