harewei closed this issue 6 years ago
I've gone through the code quite a few times, so I guess I'll answer these myself, in case anyone sees this in the future.
Hi @harewei: could you please elaborate on your explanation above? I have struggled to understand this parameter sharing too. Thank you ~
@manhquang144 The weights of a particular node operation in a particular layer are shared. So for example, say you have 2 node operations, Conv2D and Maxpool (it's 6 for the original ENAS implementation), and you are trying to build a network of 2 layers.
In the first layer, you have layer1-Conv2D and layer1-Maxpool. In the second layer, you have layer2-Conv2D and layer2-Maxpool, and so on (if you want a larger network).
The first child network you create uses, say, layer1-Conv2D and layer2-Conv2D. If the second child wants a Conv2D followed by a Maxpool, it will call and share the weights of layer1-Conv2D used by the first child, then call layer2-Maxpool (which isn't used by child 1, but if other networks also use Maxpool at the 2nd layer, their weights will be the same as this).
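To make that concrete, here is a minimal sketch of that idea, not code from this repo: the pool dictionary, `get_weight`, and the tensor shapes are made up for illustration. The point is that a weight is looked up by (layer index, op name), so any child that samples the same op at the same layer gets the same tensor.

```python
import numpy as np

# Hypothetical shared pool: one weight tensor per (layer index, op name).
# Shapes are illustrative; the real implementation sizes them from its hyperparameters.
shared_pool = {}

def get_weight(layer, op, shape=(3, 3, 16, 16)):
    """Create the weight for (layer, op) on first use, reuse it afterwards."""
    key = (layer, op)
    if key not in shared_pool:
        shared_pool[key] = np.random.randn(*shape) * 0.01
    return shared_pool[key]

# Child 1 samples Conv2D in both layers; child 2 samples Conv2D then Maxpool.
child1 = ["conv2d", "conv2d"]
child2 = ["conv2d", "maxpool"]

for child in (child1, child2):
    for layer, op in enumerate(child):
        if op == "conv2d":          # Maxpool carries no trainable weights
            get_weight(layer, op)   # child 2 reuses child 1's layer-0 Conv2D weight

print(sorted(shared_pool))          # [(0, 'conv2d'), (1, 'conv2d')]
```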
@harewei: Thank you so much Harewei, now I get the idea of how it works.
@harewei, how does parameter sharing work in skip connections for CNN macro search?
@harewei , Thank you for the detailed explanation above. I just have one more question. Suppose that we have 3 nodes, each having Conv2D and Maxpool ops. Further assume that the controller generates child model #1 which looks like this: Layer1-Conv2D => Layer2-Conv2D => Layer3-Maxpool. Now suppose that the controller generates child model #2 which looks like this: Layer1-Conv2D => Layer2-Conv2D + Layer1-Conv2D => Layer3-Maxpool (i.e. the first two layers are the same as in child model #1, but we connect both the first and second layer to the third layer using skip connections). In this scenario, child model #2 will share the weights for Layer1-Conv2D. But how about the weights for Layer3-Maxpool? I guess my dilemma is: Does weight sharing depend on skip connections?
@maurizio-zen MaxPool does not use weights, so in that case it would not be a problem.
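On the dilemma above, one way to read it: the shared weight is keyed only by (layer, op), so the sampled skip pattern does not change which tensors are fetched. Below is a toy sketch of that reading (again not the repo's code; names and shapes are hypothetical), using the two child models from the question:

```python
import numpy as np

pool = {}  # key: (layer index, op name); the sampled skip pattern is NOT part of the key

def get_weight(layer, op, shape=(3, 3, 16, 16)):
    return pool.setdefault((layer, op), np.random.randn(*shape) * 0.01)

ops = ["conv2d", "conv2d", "maxpool"]
# Child #1: no skip connections.  Child #2: layer 3 also takes layer 1's output as a skip input.
for skips in ([], [(0, 2)]):
    for layer, op in enumerate(ops):
        if op == "conv2d":
            get_weight(layer, op)   # the same tensors are fetched for both children
        # maxpool: no trainable weights, so layer 3 adds nothing to the pool

print(sorted(pool))  # [(0, 'conv2d'), (1, 'conv2d')] -- unchanged by the skip
```

Note this is a simplification: the actual implementation may insert additional ops to handle the concatenated skip inputs (for example a 1x1 convolution for channel alignment), and those would have their own shared weights, but the lookup scheme would be the same.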
@harewei, are all shared weights (even the ones that aren't being used) held in GPU memory while training the controller? If so, how much memory does that consume, for CIFAR-10? Also, is the number of channels constrained within each cell/layer? If not, there could be many different possible weight tensors for Layer1-Conv2D, for example?
Sorry that this isn't actually an issue with the code, but just a question which I'm unable to figure out by reading the paper and code.
I'm trying to understand how the parameter sharing works in ENAS. The first two questions are there partially to answer the third main question.