microsoft / nni

An open source AutoML toolkit to automate the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
https://nni.readthedocs.io
MIT License

NAS algorithms API Supports for dynamic CNN #2693

Closed: superkevingit closed this issue 3 years ago

superkevingit commented 4 years ago

What would you like to be added:

NAS algorithm API support for dynamic CNNs, i.e. ops like torch.nn.functional.conv2d() whose filter weights are generated dynamically by another network.
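
For context, this is the kind of call I mean: the filter is produced by another module at forward time instead of being stored as a fixed nn.Parameter. A minimal PyTorch sketch (all sizes here are made up for illustration):

import torch
import torch.nn as nn
import torch.nn.functional as F

weight_generator = nn.Linear(64, 8 * 3 * 3 * 3)   # "other network" producing the filter
x = torch.randn(1, 3, 32, 32)                      # input feature map
z = torch.randn(1, 64)                             # e.g. temporal features

weight = weight_generator(z).view(8, 3, 3, 3)      # weights generated on the fly
y = F.conv2d(x, weight, padding=1)                 # dynamic convolution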

Why is this needed:

To be compatible with PyTorch APIs.

Without this feature, how does current nni work

Currently, NNI only implements APIs for static network definitions.

Components that may involve changes:

I think some new APIs should be added that can return the final choice before the conv is actually defined.

Brief description of your proposal if any:

I am wondering if I can use the static API to define the search space, obtain the current choice of network structure from it, then define dynamic convs based on that NAS choice and compute gradients through the dynamic convs in the forward function. Here, the search space defined in __init__ would only serve as a structural reference.
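
To illustrate the proposal, here is a minimal sketch in plain PyTorch. The query_current_choice helper is hypothetical and only stands in for whatever API would expose the sampled choice; it is not an existing NNI function:

import torch.nn as nn
import torch.nn.functional as F

class DynamicCell(nn.Module):
    def __init__(self, c_in, c_out):
        super().__init__()
        # Static definition of the candidate kernel sizes; in this proposal it
        # only serves as a structural reference for the NAS algorithm.
        self.kernel_candidates = [3, 5]
        self.c_in, self.c_out = c_in, c_out

    def forward(self, x, controller_params):
        # Hypothetical call: ask the NAS algorithm which candidate is active
        # in the current trial/step.
        k = query_current_choice(self.kernel_candidates)
        # Reshape the flat vector generated by the controller into a filter.
        n = self.c_out * self.c_in * k * k
        weight = controller_params[:n].view(self.c_out, self.c_in, k, k)
        return F.conv2d(x, weight, padding=k // 2)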

superkevingit commented 4 years ago

(figure: diagram of the two-network setting described in the next comment)

superkevingit commented 4 years ago

Part of my confusion has been solved. The figure above shows our setting. We have two networks: the top one, which we call the controller, is used to generate params for the dynamic convolution; the bottom one is the network whose best structure we want NAS to find. There are three steps (as shown in the figure). First, we get the candidate conv structure from the NAS decision. Second, based on the structure chosen in this trial, the bottom network tells the controller network how many params it should generate, which also decides the channel number. Finally, the controller generates the conv params for the other network.
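
To make the three steps concrete, here is a rough sketch of the hand-off I mean (all names and sizes are placeholders, not NNI APIs):

import torch
import torch.nn as nn

class Controller(nn.Module):
    # Top network: generates conv parameters from its own (e.g. temporal) features.
    def __init__(self, feat_dim, max_params):
        super().__init__()
        self.head = nn.Linear(feat_dim, max_params)

    def forward(self, feat, num_params):
        # Step 3: emit exactly the number of parameters the bottom network
        # asked for in this trial.
        return self.head(feat)[..., :num_params]

# Step 1: the NAS decision picks a candidate structure, e.g. a 3x3 depthwise
# separable conv with 16 input and 16 output channels.
c_in, c_out, k = 16, 16, 3
# Step 2: the bottom network derives the parameter count from that structure
# and tells the controller how many values to generate.
num_params = c_in * 1 * k * k + c_out * c_in * 1 * 1   # depthwise + pointwise
controller = Controller(feat_dim=128, max_params=4096)
params = controller(torch.randn(1, 128), num_params)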

superkevingit commented 4 years ago

There were two problems in my earlier PyTorch implementation. The first one: how can I implement a dynamic conv with an API such as LayerChoice? So I rewrote the demo module for the dilated depthwise separable conv like this:

import torch.nn as nn
import torch.nn.functional as F


class DilConv(nn.Module):
    """
    (Dilated) depthwise separable conv.
    ReLU - (Dilated) depthwise separable - Pointwise - BN.
    If dilation == 2, 3x3 conv => 5x5 receptive field, 5x5 conv => 9x9 receptive field.
    """

    def __init__(self, C_in, C_out, kernel_size, stride, padding, dilation, affine=True, bias=False):
        super().__init__()
        self.c_in = C_in
        self.c_out = C_out
        self.kernel_size = kernel_size
        self.bias = bias
        self.stride = stride
        self.padding = padding
        self.dilation = dilation

        self.activate = nn.ReLU()
        self.bn = nn.BatchNorm2d(C_out, affine=affine)

    def forward(self, x, params):
        dw_weight_params, dw_bias_params, params = self.reshape_params(params, self.c_in, self.c_in, self.kernel_size, self.bias)
        pw_weight_params, pw_bias_params, _ = self.reshape_params(params, self.c_in, self.c_out, 1, self.bias)
        x = self.activate(x)
        x = F.conv2d(x,
                     weight=dw_weight_params,
                     bias=dw_bias_params,
                     stride=self.stride,
                     padding=self.padding,
                     dilation=self.dilation,
                     groups=self.c_in)
        x = F.conv2d(x,
                     weight=pw_weight_params,
                     bias=pw_bias_params)

        return self.bn(x)

    def reshape_params(self, params, in_channel, out_channel, filter_sz, bias):
        # TODO
        return params, None, params
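
For reference, one way the reshape_params placeholder above could be filled in: slice the flat vector from the controller into a conv weight (and optional bias) and return the leftover for the next layer. This is only a sketch of my intent, not final code; for the depthwise call the caller would also need to pass groups=self.c_in:

    def reshape_params(self, params, in_channel, out_channel, filter_sz, bias, groups=1):
        # Weight shape expected by F.conv2d: (out_channel, in_channel // groups, k, k).
        n_weight = out_channel * (in_channel // groups) * filter_sz * filter_sz
        weight = params[:n_weight].view(out_channel, in_channel // groups, filter_sz, filter_sz)
        rest = params[n_weight:]
        bias_params = None
        if bias:
            bias_params, rest = rest[:out_channel], rest[out_channel:]
        return weight, bias_params, rest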

superkevingit commented 4 years ago

The second one is how to tell the controller about the NAS decision for the dynamic conv in an elegant way; I am still working on that.

ultmaster commented 4 years ago

Hi. Thanks for asking. Can you briefly show the implementation of your "controller"? I'm not sure I have understood the idea shown in the picture. Actually, if you have read the implementation of EnasMutator, you will see that the controller is the one who makes the decision, so it doesn't need to be "told" the NAS decision.

superkevingit commented 4 years ago

I'm sorry I didn't make myself clear; maybe the network names conflict here. The "controller" is an auxiliary network for the bottom one. In a computer vision task, for example, the bottom network uses image-level features and the "controller" uses temporal information, and we want to use dynamic conv to combine them.

QuanluZhang commented 4 years ago

@superkevingit thanks for reporting your issue. According to my understanding, there are three components: the NAS algorithm (which generates the kernel size and conv type), the controller network (which generates conv weights for the classification network), and the classification network (which receives the hyper-parameters from the NAS algorithm and the conv weights from the controller).

I have several questions:

  1. What does the controller network receive from the classification network?
  2. What does the NAS algorithm receive from the classification network (or the controller network)?
  3. Is it possible to merge the NAS algorithm and the controller network together?

superkevingit commented 4 years ago

Yes, you are right!

  1. The controller actually receives the hyper-parameters from the NAS algorithm to define the final channel number, but since we define the NAS API (e.g. LayerChoice) in the classification network, in the implementation the controller network has to get the NAS algorithm's decision from the classification network.
  2. As the classification network is designed for the main task, I think the NAS algorithm receives metrics such as gradients or loss (depending on the NAS algorithm we use) to make the next search sample. And I think the controller network contributes to the NAS decision only in an indirect way, because its output is an input of the classification network.
  3. I don't think that's a good idea. For example, suppose the search space of the classification network is a convolutional filter with 3x3 or 5x5 filter size, no bias, input channel 1, and filter number 1. Then the output dimension of the controller's last layer can be 9 or 25 (1x3x3x1, 1x5x5x1); see the small example after this list. It can be hard to pre-define all the permutations when the search space is large. So, if we put the NAS algorithm in the controller, we would actually be searching over different channel numbers rather than different convolutional structures. That said, we could implement the controller so that it returns convolution filters instead of flat weight vectors; that could be one way to solve the problem.
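
To spell out the arithmetic in point 3, a few lines computing the controller's required output dimension for each candidate filter (plain Python, just for illustration):

# out_channels * in_channels * k * k, with no bias, input channel 1, filter number 1
in_channels, out_channels = 1, 1
for k in (3, 5):
    print(k, out_channels * in_channels * k * k)   # prints: 3 9, then 5 25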

ultmaster commented 4 years ago

I think there are several components/logics in your design and several corresponding components in NNI, so I propose a mapping here.

By design, a mutator is just an "implementation" of the underlying computational logic of your layer choice. Related parameters (hypernet parameters) should be stored in the layer choice itself. In other words:

You can surely implement the NAS algorithm and the controller as separate components that belong to the mutator, but that's your own choice.
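
As a rough illustration of "related parameters stored in the layer choice itself", here is a plain PyTorch sketch of one candidate op that owns its own hypernet. This is not the actual NNI mutator/LayerChoice API, just the shape of the idea:

import torch.nn as nn
import torch.nn.functional as F

class HyperConvChoice(nn.Module):
    # One candidate op whose conv weights come from a small hypernet owned by the op.
    def __init__(self, c_in, c_out, k, feat_dim):
        super().__init__()
        self.c_in, self.c_out, self.k = c_in, c_out, k
        # The hypernet parameters live inside the choice module itself, so
        # whichever mutator samples this op also trains its weight generator.
        self.hypernet = nn.Linear(feat_dim, c_out * c_in * k * k)

    def forward(self, x, feat):
        w = self.hypernet(feat).view(self.c_out, self.c_in, self.k, self.k)
        return F.conv2d(x, w, padding=self.k // 2)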

kvartet commented 3 years ago

@superkevingit I'm closing this issue as it has had no updates from the user for 3 months; please feel free to reopen if you still see it as an active issue.

superkevingit commented 3 years ago


Okay, Thanks!