microsoft / nni

An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
https://nni.readthedocs.io
MIT License

How to calculate the model size and FLOPs in a NAS project? #1947

Closed marsggbo closed 3 years ago

marsggbo commented 4 years ago

I want to use a NAS method to search for a resource-aware model, but at each step the mutator generates a new model structure, so it is not easy to calculate the generated model's size, especially its FLOPs. Is there any possible and easy way to implement this? Thanks!

ultmaster commented 4 years ago

Hi @marsggbo. Thanks for raising this issue.

For parameter size: because NNI builds a supernet for training, you may experience issues if you directly use model.parameters(), which counts every parameter even if some of them are not used. This issue will be addressed later in the NNI framework. For now, a simple workaround is to do a dry run with back-propagation and count only the parameters that received gradients, e.g.,

import numpy as np
import torch

def param_size(model, loss_fn, input_size):
    """
    Compute the number of parameters actually used, in millions
    """
    # dry run a dummy batch of size 2 through forward and backward
    x = torch.rand([2] + input_size).cuda()
    y = model(x)
    target = torch.randint(model.n_classes, size=[2]).cuda()
    loss = loss_fn(y, target)
    loss.backward()
    # only parameters that received a gradient were actually used
    n_params = sum(np.prod(v.size()) for v in model.parameters() if v.grad is not None)
    return n_params / 1e6
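
A quick usage sketch (assuming a model that exposes an n_classes attribute and takes 32x32 RGB inputs; MySupernet is a placeholder name, not an NNI class):

model = MySupernet().cuda()  # hypothetical supernet with an `n_classes` attribute
criterion = torch.nn.CrossEntropyLoss()
print(f"used params: {param_size(model, criterion, [3, 32, 32]):.2f}M")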

As for FLOPs, calculating FLOPs doesn't have native support in PyTorch. You need to resort to tools like torchscope.
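
For illustration, here is a minimal sketch using the third-party thop package (one of several FLOPs counters with a similar interface; this is my suggestion of a workable tool, not NNI functionality):

import torch
import torch.nn as nn
from thop import profile  # pip install thop

# a stand-in for one sampled sub-architecture
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10))
x = torch.rand(1, 3, 32, 32)
macs, params = profile(model, inputs=(x,))  # multiply-accumulates and parameter count
print(f"MACs: {macs / 1e6:.1f}M, params: {params / 1e3:.1f}K")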

marsggbo commented 4 years ago

@ultmaster Thanks for your reply. But there is still an issue: we can only get the real size of the sampled model after loss.backward(). In ENAS, I want to add the model size as a loss term, which conflicts with your example. Are there any other solutions? Thanks!

ultmaster commented 4 years ago

It looks to me like you want to trace the number of parameters used by every sub-architecture sampled during the training phase. In fact, the model size is not differentiable, so how do you want to add it as a loss term? If all you want is to display it as a number without differentiating through it, you can add it after loss.backward().

marsggbo commented 4 years ago

> It looks to me like you want to trace the number of parameters used by every sub-architecture sampled during the training phase. In fact, the model size is not differentiable, so how do you want to add it as a loss term? If all you want is to display it as a number without differentiating through it, you can add it after loss.backward().

The reason why I add the model size as a loss term is that I want the mutator to find a smaller model that still has comparable performance. Since the model size is not differentiable, is there any substitute method? Thanks!

ultmaster commented 4 years ago

For sure you can add the FLOPs or parameter size as part of the reward function so that the controller is aware of the model size; that's exactly what MnasNet, FBNet, and SPOS did. What I want to add here is that, besides loss.backward(), there is another method, the lookup-table method: once all choices are made, the model reads its FLOPs from a precomputed table. We provide an example of this in SPOS; I hope it is helpful: https://github.com/microsoft/nni/blob/598d8de2a9ac9579dea962d22d428b14557da38e/examples/nas/spos/network.py#L106
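
The idea, roughly (a hedged sketch, not the actual SPOS code; the keys and numbers in the table are made up for illustration): measure the FLOPs of every candidate op offline, then sum the entries selected by the current architecture decision:

# hypothetical lookup table: FLOPs of each candidate op, measured offline
flops_table = {
    'layer_0': [14.0e6, 20.5e6, 31.2e6],  # one entry per choice in the LayerChoice
    'layer_1': [23.1e6, 35.8e6, 51.4e6],
}

def lookup_flops(mutator):
    """Sum the table entries picked by the mutator's current decision."""
    total = 0.0
    for key, mask in mutator._cache.items():
        if key in flops_table:
            total += flops_table[key][mask.int().argmax().item()]
    return total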

marsggbo commented 4 years ago

Thanks for your reply! But I have a question: the FLOPs obtained with the lookup-table method are also not differentiable. Or does the loss of the ENAS controller not need to be differentiable?

ultmaster commented 4 years ago

The reward for an RL agent does not need to be differentiable. There are also tricks that can make FLOPs differentiable, as in FBNet. Please read the papers for more details.
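
For reference, the FBNet-style trick replaces the hard FLOPs count with its expectation under a (Gumbel-)softmax distribution over ops, which is differentiable w.r.t. the architecture parameters. A minimal sketch (the table and weighting are illustrative, not NNI API):

import torch
import torch.nn.functional as F

op_flops = torch.tensor([14.0e6, 20.5e6, 31.2e6])  # per-op FLOPs for one searchable layer
alpha = torch.zeros(3, requires_grad=True)          # architecture parameters for this layer

task_loss = torch.tensor(1.0)                 # stand-in for the usual criterion output
probs = F.gumbel_softmax(alpha, tau=1.0)      # soft, differentiable op selection
expected_flops = (probs * op_flops).sum()     # differentiable expected-FLOPs term
loss = task_loss + 1e-8 * expected_flops      # the penalty weight is a tunable knob
loss.backward()                               # gradients flow into alpha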

marsggbo commented 4 years ago

I implemented a function that calculates the real model size using only the model and the mutator, i.e. we don't have to call loss.backward(). The code is as follows:

import nni.nas.pytorch.mutables
import torch

def calc_real_model_size(model, mutator):
    '''calculate the size of the real (sampled) model:
        real_size = size_choice + size_non_choice
    '''
    # unwrap DataParallel to access the underlying model
    if isinstance(model, torch.nn.DataParallel):
        model = model.module
    size_choice = 0 # the size of LayerChoice
    size_non_choice = 0 # the size of normal model part

    # the size of normal model part
    for name, param in model.named_parameters():
        if 'choice' not in name:
            size_non_choice += param.numel()

    # the mask for each operation; the masks look as follows:
    # {'down_node_0_x_op': tensor([False,  True, False, False, False, False]),
    #  'down_node_0_y_op': tensor([ True, False, False, False, False, False]),
    #  'down_node_1_x_op': tensor([False,  True, False, False, False, False]),
    #  'down_node_1_y_op': tensor([False, False, False, False, False,  True]),
    #  ...
    #  'up_node_0_x_op': tensor([False, False, False, False,  True, False]),
    #  'up_node_0_y_op': tensor([False, False,  True, False, False, False]),
    #  'up_node_1_x_op': tensor([False,  True, False, False, False, False]),
    #  'up_node_1_y_op': tensor([False, False, False,  True, False, False]),
    #  ...
    masks = {}
    for key in mutator._cache:
        if 'op' in key:
            masks[key] = mutator._cache[key]

    # the real size of all LayerChoice
    for name, module in model.named_modules():
        if isinstance(module, nni.nas.pytorch.mutables.LayerChoice):
            size_ops = []
            for op in module.choices:
                size_ops.append(sum([p.numel() for p in op.parameters()]))

            # parse the key for masks; this needs to be modified for a different model.
            infos = name.split('.')
            layer_type = infos[0] # down or up
            node_id = infos[4] # 0-4
            cell_id = infos[5][-1] # x or y
            prefix = 'down' if 'down' in layer_type else 'up'
            key = f"{prefix}_node_{node_id}_{cell_id}_op"
            index = masks[key].int().argmax()
            size_choice += size_ops[index]
    real_size = size_choice + size_non_choice
    return real_size
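
One could then fold the size into the ENAS reward, e.g. (the weighting and threshold below are just illustrative, not part of NNI):

# fold model size into the RL reward (beta and target_size are tunable)
def reward_fn(accuracy, model, mutator, target_size=1e6, beta=0.5):
    size = calc_real_model_size(model, mutator)
    return accuracy - beta * max(0.0, (size - target_size) / target_size)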

ultmaster commented 4 years ago

I think that's true; that's basically what we do in SPOS. We might want to expose the cache as a public API for convenience later.

Also, these two lines can be more elegant:

for name, module in model.named_modules():
    if isinstance(module, nni.nas.pytorch.mutables.LayerChoice):

->

for mutable in mutator.mutables:
    if isinstance(mutable, LayerChoice):

CuriousCat-7 commented 4 years ago

Alternatively, a straightforward way to implement it is to define a "sample" function in your model, like:

...
     def forward(self, x):
         return self.classifier(self.tail(self.mid(self.stem(x))).flatten(1))

     def sample(self, mutator, no_els=False):
         # collect the op chosen by the mutator from every LayerChoice in self.mid
         mid = []
         for layer_choice in self.mid:
             index = mutator._cache[layer_choice.key].int().argmax()
             m = layer_choice.choices[index]
             if no_els and isinstance(m, ConvELS):
                 continue
             mid.append(m)
         mid = nn.Sequential(*mid)

         class S(nn.Module):
             def __init__(selff):
                 super(S, selff).__init__()
                 selff.stem = self.stem
                 selff.tail = self.tail
                 selff.mid = mid
                 selff.classifier = self.classifier
             def forward(selff, x):
                 return selff.classifier(selff.tail(selff.mid(selff.stem(x))).flatten(1))
         return S()

The good thing is that you can customize the sampling method. In my case, I need to change the "ConvELS" module to "Identity" in some training phases.
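
A usage sketch (assuming the class above): the sampled sub-model carries no unused branches, so counting its parameters is straightforward:

sub_model = model.sample(mutator)  # a fixed sub-architecture
n_params = sum(p.numel() for p in sub_model.parameters())
print(f"sampled model params: {n_params / 1e6:.2f}M")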

kvartet commented 3 years ago

@marsggbo @CuriousCat-7 I'm closing this issue as it has had no updates from the user for 3 months; please feel free to reopen if you still see it as an active issue.