microsoft / nni

An open source AutoML toolkit to automate the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
https://nni.readthedocs.io
MIT License

[TypeError] caused by SlimPruner #4251

Open Veal98 opened 3 years ago

Veal98 commented 3 years ago

TypeError: trainer() got an unexpected keyword argument 'optimizer':

NNI version = 2.4. I found that the documentation did not specify that these three parameters (optimizer, trainer, criterion) should be passed in, and when I passed them in as in the following code, an error was reported:


    def trainer():
        lr = 1e-3
        optimizer_ = torch.optim.Adam(model.parameters(), lr)
        lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer_, step_size=1, gamma=0.92)
        for epoch in range(start_epoch, end_epoch):
            # train + validate for a single epoch
            fit_one_epoch(model, yolo_loss, epoch, epoch_size, epoch_size_val, gen, gen_val, end_epoch, Cuda, optimizer_, loss_history)
            lr_scheduler.step()

    pruner = SlimPruner(model, config_list,
                        optimizer=torch.optim.Adam(model.parameters(), lr=1e-3),
                        trainer=trainer,
                        criterion=YOLOLoss(np.reshape(anchors, [-1, 2]), num_classes, (input_shape[1], input_shape[0]), 0, Cuda, False))
J-shang commented 3 years ago

Hi @Veal98 , please refer to this doc https://nni.readthedocs.io/en/stable/Compression/Pruner.html#user-configuration-for-slim-pruner

The trainer should have four parameters, like this def trainer(model, optimizer, criterion, epoch):, and you can find an example here https://github.com/microsoft/nni/blob/cdb65dacf5dcafba4dc37ccdf8a86879f4c0e35c/examples/model_compress/pruning/basic_pruners_torch.py#L210

The trainer is expected to train one epoch.
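
For reference, a minimal trainer with that four-parameter signature might look like the sketch below (train_loader and device are assumptions, not defined in this thread):

def trainer(model, optimizer, criterion, epoch):
    # one epoch of standard training; NNI supplies model, optimizer, criterion and epoch
    model.train()
    for data, target in train_loader:
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()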

Veal98 commented 3 years ago

Hi @Veal98 , please refer to this doc https://nni.readthedocs.io/en/stable/Compression/Pruner.html#user-configuration-for-slim-pruner

The trainer should have four parameters, like this def trainer(model, optimizer, criterion, epoch):, and you can find an example here

https://github.com/microsoft/nni/blob/cdb65dacf5dcafba4dc37ccdf8a86879f4c0e35c/examples/model_compress/pruning/basic_pruners_torch.py#L210

The trainer is expected to train one epoch.

Thanks, and I want to know the meaning of this parameter epoch? Does it mean the number of sparsity-training epochs, such as epoch = 10?

And I also have a question: why is the mAP drop of my MobileNetV2-YOLOv4 model so serious (mAP 78% -> mAP 45%) after pruning with L1FilterPruner at 0.5 sparsity, even though I have fine-tuned for 50 epochs?

J-shang commented 3 years ago

epoch tells the trainer the current epoch number; the trainer may use this information. In the normal case, you can ignore epoch.

Could you show us the code for reference if it is convenient? And do you mean that during fine-tuning, the mAP is always around 45%?

Veal98 commented 3 years ago

My network structure is like this, MobileNetV2 + YOLOv4:


YoloBody(
  (backbone): MobileNetV2(
    (model): MobileNetV2(
      (features): Sequential(
        (0): ConvBNReLU(
          (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
          (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): InvertedResidual(
          (conv): Sequential(
            (0): ConvBNReLU(
              (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False)
              (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (2): ReLU6(inplace=True)
            )
            (1): Conv2d(32, 16, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (2): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (2): InvertedResidual(
          (conv): Sequential(
            (0): ConvBNReLU(
              (0): Conv2d(16, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (2): ReLU6(inplace=True)
            )
            (1): ConvBNReLU(
              (0): Conv2d(96, 96, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=96, bias=False)
              (1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (2): ReLU6(inplace=True)
            )
            (2): Conv2d(96, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (3): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (3): InvertedResidual(
          (conv): Sequential(
            (0): ConvBNReLU(
              (0): Conv2d(24, 144, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (1): BatchNorm2d(144, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (2): ReLU6(inplace=True)
            )
            (1): ConvBNReLU(
              (0): Conv2d(144, 144, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=144, bias=False)
              (1): BatchNorm2d(144, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (2): ReLU6(inplace=True)
            )
            (2): Conv2d(144, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (3): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (4): InvertedResidual(
          (conv): Sequential(
            (0): ConvBNReLU(
              (0): Conv2d(24, 144, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (1): BatchNorm2d(144, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (2): ReLU6(inplace=True)
            )
            (1): ConvBNReLU(
              (0): Conv2d(144, 144, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=144, bias=False)
              (1): BatchNorm2d(144, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (2): ReLU6(inplace=True)
            )
            (2): Conv2d(144, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (3): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (5): InvertedResidual(
          (conv): Sequential(
            (0): ConvBNReLU(
              (0): Conv2d(32, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (1): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (2): ReLU6(inplace=True)
            )
            (1): ConvBNReLU(
              (0): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=192, bias=False)
              (1): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (2): ReLU6(inplace=True)
            )
            (2): Conv2d(192, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (3): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (6): InvertedResidual(
          (conv): Sequential(
            (0): ConvBNReLU(
              (0): Conv2d(32, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (1): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (2): ReLU6(inplace=True)
            )
            (1): ConvBNReLU(
              (0): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=192, bias=False)
              (1): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (2): ReLU6(inplace=True)
            )
            (2): Conv2d(192, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (3): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (7): InvertedResidual(
          (conv): Sequential(
            (0): ConvBNReLU(
              (0): Conv2d(32, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (1): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (2): ReLU6(inplace=True)
            )
            (1): ConvBNReLU(
              (0): Conv2d(192, 192, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=192, bias=False)
              (1): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (2): ReLU6(inplace=True)
            )
            (2): Conv2d(192, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (3): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (8): InvertedResidual(
          (conv): Sequential(
            (0): ConvBNReLU(
              (0): Conv2d(64, 384, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (2): ReLU6(inplace=True)
            )
            (1): ConvBNReLU(
              (0): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=384, bias=False)
              (1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (2): ReLU6(inplace=True)
            )
            (2): Conv2d(384, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (3): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (9): InvertedResidual(
          (conv): Sequential(
            (0): ConvBNReLU(
              (0): Conv2d(64, 384, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (2): ReLU6(inplace=True)
            )
            (1): ConvBNReLU(
              (0): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=384, bias=False)
              (1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (2): ReLU6(inplace=True)
            )
            (2): Conv2d(384, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (3): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (10): InvertedResidual(
          (conv): Sequential(
            (0): ConvBNReLU(
              (0): Conv2d(64, 384, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (2): ReLU6(inplace=True)
            )
            (1): ConvBNReLU(
              (0): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=384, bias=False)
              (1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (2): ReLU6(inplace=True)
            )
            (2): Conv2d(384, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (3): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (11): InvertedResidual(
          (conv): Sequential(
            (0): ConvBNReLU(
              (0): Conv2d(64, 384, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (2): ReLU6(inplace=True)
            )
            (1): ConvBNReLU(
              (0): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=384, bias=False)
              (1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (2): ReLU6(inplace=True)
            )
            (2): Conv2d(384, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (3): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (12): InvertedResidual(
          (conv): Sequential(
            (0): ConvBNReLU(
              (0): Conv2d(96, 576, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (1): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (2): ReLU6(inplace=True)
            )
            (1): ConvBNReLU(
              (0): Conv2d(576, 576, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=576, bias=False)
              (1): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (2): ReLU6(inplace=True)
            )
            (2): Conv2d(576, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (3): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (13): InvertedResidual(
          (conv): Sequential(
            (0): ConvBNReLU(
              (0): Conv2d(96, 576, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (1): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (2): ReLU6(inplace=True)
            )
            (1): ConvBNReLU(
              (0): Conv2d(576, 576, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=576, bias=False)
              (1): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (2): ReLU6(inplace=True)
            )
            (2): Conv2d(576, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (3): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (14): InvertedResidual(
          (conv): Sequential(
            (0): ConvBNReLU(
              (0): Conv2d(96, 576, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (1): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (2): ReLU6(inplace=True)
            )
            (1): ConvBNReLU(
              (0): Conv2d(576, 576, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=576, bias=False)
              (1): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (2): ReLU6(inplace=True)
            )
            (2): Conv2d(576, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (3): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (15): InvertedResidual(
          (conv): Sequential(
            (0): ConvBNReLU(
              (0): Conv2d(160, 960, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (2): ReLU6(inplace=True)
            )
            (1): ConvBNReLU(
              (0): Conv2d(960, 960, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=960, bias=False)
              (1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (2): ReLU6(inplace=True)
            )
            (2): Conv2d(960, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (3): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (16): InvertedResidual(
          (conv): Sequential(
            (0): ConvBNReLU(
              (0): Conv2d(160, 960, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (2): ReLU6(inplace=True)
            )
            (1): ConvBNReLU(
              (0): Conv2d(960, 960, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=960, bias=False)
              (1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (2): ReLU6(inplace=True)
            )
            (2): Conv2d(960, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (3): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (17): InvertedResidual(
          (conv): Sequential(
            (0): ConvBNReLU(
              (0): Conv2d(160, 960, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (2): ReLU6(inplace=True)
            )
            (1): ConvBNReLU(
              (0): Conv2d(960, 960, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=960, bias=False)
              (1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (2): ReLU6(inplace=True)
            )
            (2): Conv2d(960, 320, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (3): BatchNorm2d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (18): ConvBNReLU(
          (0): Conv2d(320, 1280, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(1280, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
      )
      (classifier): Sequential(
        (0): Dropout(p=0.2, inplace=False)
        (1): Linear(in_features=1280, out_features=1000, bias=True)
      )
    )
  )
  ======================== three conv blocks before the SPP module (depthwise separable conv * 3) ========================
  (conv1): Sequential(
    (0): Sequential(
      (conv): Conv2d(320, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU6(inplace=True)
    )
    (1): Sequential(
      (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512, bias=False)
      (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU6(inplace=True)
      (3): Conv2d(512, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (4): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU6(inplace=True)
    )
    (2): Sequential(
      (conv): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU6(inplace=True)
    )
  )

  ======================== SPP module ========================
  (SPP): SpatialPyramidPooling(
    (maxpools): ModuleList(
      (0): MaxPool2d(kernel_size=5, stride=1, padding=2, dilation=1, ceil_mode=False)
      (1): MaxPool2d(kernel_size=9, stride=1, padding=4, dilation=1, ceil_mode=False)
      (2): MaxPool2d(kernel_size=13, stride=1, padding=6, dilation=1, ceil_mode=False)
    )
  )

  ======================== three conv blocks after the SPP module (depthwise separable conv * 3) ========================
  (conv2): Sequential(
    (0): Sequential(
      (conv): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU6(inplace=True)
    )
    (1): Sequential(
      (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512, bias=False)
      (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU6(inplace=True)
      (3): Conv2d(512, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (4): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU6(inplace=True)
    )
    (2): Sequential(
      (conv): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU6(inplace=True)
    )
  )

  ======================== upsample1 (conv + upsample) ========================
  (upsample1): Upsample(
    (upsample): Sequential(
      (0): Sequential(
        (conv): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU6(inplace=True)
      )
      (1): Upsample(scale_factor=2.0, mode=nearest)
    )
  )

  ======================== conv on the middle feature map ========================
  (conv_for_P4): Sequential(
    (conv): Conv2d(96, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu): ReLU6(inplace=True)
  )

  ======================== depthwise separable conv * 5 ========================
  (make_five_conv1): Sequential(
    (0): Sequential(
      (conv): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU6(inplace=True)
    )
    (1): Sequential(
      (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256, bias=False)
      (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU6(inplace=True)
      (3): Conv2d(256, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (4): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU6(inplace=True)
    )
    (2): Sequential(
      (conv): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU6(inplace=True)
    )
    (3): Sequential(
      (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256, bias=False)
      (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU6(inplace=True)
      (3): Conv2d(256, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (4): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU6(inplace=True)
    )
    (4): Sequential(
      (conv): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU6(inplace=True)
    )
  )

  ======================== upsample2 (conv + upsample) ========================
  (upsample2): Upsample(
    (upsample): Sequential(
      (0): Sequential(
        (conv): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU6(inplace=True)
      )
      (1): Upsample(scale_factor=2.0, mode=nearest)
    )
  )

  ======================== conv on the first feature map ========================
  (conv_for_P3): Sequential(
    (conv): Conv2d(32, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu): ReLU6(inplace=True)
  )

  ======================== depthwise separable conv * 5 ========================
  (make_five_conv2): Sequential(
    (0): Sequential(
      (conv): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU6(inplace=True)
    )
    (1): Sequential(
      (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=128, bias=False)
      (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU6(inplace=True)
      (3): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU6(inplace=True)
    )
    (2): Sequential(
      (conv): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU6(inplace=True)
    )
    (3): Sequential(
      (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=128, bias=False)
      (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU6(inplace=True)
      (3): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU6(inplace=True)
    )
    (4): Sequential(
      (conv): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU6(inplace=True)
    )
  )

  ======================== yolo head ========================
  (yolo_head3): Sequential(
    (0): Sequential(
      (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=128, bias=False)
      (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU6(inplace=True)
      (3): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU6(inplace=True)
    )
    (1): Conv2d(256, 75, kernel_size=(1, 1), stride=(1, 1))
  )

  ======================== down_sample1 ========================
  (down_sample1): Sequential(
    (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=128, bias=False)
    (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU6(inplace=True)
    (3): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): ReLU6(inplace=True)
  )

  ======================== depthwise separable conv * 5 ========================
  (make_five_conv3): Sequential(
    (0): Sequential(
      (conv): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU6(inplace=True)
    )
    (1): Sequential(
      (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256, bias=False)
      (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU6(inplace=True)
      (3): Conv2d(256, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (4): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU6(inplace=True)
    )
    (2): Sequential(
      (conv): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU6(inplace=True)
    )
    (3): Sequential(
      (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256, bias=False)
      (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU6(inplace=True)
      (3): Conv2d(256, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (4): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU6(inplace=True)
    )
    (4): Sequential(
      (conv): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU6(inplace=True)
    )
  )
  (yolo_head2): Sequential(
    (0): Sequential(
      (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256, bias=False)
      (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU6(inplace=True)
      (3): Conv2d(256, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (4): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU6(inplace=True)
    )
    (1): Conv2d(512, 75, kernel_size=(1, 1), stride=(1, 1))
  )
  (down_sample2): Sequential(
    (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=256, bias=False)
    (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU6(inplace=True)
    (3): Conv2d(256, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (4): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): ReLU6(inplace=True)
  )
  (make_five_conv4): Sequential(
    (0): Sequential(
      (conv): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU6(inplace=True)
    )
    (1): Sequential(
      (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512, bias=False)
      (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU6(inplace=True)
      (3): Conv2d(512, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (4): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU6(inplace=True)
    )
    (2): Sequential(
      (conv): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU6(inplace=True)
    )
    (3): Sequential(
      (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512, bias=False)
      (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU6(inplace=True)
      (3): Conv2d(512, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (4): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU6(inplace=True)
    )
    (4): Sequential(
      (conv): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU6(inplace=True)
    )
  )
  (yolo_head1): Sequential(
    (0): Sequential(
      (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512, bias=False)
      (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU6(inplace=True)
      (3): Conv2d(512, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (4): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU6(inplace=True)
    )
    (1): Conv2d(1024, 75, kernel_size=(1, 1), stride=(1, 1))
  )
)

I used L1FilterPruner with sparsity = 0.5 (see the code below) to prune conv1 and conv2 (the two convolution blocks before and after the SPP module).

On the VOC2007 dataset, the mAP before pruning was 78%. After pruning and fine-tuning for 50 epochs, the mAP gradually increased, but in the end it was only 45%.

I mean, this pruning looks reasonable, so why did the mAP drop so much?

My code is as follows. I directly use model_masked.pth instead of speedup.pth for fine-tuning, because I want to test the effect of the pruned model first:


import time

import torch
from torch.utils.data import DataLoader
from nni.algorithms.compression.pytorch.pruning import L1FilterPruner
# YoloBody, YOLOLoss, YoloDataset and fit_one_epoch come from my own project

if __name__ == "__main__":
    Cuda = True
    model = YoloBody(......)
    model_path = "logs/Epoch100-Total_Loss5.4132-Val_Loss7.7523.pth"
    print('=============== load state dict %s ===============' % model_path)
    device = torch.device('cpu')
    if Cuda:
        device = torch.device('cuda')
    state_dict = torch.load(model_path, map_location=device)
    model.load_state_dict(state_dict)
    if Cuda:
        model = model.cuda()

    dummy_input = torch.rand(1, 3, 416, 416).to(device)

    start = time.time()
    out = model(dummy_input)
    print('Before Pruning, the Time Latency: ', time.time() - start)

    sparsity = 0.5
    print('=============== start pruning sparsity = %s ===============' % sparsity)
    config_list = [{
        'sparsity': sparsity,
        'op_types': ['Conv2d'],
        'op_names': [
                     'conv1.0.conv',
                     'conv1.1.0',
                     'conv1.1.1',
                     'conv1.1.3',
                     'conv1.1.4',
                     'conv1.2.conv',
                     'conv2.0.conv',
                     'conv2.1.0',
                     'conv2.1.3',
                     'conv2.2.conv',
                     ]
    }]

    print('=============== start pruning sparsity = %s ===============' % sparsity)
    pruner = L1FilterPruner(model, config_list)
    model = pruner.compress()
    pruner.export_model(model_path='logs/pruner_pth/model_masked.pth',
                        mask_path='logs/pruner_pth/mask')

    start = time.time()
    out = model(dummy_input)
    print('After Pruning, the Time Latency: ', time.time() - start)

    print('===============  fine tuning ===============')
    finetune_model = YoloBody(......)
    state_dict = torch.load('logs/pruner_pth/model_masked.pth')
    finetune_model.load_state_dict(state_dict)

    lr = 1e-3
    optimizer = torch.optim.Adam(finetune_model.parameters(), lr)
    lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.92)
    yolo_loss = YOLOLoss(......)

    Batch_size = 8
    train_dataset = YoloDataset(......)
    val_dataset = YoloDataset(......)
    gen = DataLoader(train_dataset, ......)
    gen_val = DataLoader(val_dataset, ......)

    epoch_size = num_train // Batch_size
    epoch_size_val = num_val // Batch_size

    start_epoch = 0
    end_epoch = 50
    for epoch in range(start_epoch, end_epoch):
        pruner.update_epoch(epoch)
        fit_one_epoch(finetune_model, yolo_loss, epoch, epoch_size, epoch_size_val, gen, gen_val, end_epoch, Cuda, optimizer)
        lr_scheduler.step()

Thank you boss

J-shang commented 3 years ago

Hello @Veal98, the exported model is only used for inference; the model weights corresponding to the mask are set to 0 directly. When you fine-tune that model, these 0s may change.

If you want to fine-tune the model to test the mAP, you can fine-tune directly after model = pruner.compress(). At this point the model is wrapped, and the mask is multiplied with the weights during training. You can print the model after model = pruner.compress() to see what NNI does to the original model.

And we recommend applying the fine-tuning to the speedup model, not the wrapped model, for a more realistic effect.
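
A minimal sketch of that recommended flow, assuming the NNI 2.x ModelSpeedup API and reusing the paths from the code above:

from nni.compression.pytorch import ModelSpeedup

pruner = L1FilterPruner(model, config_list)
model = pruner.compress()    # the model is now wrapped; masks multiply the weights in forward
# ... optionally fine-tune the wrapped model here to test the masked accuracy ...
pruner.export_model(model_path='logs/pruner_pth/model_masked.pth',
                    mask_path='logs/pruner_pth/mask')
pruner._unwrap_model()       # remove the wrappers before running speedup
ModelSpeedup(model, dummy_input, 'logs/pruner_pth/mask').speedup_model()
# the model now has a truly smaller structure; fine-tune it like a normal model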

Veal98 commented 3 years ago

Hello @Veal98, the exported model is only used for inference; the model weights corresponding to the mask are set to 0 directly. When you fine-tune that model, these 0s may change.

If you want to fine-tune the model to test the mAP, you can fine-tune directly after model = pruner.compress(). At this point the model is wrapped, and the mask is multiplied with the weights during training. You can print the model after model = pruner.compress() to see what NNI does to the original model.

And we recommend applying the fine-tuning to the speedup model, not the wrapped model, for a more realistic effect.

Thanks, I'm trying it now ✊

And I still have a question: if I fine-tune the speedup model, the .pth it generates cannot be loaded into the original network for inference.

J-shang commented 3 years ago

It's by design: the speedup model is truly smaller because its structure is changed and the masked weights are removed, so you can't load its state_dict into the original model. You should save the entire speedup model, not only the state_dict:

# save the whole module (structure + weights), not just the state_dict
torch.save(speedup_model, './speed_up_model.pth')
# load it back later without rebuilding the original network
model = torch.load('./speed_up_model.pth')
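
Note that saving the whole module pickles it, so the class definition of YoloBody (and its submodules) still needs to be importable when you later call torch.load.
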
Veal98 commented 3 years ago

Hi, my code is now as follows, but the mAP is still 45%. What I want to know is: comparing the model after speedup with the model after pruning, will the accuracy of the former be better?


import torch
from tqdm import tqdm
from nni.algorithms.compression.pytorch.pruning import L1FilterPruner
# YoloBody, YOLOLoss and the data pipeline come from my own project

input_shape = (416, 416)

yolo_loss = YOLOLoss(......)

def trainer(pruner, net, yolo_loss, epoch, epoch_size, epoch_size_val, gen, genval, Epoch, cuda, optimizer, loss_history):
    net.train()
    with tqdm(total=epoch_size, desc=f'Epoch {epoch + 1}/{Epoch}', postfix=dict, mininterval=0.3) as pbar:
          ......

    net.eval()

    print('Start Validation')
    with tqdm(total=epoch_size_val, desc=f'Epoch {epoch + 1}/{Epoch}', postfix=dict, mininterval=0.3) as pbar:
          ......

    if epoch == 0 or (epoch + 1) % 10 == 0:
        pruner.export_model(
            model_path='pruner_pth/test/pruned_mask_Epoch%d-Total_Loss%.4f-Val_Loss%.4f.pth' % ((epoch + 1), total_loss / (epoch_size + 1), val_loss / (epoch_size_val + 1)),
            mask_path='pruner_pth/test/mask.pth')

if __name__ == "__main__":

    model = YoloBody(......)
    model_path = "logs/Epoch100-Total_Loss5.4132-Val_Loss7.7523.pth"
    device = torch.device('cpu')
    if Cuda:
        device = torch.device('cuda')
    state_dict = torch.load(model_path, map_location=device)
    model.load_state_dict(state_dict)
    if Cuda:
        model = model.cuda()

    dummy_input = torch.rand(1, 3, 416, 416).to(device)

    lr = 1e-3
    optimizer = torch.optim.Adam(model.parameters(), lr)

    sparsity = 0.5
    config_list = [{
        'sparsity': sparsity,
        'op_types': ['Conv2d'],
        'op_names': [
            'conv1.0.conv',
            'conv1.1.0',
            'conv1.1.1',
            'conv1.1.3',
            'conv1.1.4',
            'conv1.2.conv',
            'conv2.0.conv',
            'conv2.1.0',
            'conv2.1.3',
            'conv2.2.conv',
            'conv_for_P4.conv',
            'make_five_conv1.0.conv',
            'make_five_conv1.1.0',
            'make_five_conv1.1.3',
            'make_five_conv1.2.conv',
            'make_five_conv1.3.0',
            'make_five_conv1.3.3',
            'make_five_conv1.4.conv',
            'conv_for_P3.conv',
            'make_five_conv2.0.conv',
            'make_five_conv2.1.0',
            'make_five_conv2.1.3',
            'make_five_conv2.2.conv',
            'make_five_conv2.3.0',
            'make_five_conv2.3.3',
            'make_five_conv2.4.conv',
            'make_five_conv3.0.conv',
            'make_five_conv3.1.0',
            'make_five_conv3.1.3',
            'make_five_conv3.2.conv',
            'make_five_conv3.3.0',
            'make_five_conv3.3.3',
            'make_five_conv3.4.conv',
            'make_five_conv4.0.conv',
            'make_five_conv4.1.0',
            'make_five_conv4.1.3',
            'make_five_conv4.2.conv',
            'make_five_conv4.3.0',
            'make_five_conv4.3.3',
            'make_five_conv4.4.conv',
        ]
    }]

    pruner = L1FilterPruner(model, config_list)
    model = pruner.compress()

    Batch_size = 8
    start_epoch = 0
    end_epoch = 50

    lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.92)
    for epoch in range(start_epoch, end_epoch):
        pruner.update_epoch(epoch)
        trainer(pruner, model, ......)
        lr_scheduler.step()

Thank you boss

J-shang commented 3 years ago

This is the right way to use the pruner. If the mAP is still 45%, you can try the dependency-aware mode, or try AGPPruner; maybe sparsity = 0.5 is too high.
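
For reference, the dependency-aware mode is enabled when constructing the pruner; a sketch reusing the 416x416 dummy input from the code above:

# dependency-aware mode traces the model so that the pruned channels
# respect the dependencies between layers (e.g. add/concat connections)
pruner = L1FilterPruner(model, config_list,
                        dependency_aware=True,
                        dummy_input=torch.rand(1, 3, 416, 416).to(device))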

Veal98 commented 3 years ago

OK, thanks, I'll try it.

Veal98 commented 3 years ago

This is the right way to use the pruner. If the mAP is still 45%, you can try the dependency-aware mode, or try AGPPruner; maybe sparsity = 0.5 is too high.

I have tried reducing the sparsity to 0.2 and using dependency-aware mode based on L1FilterPruner. Regrettably, the mAP has not greatly improved, only 1~2% ↑. Can you give me some feasible suggestions for drastically increasing the mAP?

Thanks very much!