microsoft / nni

An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
https://nni.readthedocs.io
MIT License
14.03k stars 1.81k forks

Pruning: TypeError: to() received an invalid combination of arguments #5148

Closed · Lxp2014 closed this issue 2 years ago

Lxp2014 commented 2 years ago

Describe the issue: Hello, I'm using NNI to prune the detection model ssdlite-mobilenetV2 from mmdetection (https://github.com/open-mmlab/mmdetection). The error shown in the title occurred when running ModelSpeedup(model, torch.rand(1, 3, 256, 256).cuda(), masks).speedup_model(). The forward pass works fine, but updating the mask for .aten::to fails. Are there any solutions? Thanks!

[screenshot]
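For context, the call above is the final step of NNI's standard prune-then-speedup flow (prune, collect masks, unwrap the pruner, then speed up). A minimal self-contained sketch of that flow on a toy CPU model, not the actual detector from this report (the full repro script is posted later in the thread):

    import torch
    import torch.nn as nn
    from nni.compression.pytorch.pruning import L1NormPruner
    from nni.compression.pytorch.speedup import ModelSpeedup

    # toy stand-in for the detector; in the report the model comes from
    # mmdetection's init_detector(config, checkpoint)
    model = nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())

    config_list = [{'sparsity_per_layer': 0.5, 'op_types': ['Conv2d']}]
    pruner = L1NormPruner(model, config_list)
    _, masks = pruner.compress()   # simulated pruning: weights are masked, not removed
    pruner._unwrap_model()         # remove the pruner wrappers before speedup
    # speedup rewrites the masked layers into physically smaller ones
    ModelSpeedup(model, torch.rand(1, 3, 256, 256), masks).speedup_model()

The same pattern, with init_detector supplying the model and a CUDA dummy input, is what produces the traceback below.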

Environment:

Configuration:

Log message:

    [2022-09-28 20:28:03] start to speedup the model
    [2022-09-28 20:28:07] infer module masks...
    [2022-09-28 20:28:07] Update mask for backbone.conv1.conv
    [2022-09-28 20:28:07] Update mask for .aten::to.249
    Traceback (most recent call last):
      File "demo/image_demo.py", line 92, in <module>
        main(args)
      File "demo/image_demo.py", line 58, in main
        ModelSpeedup(model, torch.rand(1, 3, 256, 256).cuda(), masks).speedup_model()
      File "/home/dsplxp/anaconda3/envs/open-mmlab/lib/python3.8/site-packages/nni/compression/pytorch/speedup/compressor.py", line 543, in speedup_model
        self.infer_modules_masks()
      File "/home/dsplxp/anaconda3/envs/open-mmlab/lib/python3.8/site-packages/nni/compression/pytorch/speedup/compressor.py", line 380, in infer_modules_masks
        self.update_direct_sparsity(curnode)
      File "/home/dsplxp/anaconda3/envs/open-mmlab/lib/python3.8/site-packages/nni/compression/pytorch/speedup/compressor.py", line 234, in update_direct_sparsity
        _auto_infer = AutoMaskInference(
      File "/home/dsplxp/anaconda3/envs/open-mmlab/lib/python3.8/site-packages/nni/compression/pytorch/speedup/infer_mask.py", line 80, in __init__
        self.output = self.module(dummy_input)
      File "/home/dsplxp/anaconda3/envs/open-mmlab/lib/python3.8/site-packages/nni/compression/pytorch/speedup/jit_translate.py", line 244, in __call__
        result = self.func(*self.positional, **self.keyword)
    TypeError: to() received an invalid combination of arguments - got (memory_format=NoneType, copy=bool, non_blocking=bool, pin_memory=NoneType, device=torch.device, layout=torch.layout, dtype=torch.dtype, ), but expected one of:
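The TypeError itself is reproducible outside NNI: the traced aten::to node carries the full schema argument list (dtype, layout, device, pin_memory, non_blocking, copy, memory_format), but the Python torch.Tensor.to binding has no overload that accepts layout or pin_memory keywords, so replaying the node's arguments verbatim fails. A standalone sketch of the same failure:

    import torch

    x = torch.randn(1, 3, 4, 4)
    try:
        # same keyword combination as in the traceback above; Tensor.to has no
        # overload that takes layout or pin_memory, so this raises TypeError
        x.to(dtype=torch.float32, layout=torch.strided,
             device=torch.device('cpu'), pin_memory=None,
             non_blocking=False, copy=False, memory_format=None)
    except TypeError as err:
        print(err)  # to() received an invalid combination of arguments ...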

J-shang commented 2 years ago

Hello @lixiaopeng123456, thanks for your report. This is a known issue and will be fixed in the next release.

@Louis-J, could you give a workaround for the current nni version?

Lxp2014 commented 2 years ago

> Hello @lixiaopeng123456, thanks for your report. This is a known issue and will be fixed in the next release.
>
> @Louis-J, could you give a workaround for the current nni version?

Thanks for your reply; looking forward to the next version.

scarlett2018 commented 2 years ago

> Hello @lixiaopeng123456, thanks for your report. This is a known issue and will be fixed in the next release.
>
> @Louis-J, could you give a workaround for the current nni version?

@J-shang - is this targeted for 2.9.1 or later?

J-shang commented 2 years ago

> Hello @lixiaopeng123456, thanks for your report. This is a known issue and will be fixed in the next release. @Louis-J, could you give a workaround for the current nni version?
>
> @J-shang - is this targeted for 2.9.1 or later?

We hope to solve this in 2.9.1.

Louis-J commented 2 years ago

Does the problem only occur on GPU? I can't reproduce it on CPU with ssdlite_mobilenetv2_scratch_600e_coco.

Lxp2014 commented 2 years ago

> Does the problem only occur on GPU? I can't reproduce it on CPU with ssdlite_mobilenetv2_scratch_600e_coco.

Yes, the problem occurs on GPU. I will test it on CPU and get back to you. Thanks!

Lxp2014 commented 2 years ago

> Does the problem only occur on GPU? I can't reproduce it on CPU with ssdlite_mobilenetv2_scratch_600e_coco.

The problem still occurs on CPU.

[screenshot]

Test command:

    python demo/test.py tests/data/10.jpg configs/ssd/ssdlite_mobilenetv2_scratch_600e_hand.py work_dirs/ssdlite_mobilenetv2_scratch_600e_hand/epoch_120.pth --device cpu --score-thr 0.5

My test.py:

    import asyncio
    from argparse import ArgumentParser
    from functools import partial

    from mmdet.apis import (async_inference_detector, inference_detector,
                            init_detector, show_result_pyplot)
    import torch
    import pdb


    def parse_args():
        parser = ArgumentParser()
        parser.add_argument('img', help='Image file')
        parser.add_argument('config', help='Config file')
        parser.add_argument('checkpoint', help='Checkpoint file')
        parser.add_argument('--out-file', default=None, help='Path to output file')
        parser.add_argument(
            '--device', default='cuda:0', help='Device used for inference')
        parser.add_argument(
            '--palette',
            default='coco',
            choices=['coco', 'voc', 'citys', 'random'],
            help='Color palette used for visualization')
        parser.add_argument(
            '--score-thr', type=float, default=0.3, help='bbox score threshold')
        parser.add_argument(
            '--async-test',
            action='store_true',
            help='whether to set async options for async inference.')
        args = parser.parse_args()
        return args


    def main(args):
        # build the model from a config file and a checkpoint file
        model = init_detector(args.config, args.checkpoint, device=args.device)
        # test a single image
        # result = inference_detector(model, args.img)

        config_list = [{
            'sparsity_per_layer': 0.5,
            'op_types': ['Conv2d']
        }, {
            'exclude': True,
            'op_names': ['Linear', 'bn']
        }]
        from nni.compression.pytorch.pruning import L1NormPruner
        pruner = L1NormPruner(model, config_list)
        # show the wrapped model structure; `PrunerModuleWrapper` has wrapped the layers configured in config_list
        # print(model)
        # %%
        # compress the model and generate the masks
        _, masks = pruner.compress()
        # show the masks' sparsity
        for name, mask in masks.items():
            print(name, ' sparsity : ', '{:.2}'.format(mask['weight'].sum() / mask['weight'].numel()))
        pruner._unwrap_model()
        # speed up the model; for more information about speedup, please refer to :doc:`pruning_speedup`
        from nni.compression.pytorch.speedup import ModelSpeedup
        print(model)
        ModelSpeedup(model, torch.rand(1, 3, 256, 256), masks).speedup_model()


    if __name__ == '__main__':
        args = parse_args()
        if args.async_test:
            asyncio.run(async_main(args))
        else:
            main(args)

Louis-J commented 2 years ago

Thanks, and please provide your ssdlite_mobilenetv2_scratch_600e_hand.py.

Louis-J commented 2 years ago

I couldn't reproduce it with ssdlite_mobilenetv2_scratch_600e_coco, so I think it can only be reproduced with ssdlite_mobilenetv2_scratch_600e_hand.

Lxp2014 commented 2 years ago

> I couldn't reproduce it with ssdlite_mobilenetv2_scratch_600e_coco, so I think it can only be reproduced with ssdlite_mobilenetv2_scratch_600e_hand.

ssdlite_mobilenetv2_scratch_600e_hand.py is as follows:

    _base_ = ['../_base_/datasets/coco_detection.py', '../_base_/default_runtime.py']

    model = dict(
        type='SingleStageDetector',
        backbone=dict(
            type='MobileNetV2',
            out_indices=(4, 7),
            norm_cfg=dict(type='BN', eps=0.001, momentum=0.03),
            init_cfg=dict(type='TruncNormal', layer='Conv2d', std=0.03)),
        neck=dict(
            type='SSDNeck',
            in_channels=(96, 1280),
            out_channels=(96, 1280, 512, 256, 256, 128),
            level_strides=(2, 2, 2, 2),
            level_paddings=(1, 1, 1, 1),
            l2_norm_scale=None,
            use_depthwise=True,
            norm_cfg=dict(type='BN', eps=0.001, momentum=0.03),
            act_cfg=dict(type='ReLU6'),
            init_cfg=dict(type='TruncNormal', layer='Conv2d', std=0.03)),
        bbox_head=dict(
            type='SSDHead',
            in_channels=(96, 1280, 512, 256, 256, 128),
            num_classes=1,
            use_depthwise=True,
            norm_cfg=dict(type='BN', eps=0.001, momentum=0.03),
            act_cfg=dict(type='ReLU6'),
            init_cfg=dict(type='Normal', layer='Conv2d', std=0.001),
            anchor_generator=dict(
                type='SSDAnchorGenerator',
                scale_major=False,
                strides=[16, 32, 64, 107, 160, 320],
                ratios=[[2, 3], [2, 3], [2, 3], [2, 3], [2, 3], [2, 3]],
                min_sizes=[48, 100, 150, 202, 253, 304],
                max_sizes=[100, 150, 202, 253, 304, 320]),
            bbox_coder=dict(
                type='DeltaXYWHBBoxCoder',
                target_means=[.0, .0, .0, .0],
                target_stds=[0.1, 0.1, 0.2, 0.2])),
        train_cfg=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.5,
                neg_iou_thr=0.5,
                min_pos_iou=0.,
                ignore_iof_thr=-1,
                gt_max_assign_all=False),
            smoothl1_beta=1.,
            allowed_border=-1,
            pos_weight=-1,
            neg_pos_ratio=3,
            debug=False),
        test_cfg=dict(
            nms_pre=1000,
            nms=dict(type='nms', iou_threshold=0.45),
            min_bbox_size=0,
            score_thr=0.02,
            max_per_img=200))

    cudnn_benchmark = True

    dataset_type = 'CocoDataset'
    data_root = 'data/onehand10k/'
    img_norm_cfg = dict(
        mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
    train_pipeline = [
        dict(type='LoadImageFromFile'),
        dict(type='LoadAnnotations', with_bbox=True),
        dict(
            type='Expand',
            mean=img_norm_cfg['mean'],
            to_rgb=img_norm_cfg['to_rgb'],
            ratio_range=(1, 4)),
        dict(
            type='MinIoURandomCrop',
            min_ious=(0.1, 0.3, 0.5, 0.7, 0.9),
            min_crop_size=0.3),
        dict(type='Resize', img_scale=(320, 320), keep_ratio=False),
        dict(type='RandomFlip', flip_ratio=0.5),
        dict(
            type='PhotoMetricDistortion',
            brightness_delta=32,
            contrast_range=(0.5, 1.5),
            saturation_range=(0.5, 1.5),
            hue_delta=18),
        dict(type='Normalize', **img_norm_cfg),
        dict(type='Pad', size_divisor=320),
        dict(type='DefaultFormatBundle'),
        dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
    ]
    test_pipeline = [
        dict(type='LoadImageFromFile'),
        dict(
            type='MultiScaleFlipAug',
            img_scale=(320, 320),
            flip=False,
            transforms=[
                dict(type='Resize', keep_ratio=False),
                dict(type='Normalize', **img_norm_cfg),
                dict(type='Pad', size_divisor=320),
                dict(type='ImageToTensor', keys=['img']),
                dict(type='Collect', keys=['img']),
            ])
    ]
    data = dict(
        samples_per_gpu=72,
        workers_per_gpu=4,
        train=dict(
            _delete_=True,
            type='RepeatDataset',  # use RepeatDataset to speed up training
            times=5,
            dataset=dict(
                type=dataset_type,
                ann_file=data_root + 'annotations/all_train.json',
                img_prefix=data_root,
                pipeline=train_pipeline)),
        val=dict(pipeline=test_pipeline),
        test=dict(pipeline=test_pipeline))

    optimizer = dict(type='SGD', lr=0.015, momentum=0.9, weight_decay=4.0e-5)
    optimizer_config = dict(grad_clip=None)

    lr_config = dict(
        policy='CosineAnnealing',
        warmup='linear',
        warmup_iters=500,
        warmup_ratio=0.001,
        min_lr=0)
    runner = dict(type='EpochBasedRunner', max_epochs=120)

    evaluation = dict(interval=5, metric='bbox')
    checkpoint_config = dict(interval=5)
    custom_hooks = [
        dict(type='NumClassCheckHook'),
        dict(type='CheckInvalidLossHook', interval=50, priority='VERY_LOW')
    ]

    auto_scale_lr = dict(base_batch_size=192)

Louis-J commented 2 years ago

Thanks. I'll try it.

Lxp2014 commented 2 years ago

> I couldn't reproduce it with ssdlite_mobilenetv2_scratch_600e_coco,

Thanks, I will also test it on ssdlite_mobilenetv2_scratch_600e_coco. If it is OK, I will check the difference.

Lxp2014 commented 2 years ago

> Thanks. I'll try it.

The problem also occurs with ssdlite_mobilenetv2_scratch_600e_coco. Can you provide your code and environment? I wonder if there is something wrong with my test code. Thanks!

Louis-J commented 2 years ago

Sorry, I can't reproduce the issue on either CPU or GPU, with either ssdlite_mobilenetv2_scratch_600e_coco or ssdlite_mobilenetv2_scratch_600e_hand.

[screenshot]

I think the difference comes from elsewhere. Could you please add the code below and show me the result?

code:

    print('type(model.backbone.conv1):', type(model.backbone.conv1))
    print('model.backbone.conv1:', model.backbone.conv1)
    conv1_in_dummy = torch.randn(8,3,256,256)
    conv1_out_dummy = model.backbone.conv1(conv1_in_dummy)
    print('conv1_out_dummy.shape:', conv1_out_dummy.shape)
    traced_conv1 = torch.jit.trace(model.backbone.conv1, conv1_in_dummy)
    print('traced_conv1.graph:', traced_conv1.graph)
    torch._C._jit_pass_inline(traced_conv1.graph)
    print('traced_conv1.graph after inline:', traced_conv1.graph)

Position: between pruner._unwrap_model() and ModelSpeedup(model, torch.rand(1, 3, 256, 256), masks).speedup_model().

[screenshot]

What I got is a TorchScript graph without 'aten::to' in model.backbone.conv1. I want to know which layer the 'aten::to' comes from.

Thanks.

Lxp2014 commented 2 years ago

> Sorry, I can't reproduce the issue on either CPU or GPU, with either ssdlite_mobilenetv2_scratch_600e_coco or ssdlite_mobilenetv2_scratch_600e_hand.
>
> [screenshot]
>
> I think the difference comes from elsewhere. Could you please add the code below and show me the result?
>
> code:
>
>     print('type(model.backbone.conv1):', type(model.backbone.conv1))
>     print('model.backbone.conv1:', model.backbone.conv1)
>     conv1_in_dummy = torch.randn(8,3,256,256)
>     conv1_out_dummy = model.backbone.conv1(conv1_in_dummy)
>     print('conv1_out_dummy.shape:', conv1_out_dummy.shape)
>     traced_conv1 = torch.jit.trace(model.backbone.conv1, conv1_in_dummy)
>     print('traced_conv1.graph:', traced_conv1.graph)
>     torch._C._jit_pass_inline(traced_conv1.graph)
>     print('traced_conv1.graph after inline:', traced_conv1.graph)
>
> Position: between pruner._unwrap_model() and ModelSpeedup(model, torch.rand(1, 3, 256, 256), masks).speedup_model().
>
> [screenshot]
>
> What I got is a TorchScript graph without 'aten::to' in model.backbone.conv1. I want to know which layer the 'aten::to' comes from.
>
> Thanks.

[screenshot of the requested output]

Louis-J commented 2 years ago

Thanks a lot.

The graph code of model.backbone.conv1 is exactly the same as mine. The bad 'aten::to' isn't there, so I still don't know where the 'aten::to' comes from.

Could you please add the code below and show me the result? If the result is too long, you can upload the output as a text file.

code:

    traced_model = torch.jit.trace(model, torch.rand(1, 3, 256, 256))
    torch._C._jit_pass_inline(traced_model.graph)
    print('traced_model.graph has aten::to:', 'aten::to' in str(traced_model.graph))
    if 'aten::to' in str(traced_model.graph):
        print('traced_model.graph after inline:', str(traced_model.graph))
    exit()

Position: between pruner._unwrap_model() and ModelSpeedup(model, torch.rand(1, 3, 256, 256), masks).speedup_model().
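If the graph does turn out to contain 'aten::to', one way to narrow down which layer it comes from is to walk the traced graph's nodes directly; a small optional sketch using plain TorchScript graph APIs (it assumes the same model and imports as the test.py above, and is not part of the original suggestion):

    # locate any aten::to nodes in the traced graph; printing a node includes
    # the source location recorded by the tracer, which points at the Python
    # call that produced it
    traced_model = torch.jit.trace(model, torch.rand(1, 3, 256, 256))
    torch._C._jit_pass_inline(traced_model.graph)
    for node in traced_model.graph.nodes():
        if node.kind() == 'aten::to':
            print(node)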

Louis-J commented 2 years ago

I've tried to write a fix in https://github.com/Louis-J/nni/blob/fix_5148/nni/compression/pytorch/speedup/jit_translate.py. Please replace your local jit_translate.py with it and try again; I think it can solve the 'aten::to' problem.
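For anyone stuck on an older NNI release without that branch: judging from the traceback, the failure appears to be the speedup translator replaying every schema argument of the traced aten::to node against torch.Tensor.to, which rejects the layout and pin_memory keywords. A rough illustration of the kind of adjustment involved, using a hypothetical helper rather than the actual patch in the linked jit_translate.py:

    import torch

    # Hypothetical helper illustrating the idea only; argument names follow the
    # aten::to schema seen in the traceback, not NNI's internal code.
    _UNSUPPORTED_TO_KWARGS = {'layout', 'pin_memory'}  # rejected by Tensor.to

    def safe_tensor_to(tensor, **kwargs):
        # keep only keywords Tensor.to accepts, and drop None placeholders
        cleaned = {k: v for k, v in kwargs.items()
                   if k not in _UNSUPPORTED_TO_KWARGS and v is not None}
        return tensor.to(**cleaned)

    # replaying the schema arguments from the traceback now succeeds
    x = torch.randn(1, 3, 4, 4)
    y = safe_tensor_to(x, dtype=torch.float32, layout=torch.strided,
                       device=torch.device('cpu'), pin_memory=None,
                       non_blocking=False, copy=False, memory_format=None)
    print(y.dtype, y.device)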

Lxp2014 commented 2 years ago

> I've tried to write a fix in https://github.com/Louis-J/nni/blob/fix_5148/nni/compression/pytorch/speedup/jit_translate.py. Please replace your local jit_translate.py with it and try again; I think it can solve the 'aten::to' problem.

Thanks a lot! The problem has been solved.