pruned model size no change and inference time is even longer

microsoft / nni

An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.

https://nni.readthedocs.io

MIT License

pruned model size no change and inference time is even longer #2225

Closed misslibra closed 4 years ago

misslibra commented 4 years ago

NNI environment: PyTorch

nni version: 1.4.1
nni mode (local|pai|remote): local
OS: Ubuntu 16.04
python version: 3.6
is conda or virtualenv used?: conda
is running in docker?: no

I ran the example code model_prune_torch.py. The pretrain_naive model is 1.7M and the pruned_model is also 1.7M, the same with the mask. Inference with the pretrained model takes 0.4ms, but with the pruned model it increases to 1.5ms. I am confused about what the example is supposed to do. Isn't it meant to shrink the model and speed it up? I also tried the speedup method from the example on my own model, which is based on YOLOv3, and got the same result. Could you help me figure out what is going wrong? Thanks!

QuanluZhang commented 4 years ago

@misslibra thanks for reporting this issue. It is expected that the pruned model is also 1.7M, because the pruners are only responsible for finding weight masks that keep the model performing reasonably well. ModelSpeedup is responsible for making the model smaller based on the generated masks.

In your case, could you tell us how you measured the 1.5ms? Was it with the pruner applied, or after loading the saved weight checkpoint into the original model (i.e., the model before pruning)? If the former, inference latency is expected to be higher, because the weights are multiplied by the masks in forward. If the latter, the inference latency should not be different.

For ModelSpeedup, it would be great if you can share the code with us, so that we can check whether your model is really compressed.
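
To make the distinction between masking and physically shrinking a layer concrete, here is a minimal sketch in plain PyTorch. It is not NNI's implementation: it assumes filters are ranked by L1 norm (which is what the name L1FilterPruner suggests), and the layer sizes and the 0.2 sparsity are illustrative values only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative layer and sparsity; not taken from any NNI example.
conv = nn.Conv2d(in_channels=3, out_channels=10, kernel_size=3, bias=False)
sparsity = 0.2

# L1 norm of each output filter, shape (out_channels,).
l1_norms = conv.weight.detach().abs().sum(dim=(1, 2, 3))
num_pruned = int(sparsity * conv.out_channels)
pruned_idx = torch.argsort(l1_norms)[:num_pruned]

# Masking: the weakest filters are zeroed, but the tensor keeps its shape,
# so a saved checkpoint is as large as before.
mask = torch.ones(conv.out_channels, dtype=torch.bool)
mask[pruned_idx] = False
masked_weight = conv.weight.detach() * mask.view(-1, 1, 1, 1).float()

# Shrinking: a new, smaller layer is built from the surviving filters.
# Only this step reduces parameter count and compute.
small = nn.Conv2d(3, int(mask.sum()), kernel_size=3, bias=False)
with torch.no_grad():
    small.weight.copy_(conv.weight[mask])

print(masked_weight.numel(), small.weight.numel())
```

With these sizes the masked weight still holds 270 values while the rebuilt layer holds 216. Note that shrinking one layer also changes the input channel count expected by the layers that consume its output, which is why a speedup step has to reason about the whole graph.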

misslibra commented 4 years ago

Thanks for your support! I added timing code in test:

s_time = time.time()
output = model(data)
print('inference time is : ', (time.time() - s_time)*1000 )

Before pruning, the time is 0.7ms, and after pruning it is 1.5ms. Now I understand that the mask multiplication takes time.
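
A note on the measurement itself: timing a single forward call with time.time() can be misleading, because the first calls include one-off setup cost and CUDA kernels are launched asynchronously. The helper below is a hedged sketch of a more stable measurement; the name benchmark and its parameters are hypothetical, not part of NNI or PyTorch.

```python
import time


def benchmark(fn, warmup=10, iters=100, sync=None):
    """Return the mean latency of fn() in milliseconds.

    sync is an optional zero-argument callable that blocks until queued
    device work has finished. On a CUDA device, torch.cuda.synchronize
    is the usual choice; on CPU it can be left as None.
    """
    # Warm-up runs are excluded from the measurement.
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()

    start = time.perf_counter()
    for _ in range(iters):
        fn()
    # Wait for outstanding device work before stopping the clock.
    if sync is not None:
        sync()
    return (time.perf_counter() - start) * 1000.0 / iters


if __name__ == "__main__":
    # Stand-in workload; with a real model this would be
    # lambda: model(data), wrapped in torch.no_grad().
    print("mean latency (ms):", benchmark(lambda: sum(range(1000))))
```

Averaging over many iterations after a warm-up makes a 0.7ms versus 1.5ms comparison much less sensitive to noise than a single timed call.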

misslibra commented 4 years ago

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--image_folder", type=str, default="data/demo_data/image_ori/", help="path to dataset")
    parser.add_argument("--model_def", type=str, default="config/geely_yolo3d.cfg", help="path to model definition file")
    # parser.add_argument("--weights_path", type=str, default="weights/yolov3.weights", help="path to weights file")
    parser.add_argument("--class_path", type=str, default="data/geely.names", help="path to class label file")
    parser.add_argument("--conf_thres", type=float, default=0.5, help="object confidence threshold")
    parser.add_argument("--nms_thres", type=float, default=0.5, help="iou thresshold for non-maximum suppression")
    parser.add_argument("--batch_size", type=int, default=1, help="size of the batches")
    parser.add_argument("--n_cpu", type=int, default=0, help="number of cpu threads to use during batch generation")
    parser.add_argument("--img_size", type=int, default=(192, 640), help="size of each image dimension")
    parser.add_argument("--weights_path", type=str, default='./weights/best_model_Epoch_1060_step_619624_mAP_0.7210_lr_0.0001', help="path to checkpoint model")
    opt = parser.parse_args()
    # print(opt)

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # config_list = [{'sparsity': 1, 'op_types': ['Conv2d']}]
    model = Darknet(opt.model_def, img_size=opt.img_size).to(device)
    model.load_state_dict(torch.load(opt.weights_path))

    if compression == 'prune':
        print('do prune')
        config_list = [{ 'sparsity': 0.2, 'op_types': ['default'] }]##Conv2d
        pruner = L1FilterPruner(model, config_list)
        # pruner = ActivationMeanRankFilterPruner(model, config_list)
        pruner.compress()
        pruner.export_model('model.pth', 'mask.pth')

    """model inference time"""
    if do_speedup_detection:
        # Get dataloader
        dataloader = DataLoader(
            ImageFolder(opt.image_folder, img_size=opt.img_size),
            batch_size=opt.batch_size,
            shuffle=False,
            num_workers=opt.n_cpu,
        )

        Tensor = torch.cuda.FloatTensor if torch.cuda.is_available() else torch.FloatTensor
        model.eval()
        masks_file = './mask.pth'
        apply_compression_results(model, masks_file)
        for batch_i, (img_paths, input_imgs) in enumerate(dataloader):
            input_imgs = Variable(input_imgs.type(Tensor))
            input_imgs = input_imgs.to(device)
            with torch.no_grad():
                start = time.time()
                detections = model(input_imgs)
                durable = time.time()
                print('inference time : ', 1000*(durable - start))
            # break

misslibra commented 4 years ago

This is the code I use for speedup.

misslibra commented 4 years ago

I tried loading the new model exported by pruner.export_model and using the use_mask logic, but the inference time is still not reduced.

"""model inference time"""
if do_speedup_detection:
    print('------')
    model_1 = Darknet(opt.model_def, img_size=opt.img_size).to(device)
    model_1.load_state_dict(torch.load('model.pth'))

    # Get dataloader
    dataloader = DataLoader(
        ImageFolder(opt.image_folder, img_size=opt.img_size),
        batch_size=opt.batch_size,
        shuffle=False,
        num_workers=opt.n_cpu,
    )

    Tensor = torch.cuda.FloatTensor if torch.cuda.is_available() else torch.FloatTensor
    model_1.eval()

    # masks_file = './mask.pth'
    if use_mask:
        apply_compression_results(model, masks_file)
    # else:
    #     m_speedup = ModelSpeedup(model, input_imgs, masks_file)
    #     m_speedup.speedup_model()

    for batch_i, (img_paths, input_imgs) in enumerate(dataloader):
        input_imgs = Variable(input_imgs.type(Tensor))
        input_imgs = input_imgs.to(device)
        with torch.no_grad():
            start = time.time()
            detections = model_1(input_imgs)
            durable = time.time()
            print('inference time : ', 1000*(durable - start))
        # break

misslibra commented 4 years ago

And how do I use ModelSpeedup to get a smaller model? When I use ModelSpeedup, I get this error:

/home/cindy/Documents/3D/training/camera/geely_yolo3D_02/models.py:291: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if grid_size[0] != self.grid_size[0]:
/home/cindy/Documents/3D/training/camera/geely_yolo3D_02/models.py:257: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert self.stride == self.img_dim[1] / self.grid_size[1]
/home/cindy/Documents/3D/training/camera/geely_yolo3D_02/models.py:262: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  self.scaled_anchors = FloatTensor([(a_w / self.stride, a_h / self.stride) for a_w, a_h in self.anchors])
/home/cindy/Documents/3D/training/camera/geely_yolo3D_02/models.py:299: TracerWarning: Converting a tensor to a Python index might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  pred_boxes = FloatTensor(prediction[..., :4].shape)
Traceback (most recent call last):
  File "/home/cindy/Documents/3D/training/camera/geely_yolo3D_02/model_compression.py", line 111, in <module>
    m_speedup = ModelSpeedup(model_1, input_imgs, masks_file)
  File "/home/cindy/anaconda3/envs/python36_env/lib/python3.6/site-packages/nni/compression/speedup/torch/compressor.py", line 91, in __init__
    self.trace_graph = torch.jit.trace(model, dummy_input)
  File "/home/cindy/anaconda3/envs/python36_env/lib/python3.6/site-packages/torch/jit/__init__.py", line 858, in trace
    check_tolerance, _force_outplace, _module_class)
  File "/home/cindy/anaconda3/envs/python36_env/lib/python3.6/site-packages/torch/jit/__init__.py", line 1007, in trace_module
    check_tolerance, _force_outplace, True, _module_class)
  File "/home/cindy/anaconda3/envs/python36_env/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 49, in decorate_no_grad
    return func(*args, **kwargs)
  File "/home/cindy/anaconda3/envs/python36_env/lib/python3.6/site-packages/torch/jit/__init__.py", line 676, in _check_trace
    raise TracingCheckError(diag_info)
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Graphs differed across invocations!
Graph diff:
graph(%self : ClassType, %x.1 : Tensor):
%2 : ClassType = prim::GetAttrname="module_list"
%3 : ClassType = prim::GetAttrname="0"
%4 : ClassType = prim::GetAttrname="conv_0"

QuanluZhang commented 4 years ago

@misslibra there are two issues in your code. First, after calling pruner.compress() you should fine-tune your model. pruner.compress() generates masks based on, for example, the model weights, but it does not fine-tune the model for you; you still need to write the fine-tuning logic after calling pruner.compress(). Second, apply_compression_results is expected to make inference slower in v1.4.1. Please try the master branch, where it does not make inference slower.

misslibra commented 4 years ago

@QuanluZhang Thank you so much ! I will try now and update the result

QuanluZhang commented 4 years ago

apply_compression_results simply multiplies the weights by the generated masks; it does not speed up model inference. ModelSpeedup does, but ModelSpeedup is still in alpha release and only supports torch 1.3.1. Please refer to https://nni.readthedocs.io/en/latest/Compressor/ModelSpeedup.html for details.

BTW, the following two examples might help:
https://github.com/microsoft/nni/blob/master/examples/model_compress/model_speedup.py
https://github.com/microsoft/nni/blob/master/examples/model_compress/model_prune_torch.py

misslibra commented 4 years ago

My torch version is 1.3.1

QuanluZhang commented 4 years ago

This looks like a bug in torch.jit. Some related issues in PyTorch:
https://github.com/pytorch/pytorch/issues/23993
https://github.com/pytorch/pytorch/issues/33491

misslibra commented 4 years ago

This error can be solved by following this notice from the source code of ModelSpeedup(model, dummy_input, masks_file): the dummy input for jit.trace should be put on the right device by the user before it is passed in.
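
The device fix described above can be sketched as follows. This only demonstrates torch.jit.trace with a matching device; the small Sequential model and the input shape are stand-ins chosen for illustration, and ModelSpeedup itself is not called here.

```python
import torch
import torch.nn as nn

# Stand-in model; in the thread this would be the Darknet instance.
model = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU())
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device).eval()

# Derive the device from the model's own parameters so the dummy input
# cannot end up somewhere else.
model_device = next(model.parameters()).device
dummy_input = torch.randn(1, 3, 32, 32).to(model_device)

# Tracing now sees the model and its input on the same device.
traced = torch.jit.trace(model, dummy_input)
out = traced(dummy_input)
print(out.shape)
```

The same dummy_input would then be the second argument in ModelSpeedup(model, dummy_input, masks_file), as quoted from the source code above.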

QuanluZhang commented 4 years ago

@misslibra thanks for sharing the cause.

misslibra commented 4 years ago

@QuanluZhang hi, when I try to apply ModelSpeedup to the PyTorch model resnet50:

import torchvision.models as models
model = models.resnet50(pretrained=False)
m_speedup = ModelSpeedup(model, input_imgs, masks_file)
m_speedup.speedup_model()

I hit this error:

  File "/home/cindy/anaconda3/envs/python36_env/lib/python3.6/site-packages/nni/compression/speedup/torch/compressor.py", line 496, in infer_module_mask
    self.infer_module_mask(_module_name, in_shape=output_cmask)
  File "/home/cindy/anaconda3/envs/python36_env/lib/python3.6/site-packages/nni/compression/speedup/torch/compressor.py", line 496, in infer_module_mask
    self.infer_module_mask(_module_name, in_shape=output_cmask)
  File "/home/cindy/anaconda3/envs/python36_env/lib/python3.6/site-packages/nni/compression/speedup/torch/compressor.py", line 474, in infer_module_mask
    .format(m_type, module_name))
RuntimeError: Has not supported infering output shape from input shape for module/function: aten::_convolution, ResNet/Sequential[layer1]/Bottleneck[1]/Conv2d[conv1].aten::_convolution.1

I tried adding "aten::_convolution" to the infer_from_inshape map in infer_shape.py, but the error then occurs for another item, "aten::_add"... Is it because this function is not suitable for ResNet? Or do I still need to make other modifications?

QuanluZhang commented 4 years ago

@misslibra ModelSpeedup relies on shape inference of operations to figure out which modules should be replaced and how. In the current alpha release, we only support a limited set of operations/modules for shape inference. We are working on simplifying the process and interface for adding support for new operations/modules, which will be included in a future release.

Specifically, the error you encountered seems to be caused by a bug that has already been fixed. Could you pull the latest master branch, install from source, and try ModelSpeedup again?

TangChangcheng commented 4 years ago

Hello, I also ran into problems when I tried to speed up ResNet. It seems that a conflict occurs between two shortcuts. For example: A -> conv_bn_relu_1 -> B, out1 = A + B; then out1 -> conv_bn_relu_2 -> C, out2 = out1 + C. The mask of B should be applied to out1 and out2 because of the successor relationships, but it conflicts with the mask of C...
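
The shortcut conflict described here can be illustrated with per-channel keep-masks. The sketch below shows one conservative way to reconcile them, by keeping a channel of a sum if any of its operands keeps it. This is an assumption made for illustration and not necessarily how NNI resolves the conflict; merge_channel_masks and the mask values are hypothetical.

```python
import torch


def merge_channel_masks(masks):
    """Union of per-channel keep-masks for tensors that are added together.

    A channel of a sum can be dropped only if it is pruned in every operand.
    """
    merged = torch.zeros_like(masks[0], dtype=torch.bool)
    for m in masks:
        merged = merged | m.bool()
    return merged


# Hypothetical masks (True = keep) for the tensors A, B and C.
mask_a = torch.tensor([True, True, False, False])
mask_b = torch.tensor([True, False, False, True])
mask_c = torch.tensor([False, True, False, True])

# out1 = A + B, then out2 = out1 + C.
mask_out1 = merge_channel_masks([mask_a, mask_b])
mask_out2 = merge_channel_masks([mask_out1, mask_c])

print(mask_out1.tolist())
print(mask_out2.tolist())
```

In this example only the third channel is pruned in A, B and C alike, so it is the only one that can be removed consistently along both shortcuts.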

QuanluZhang commented 4 years ago

@TangChangcheng could you provide an executable python script along with the mask file you use, so that we can reproduce the problem?

QuanluZhang commented 4 years ago

@TangChangcheng @misslibra your issue may be resolved by pr @2579

suyashhchougule commented 3 years ago

hi @misslibra, which PyTorch version were you using for L1FilterPruner? I am getting an import error with PyTorch 1.8.1.

ImportError: cannot import name 'L1FilterPruner' from 'nni.compression.pytorch'