Closed misslibra closed 4 years ago
@misslibra thanks for reporting this issue. It is expected that the pruned model is also 1.7M, because the pruners are only responsible for finding weight masks under which the model still performs reasonably well. ModelSpeedup is responsible for making the model smaller based on the generated masks.
For your case, could you tell us how you measured the 1.5ms figure? Was it with the pruner applied, or after loading the saved weight checkpoint into the original model (i.e., the model before pruning)? If the former, inference latency should be higher, because the weights are multiplied by the masks in forward. If the latter, the inference latency should not be different.
For ModelSpeedup, it would be great if you can share the code with us, so that we can check whether your model is really compressed.
Thanks for your support!
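To make the distinction above concrete, here is a minimal pure-Python sketch of why a masked model stays the same size. It is an illustration only and not NNI's implementation: the helper names (`make_filter_mask`, `apply_mask`, `remove_masked`) and the toy weights are hypothetical. Masking zeroes out the filters with the smallest L1 norm but keeps their slots, while actual compaction drops them.

```python
"""Illustration only: masking keeps the parameter count, compaction reduces it."""


def l1_norm(filt):
    # Sum of absolute values of one filter's weights.
    return sum(abs(w) for w in filt)


def make_filter_mask(filters, sparsity):
    # Mark the filters with the smallest L1 norm as pruned (mask value 0).
    n_prune = int(len(filters) * sparsity)
    order = sorted(range(len(filters)), key=lambda i: l1_norm(filters[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(filters))]


def apply_mask(filters, mask):
    # Masking multiplies weights by 0 or 1; the layer shape is unchanged.
    return [[w * m for w in filt] for filt, m in zip(filters, mask)]


def remove_masked(filters, mask):
    # Compaction drops the pruned filters, so the layer really gets smaller.
    return [filt for filt, m in zip(filters, mask) if m == 1]


def count_params(filters):
    return sum(len(filt) for filt in filters)


# A toy "layer": 5 filters with 4 weights each (hypothetical values).
layer = [
    [0.9, -0.8, 0.7, 0.6],
    [0.01, 0.02, -0.01, 0.0],
    [0.5, 0.4, -0.3, 0.2],
    [1.0, 1.0, 1.0, 1.0],
    [0.1, -0.1, 0.1, -0.1],
]

mask = make_filter_mask(layer, sparsity=0.2)
masked = apply_mask(layer, mask)
compact = remove_masked(layer, mask)

# Original, masked, and compacted parameter counts.
print(count_params(layer), count_params(masked), count_params(compact))
```

The masked layer still stores every weight (and pays for an extra multiply), which matches the observation that the exported model file does not shrink until something like ModelSpeedup rebuilds the layers.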
I added timing code to the test:
s_time = time.time()
output = model(data)
print('inference time is : ', (time.time() - s_time)*1000 )
Before pruning the time is 0.7ms, and after pruning it is 1.5ms.
Now I understand that the mask multiplication takes time.
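A side note on the measurement itself: a single `time.time()` pair around one forward pass can be noisy, because the first calls include warm-up work and CUDA kernels are launched asynchronously. The sketch below is one way to time more robustly, assuming a hypothetical helper `time_inference` (not part of NNI or PyTorch). On a GPU one would pass `torch.cuda.synchronize` as the `sync` argument. The stand-in workload is only there so the snippet runs without a model.

```python
import time


def time_inference(fn, n_warmup=5, n_runs=20, sync=None):
    """Return the mean latency of fn() in milliseconds.

    sync: optional callable that blocks until queued work has finished
    (for example torch.cuda.synchronize when running on a GPU).
    """
    # Warm-up runs are executed but not timed.
    for _ in range(n_warmup):
        fn()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(n_runs):
        fn()
    # Wait for outstanding work before stopping the clock.
    if sync is not None:
        sync()
    return (time.perf_counter() - start) * 1000.0 / n_runs


# Stand-in workload so the sketch runs without a GPU or a model.
def fake_forward():
    return sum(i * i for i in range(10_000))


print("mean latency (ms):", time_inference(fake_forward))
```

With a real model, `fn` would be something like `lambda: model(input_imgs)` wrapped in `torch.no_grad()`.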
`
if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--image_folder", type=str, default="data/demo_data/image_ori/", help="path to dataset")
    parser.add_argument("--model_def", type=str, default="config/geely_yolo3d.cfg", help="path to model definition file")
    # parser.add_argument("--weights_path", type=str, default="weights/yolov3.weights", help="path to weights file")
    parser.add_argument("--class_path", type=str, default="data/geely.names", help="path to class label file")
    parser.add_argument("--conf_thres", type=float, default=0.5, help="object confidence threshold")
    parser.add_argument("--nms_thres", type=float, default=0.5, help="iou thresshold for non-maximum suppression")
    parser.add_argument("--batch_size", type=int, default=1, help="size of the batches")
    parser.add_argument("--n_cpu", type=int, default=0, help="number of cpu threads to use during batch generation")
    parser.add_argument("--img_size", type=int, default=(192, 640), help="size of each image dimension")
    parser.add_argument("--weights_path", type=str, default='./weights/best_model_Epoch_1060_step_619624_mAP_0.7210_lr_0.0001', help="path to checkpoint model")
    opt = parser.parse_args()
    # print(opt)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # config_list = [{'sparsity': 1, 'op_types': ['Conv2d']}]
    model = Darknet(opt.model_def, img_size=opt.img_size).to(device)
    model.load_state_dict(torch.load(opt.weights_path))
    if compression == 'prune':
        print('do prune')
        config_list = [{ 'sparsity': 0.2, 'op_types': ['default'] }]##Conv2d
        pruner = L1FilterPruner(model, config_list)
        # pruner = ActivationMeanRankFilterPruner(model, config_list)
        pruner.compress()
        pruner.export_model('model.pth', 'mask.pth')
    """model inference time"""
    if do_speedup_detection:
        # Get dataloader
        dataloader = DataLoader(
            ImageFolder(opt.image_folder, img_size=opt.img_size),
            batch_size=opt.batch_size,
            shuffle=False,
            num_workers=opt.n_cpu,
        )
        Tensor = torch.cuda.FloatTensor if torch.cuda.is_available() else torch.FloatTensor
        model.eval()
        masks_file = './mask.pth'
        apply_compression_results(model, masks_file)
        for batch_i, (img_paths, input_imgs) in enumerate(dataloader):
            input_imgs = Variable(input_imgs.type(Tensor))
            input_imgs = input_imgs.to(device)
            with torch.no_grad():
                start = time.time()
                detections = model(input_imgs)
                durable = time.time()
                print('inference time : ', 1000*(durable - start))
            # break
`
This is my code for using speedup.
I tried loading the new model exported by pruner.export_model and using the use_mask logic, but the inference time is still not reduced.
`
"""model inference time"""
if do_speedup_detection:
    print('------')
    model_1 = Darknet(opt.model_def, img_size=opt.img_size).to(device)
    model_1.load_state_dict(torch.load('model.pth'))
    # Get dataloader
    dataloader = DataLoader(
        ImageFolder(opt.image_folder, img_size=opt.img_size),
        batch_size=opt.batch_size,
        shuffle=False,
        num_workers=opt.n_cpu,
    )
    Tensor = torch.cuda.FloatTensor if torch.cuda.is_available() else torch.FloatTensor
    model_1.eval()
    # masks_file = './mask.pth'
    if use_mask:
        apply_compression_results(model, masks_file)
    # else:
    #     m_speedup = ModelSpeedup(model, input_imgs, masks_file)
    #     m_speedup.speedup_model()
    for batch_i, (img_paths, input_imgs) in enumerate(dataloader):
        input_imgs = Variable(input_imgs.type(Tensor))
        input_imgs = input_imgs.to(device)
        with torch.no_grad():
            start = time.time()
            detections = model_1(input_imgs)
            durable = time.time()
            print('inference time : ', 1000*(durable - start))
        # break
`
@misslibra there are two issues in your code. First, after calling `pruner.compress()` you should fine-tune your model. `pruner.compress()` generates masks based on, for example, the model weights, but it does not fine-tune the model for you; you still need to write the fine-tuning logic after calling `pruner.compress()`. Second, `apply_compression_results` is expected to make inference slower if you use v1.4.1; please try the master branch, which should not make inference slower.
@QuanluZhang Thank you so much ! I will try now and update the result
`apply_compression_results` simply multiplies the generated masks into the weights; it does not speed up model inference. ModelSpeedup does, but ModelSpeedup is still in alpha release and only supports torch 1.3.1. Please refer to https://nni.readthedocs.io/en/latest/Compressor/ModelSpeedup.html for details.
BTW, the following two examples might help: https://github.com/microsoft/nni/blob/master/examples/model_compress/model_speedup.py https://github.com/microsoft/nni/blob/master/examples/model_compress/model_prune_torch.py
My torch version is 1.3.1
And how do I use ModelSpeedup to get a smaller model? When I use ModelSpeedup, I get this error:
/home/cindy/Documents/3D/training/camera/geely_yolo3D_02/models.py:291: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if grid_size[0] != self.grid_size[0]:
/home/cindy/Documents/3D/training/camera/geely_yolo3D_02/models.py:257: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert self.stride == self.img_dim[1] / self.grid_size[1]
/home/cindy/Documents/3D/training/camera/geely_yolo3D_02/models.py:262: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  self.scaled_anchors = FloatTensor([(a_w / self.stride, a_h / self.stride) for a_w, a_h in self.anchors])
/home/cindy/Documents/3D/training/camera/geely_yolo3D_02/models.py:299: TracerWarning: Converting a tensor to a Python index might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  pred_boxes = FloatTensor(prediction[..., :4].shape)
Traceback (most recent call last):
  File "/home/cindy/Documents/3D/training/camera/geely_yolo3D_02/model_compression.py", line 111, in
    m_speedup = ModelSpeedup(model_1, input_imgs, masks_file)
  File "/home/cindy/anaconda3/envs/python36_env/lib/python3.6/site-packages/nni/compression/speedup/torch/compressor.py", line 91, in __init__
    self.trace_graph = torch.jit.trace(model, dummy_input)
  File "/home/cindy/anaconda3/envs/python36_env/lib/python3.6/site-packages/torch/jit/__init__.py", line 858, in trace
    check_tolerance, _force_outplace, _module_class)
  File "/home/cindy/anaconda3/envs/python36_env/lib/python3.6/site-packages/torch/jit/__init__.py", line 1007, in trace_module
    check_tolerance, _force_outplace, True, _module_class)
  File "/home/cindy/anaconda3/envs/python36_env/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 49, in decorate_no_grad
    return func(*args, **kwargs)
  File "/home/cindy/anaconda3/envs/python36_env/lib/python3.6/site-packages/torch/jit/__init__.py", line 676, in _check_trace
    raise TracingCheckError(diag_info)
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Graphs differed across invocations!
Graph diff:
graph(%self : ClassType, %x.1 : Tensor):
%2 : ClassType = prim::GetAttrname="module_list"
%3 : ClassType = prim::GetAttrname="0"
%4 : ClassType = prim::GetAttrname="conv_0"
looks like a bug in torch.jit, some related issues in pytorch: https://github.com/pytorch/pytorch/issues/23993 https://github.com/pytorch/pytorch/issues/33491
This error can be solved by following this note from the source code: in `ModelSpeedup(model, dummy_input, masks_file)`, `dummy_input` is the dummy input for `jit.trace`, and users should put it on the right device before passing it in.
@misslibra thanks for sharing the cause.
@QuanluZhang hi, when I try to apply ModelSpeedup to the PyTorch model resnet50:
`
import torchvision.models as models
model = models.resnet50(pretrained=False)
m_speedup = ModelSpeedup(model, input_imgs, masks_file)
m_speedup.speedup_model()
`
I hit this error:
File "/home/cindy/anaconda3/envs/python36_env/lib/python3.6/site-packages/nni/compression/speedup/torch/compressor.py", line 496, in infer_module_mask
self.infer_module_mask(_module_name, in_shape=output_cmask)
File "/home/cindy/anaconda3/envs/python36_env/lib/python3.6/site-packages/nni/compression/speedup/torch/compressor.py", line 496, in infer_module_mask
self.infer_module_mask(_module_name, in_shape=output_cmask)
File "/home/cindy/anaconda3/envs/python36_env/lib/python3.6/site-packages/nni/compression/speedup/torch/compressor.py", line 474, in infer_module_mask
.format(m_type, module_name))
RuntimeError: Has not supported infering output shape from input shape for module/function: aten::_convolution
, ResNet/Sequential[layer1]/Bottleneck[1]/Conv2d[conv1].aten::_convolution.1
I tried adding "aten::_convolution" to the infer_from_inshape map in infer_shape.py, but then the same error occurs for another item, "aten::_add"... Is this because ModelSpeedup is not suitable for resnet, or do I still need to make other modifications?
@misslibra ModelSpeedup relies on shape inference of operations to figure out which modules should be replaced and how. In the current alpha release, we only support a limited set of operations/modules for shape inference. We are working on simplifying the process and interface for adding support for new operations/modules, which will be included in a future release.
Specifically, the error you encountered seems to be caused by a bug that has already been fixed. Could you pull the latest master branch, install from source, and try ModelSpeedup again?
Hello, I also ran into problems when I tried to speed up ResNet. It seems that a conflict occurs between two shortcuts. For example: `A -> conv_bn_relu_1 -> B`, `out1 = A+B -> conv_bn_relu_2 -> C`, `out2 = out1 + C`. The mask of B should be applied to out1 and out2 because of the successor relationships, but it conflicts with the mask of C...
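The conflict described above comes from the fact that tensors joined by an element-wise add must have the same channels, so A, B, out1, C and out2 cannot each keep a different set. The sketch below shows one conservative way to reconcile such masks, keeping a channel if any branch keeps it. This is an assumption for illustration, not necessarily how NNI resolves it, and `reconcile_masks` and the mask values are hypothetical.

```python
def reconcile_masks(*masks):
    """Return one shared keep-mask for tensors joined by element-wise add.

    A channel is kept (1) if ANY input mask keeps it, and pruned (0) only
    if every input mask prunes it.
    """
    length = len(masks[0])
    if any(len(m) != length for m in masks):
        raise ValueError("all masks must cover the same number of channels")
    return [1 if any(m[i] for m in masks) else 0 for i in range(length)]


# Hypothetical per-branch keep-masks over 4 channels.
mask_a = [1, 0, 1, 0]
mask_b = [1, 0, 0, 1]
mask_c = [1, 0, 1, 0]

# out1 = A + B and out2 = out1 + C all share one channel dimension,
# so a single mask has to serve A, B and C.
shared = reconcile_masks(mask_a, mask_b, mask_c)
print(shared)  # [1, 0, 1, 1]
```

The trade-off of taking the union is that fewer channels are removed than any single branch's mask suggests. Taking the intersection instead would remove more channels but discards filters that some branch's pruner wanted to keep.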
@TangChangcheng could you provide an executable python script along with the mask file you use, so that we can reproduce the problem?
@TangChangcheng @misslibra your issue may be resolved by PR #2579
hi @misslibra which pytorch version were you using for L1FilterPruner? I am getting an import error with pytorch 1.8.1.
ImportError: cannot import name 'L1FilterPruner' from 'nni.compression.pytorch'
nni Environment:pytorch