microsoft / nni

An open source AutoML toolkit to automate the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
https://nni.readthedocs.io
MIT License

SpeedUp Issue With yolort Models #5345

Closed: chandan-wiai closed this issue 1 year ago

chandan-wiai commented 1 year ago

Describe the bug: After pruning, I am trying to speed up the model using ModelSpeedup(model, dummy_input, masks).speedup_model(). The model class I am using is from yolort and has a transform attribute. It throws the following error:

/usr/local/lib/python3.8/dist-packages/torch/tensor.py:587: RuntimeWarning: Iterating over a tensor might cause the trace to be incorrect. Passing a tensor of different shape won't change the number of iterations executed (and might lead to errors or silently give incorrect results).
  warnings.warn('Iterating over a tensor might cause the trace to be incorrect. '
/usr/local/lib/python3.8/dist-packages/torch/nn/functional.py:3524: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  (torch.floor((input.size(i + 2).float() * torch.tensor(scale_factors[i], dtype=torch.float32)).float()))
/usr/local/lib/python3.8/dist-packages/yolort/models/anchor_utils.py:46: TracerWarning: torch.as_tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  anchors = torch.as_tensor(self.anchor_grids, dtype=torch.float32, device=device).to(dtype=dtype)
/usr/local/lib/python3.8/dist-packages/yolort/models/anchor_utils.py:47: TracerWarning: torch.as_tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  strides = torch.as_tensor(self.strides, dtype=torch.float32, device=device).to(dtype=dtype)
/usr/local/lib/python3.8/dist-packages/yolort/models/box_head.py:406: TracerWarning: torch.as_tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  strides = torch.as_tensor(self.strides, dtype=torch.float32, device=device).to(dtype=dtype)
Input tensor shape torch.Size([8, 3, 640, 640])
Input tensor shape torch.Size([8, 3, 640, 640])
Input tensor shape torch.Size([8, 3, 640, 640])
[2023-02-11 14:21:11] start to speedup the model
start to speedup the model
Input tensor shape torch.Size([8, 3, 640, 640])
Input tensor shape torch.Size([8, 3, 640, 640])
Input tensor shape torch.Size([8, 3, 640, 640])
{'network.model.backbone.body.0.conv': 1, 'network.model.backbone.body.1.conv': 1, 'network.model.backbone.body.2.cv1.conv': 1, 'network.model.backbone.body.2.m.0.cv1.conv': 1, 'network.model.backbone.body.2.m.0.cv2.conv': 1, 'network.model.backbone.body.2.m.1.cv1.conv': 1, 'network.model.backbone.body.2.m.1.cv2.conv': 1, 'network.model.backbone.body.2.m.2.cv1.conv': 1, 'network.model.backbone.body.2.m.2.cv2.conv': 1, 'network.model.backbone.body.2.cv2.conv': 1, 'network.model.backbone.body.2.cv3.conv': 1, 'network.model.backbone.body.3.conv': 1, 'network.model.backbone.body.4.cv1.conv': 1, 'network.model.backbone.body.4.m.0.cv1.conv': 1, 'network.model.backbone.body.4.m.0.cv2.conv': 1, 'network.model.backbone.body.4.m.1.cv1.conv': 1, 'network.model.backbone.body.4.m.1.cv2.conv': 1, 'network.model.backbone.body.4.m.2.cv1.conv': 1, 'network.model.backbone.body.4.m.2.cv2.conv': 1, 'network.model.backbone.body.4.m.3.cv1.conv': 1, 'network.model.backbone.body.4.m.3.cv2.conv': 1, 'network.model.backbone.body.4.m.4.cv1.conv': 1, 'network.model.backbone.body.4.m.4.cv2.conv': 1, 'network.model.backbone.body.4.m.5.cv1.conv': 1, 'network.model.backbone.body.4.m.5.cv2.conv': 1, 'network.model.backbone.body.4.cv2.conv': 1, 'network.model.backbone.body.4.cv3.conv': 1, 'network.model.backbone.body.5.conv': 1, 'network.model.backbone.body.6.cv1.conv': 1, 'network.model.backbone.body.6.m.0.cv1.conv': 1, 'network.model.backbone.body.6.m.0.cv2.conv': 1, 'network.model.backbone.body.6.m.1.cv1.conv': 1, 'network.model.backbone.body.6.m.1.cv2.conv': 1, 'network.model.backbone.body.6.m.2.cv1.conv': 1, 'network.model.backbone.body.6.m.2.cv2.conv': 1, 'network.model.backbone.body.6.m.3.cv1.conv': 1, 'network.model.backbone.body.6.m.3.cv2.conv': 1, 'network.model.backbone.body.6.m.4.cv1.conv': 1, 'network.model.backbone.body.6.m.4.cv2.conv': 1, 'network.model.backbone.body.6.m.5.cv1.conv': 1, 'network.model.backbone.body.6.m.5.cv2.conv': 1, 'network.model.backbone.body.6.m.6.cv1.conv': 1, 'network.model.backbone.body.6.m.6.cv2.conv': 1, 'network.model.backbone.body.6.m.7.cv1.conv': 1, 'network.model.backbone.body.6.m.7.cv2.conv': 1, 'network.model.backbone.body.6.m.8.cv1.conv': 1, 'network.model.backbone.body.6.m.8.cv2.conv': 1, 'network.model.backbone.body.6.cv2.conv': 1, 'network.model.backbone.body.6.cv3.conv': 1, 'network.model.backbone.body.7.conv': 1, 'network.model.backbone.body.8.cv1.conv': 1, 'network.model.backbone.body.8.m.0.cv1.conv': 1, 'network.model.backbone.body.8.m.0.cv2.conv': 1, 'network.model.backbone.body.8.m.1.cv1.conv': 1, 'network.model.backbone.body.8.m.1.cv2.conv': 1, 'network.model.backbone.body.8.m.2.cv1.conv': 1, 'network.model.backbone.body.8.m.2.cv2.conv': 1, 'network.model.backbone.body.8.cv2.conv': 1, 'network.model.backbone.body.8.cv3.conv': 1, 'network.model.backbone.pan.inner_blocks.0.cv1.conv': 1, 'network.model.backbone.pan.inner_blocks.0.cv2.conv': 1, 'network.model.backbone.pan.inner_blocks.1.conv': 1, 'network.model.backbone.pan.inner_blocks.3.cv1.conv': 1, 'network.model.backbone.pan.inner_blocks.3.m.0.cv1.conv': 1, 'network.model.backbone.pan.inner_blocks.3.m.0.cv2.conv': 1, 'network.model.backbone.pan.inner_blocks.3.m.1.cv1.conv': 1, 'network.model.backbone.pan.inner_blocks.3.m.1.cv2.conv': 1, 'network.model.backbone.pan.inner_blocks.3.m.2.cv1.conv': 1, 'network.model.backbone.pan.inner_blocks.3.m.2.cv2.conv': 1, 'network.model.backbone.pan.inner_blocks.3.cv2.conv': 1, 'network.model.backbone.pan.inner_blocks.3.cv3.conv': 1, 
'network.model.backbone.pan.inner_blocks.4.conv': 1, 'network.model.backbone.pan.layer_blocks.0.cv1.conv': 1, 'network.model.backbone.pan.layer_blocks.0.m.0.cv1.conv': 1, 'network.model.backbone.pan.layer_blocks.0.m.0.cv2.conv': 1, 'network.model.backbone.pan.layer_blocks.0.m.1.cv1.conv': 1, 'network.model.backbone.pan.layer_blocks.0.m.1.cv2.conv': 1, 'network.model.backbone.pan.layer_blocks.0.m.2.cv1.conv': 1, 'network.model.backbone.pan.layer_blocks.0.m.2.cv2.conv': 1, 'network.model.backbone.pan.layer_blocks.0.cv2.conv': 1, 'network.model.backbone.pan.layer_blocks.0.cv3.conv': 1, 'network.model.backbone.pan.layer_blocks.1.conv': 1, 'network.model.backbone.pan.layer_blocks.2.cv1.conv': 1, 'network.model.backbone.pan.layer_blocks.2.m.0.cv1.conv': 1, 'network.model.backbone.pan.layer_blocks.2.m.0.cv2.conv': 1, 'network.model.backbone.pan.layer_blocks.2.m.1.cv1.conv': 1, 'network.model.backbone.pan.layer_blocks.2.m.1.cv2.conv': 1, 'network.model.backbone.pan.layer_blocks.2.m.2.cv1.conv': 1, 'network.model.backbone.pan.layer_blocks.2.m.2.cv2.conv': 1, 'network.model.backbone.pan.layer_blocks.2.cv2.conv': 1, 'network.model.backbone.pan.layer_blocks.2.cv3.conv': 1, 'network.model.backbone.pan.layer_blocks.3.conv': 1, 'network.model.backbone.pan.layer_blocks.4.cv1.conv': 1, 'network.model.backbone.pan.layer_blocks.4.m.0.cv1.conv': 1, 'network.model.backbone.pan.layer_blocks.4.m.0.cv2.conv': 1, 'network.model.backbone.pan.layer_blocks.4.m.1.cv1.conv': 1, 'network.model.backbone.pan.layer_blocks.4.m.1.cv2.conv': 1, 'network.model.backbone.pan.layer_blocks.4.m.2.cv1.conv': 1, 'network.model.backbone.pan.layer_blocks.4.m.2.cv2.conv': 1, 'network.model.backbone.pan.layer_blocks.4.cv2.conv': 1, 'network.model.backbone.pan.layer_blocks.4.cv3.conv': 1, 'network.model.head.head.0': 1, 'network.model.head.head.1': 1, 'network.model.head.head.2': 1}
dim0 sparsity: 0.796875
dim1 sparsity: 0.000000
Dectected conv prune dim" 0
[2023-02-11 14:22:29] infer module masks...
infer module masks...
[2023-02-11 14:22:29] Update mask for network.transform
Update mask for network.transform
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [4], in <cell line: 3>()
      1 mod._modules = mod._modules['network']._modules 
      2 # mod._modules['network']._modules.pop('transform')
----> 3 ModelSpeedup(mod, torch.randn(1, 3, 640, 640), masks).speedup_model()

File /usr/local/lib/python3.8/dist-packages/nni/compression/pytorch/speedup/compressor.py:536, in ModelSpeedup.speedup_model(self)
    533 fix_mask_conflict(self.masks, self.bound_model, self.dummy_input)
    535 _logger.info("infer module masks...")
--> 536 self.infer_modules_masks()
    537 _logger.info('resolve the mask conflict')
    539 # load the original stat dict before replace the model

File /usr/local/lib/python3.8/dist-packages/nni/compression/pytorch/speedup/compressor.py:371, in ModelSpeedup.infer_modules_masks(self)
    369 curnode = visit_queue.get()
    370 # forward mask inference for curnode
--> 371 self.update_direct_sparsity(curnode)
    372 successors = self.torch_graph.find_successors(curnode.unique_name)
    373 for successor in successors:

File /usr/local/lib/python3.8/dist-packages/nni/compression/pytorch/speedup/compressor.py:244, in ModelSpeedup.update_direct_sparsity(self, node)
    241 self.auto_inferences[unique_name] = _auto_infer
    242 _auto_infer.name = node.unique_name
--> 244 _auto_infer.update_direct_sparsity()
    245 # also save the input debug names into the auto_infer
    246 _auto_infer.input_debugname = input_debugname

File /usr/local/lib/python3.8/dist-packages/nni/compression/pytorch/speedup/infer_mask.py:336, in AutoMaskInference.update_direct_sparsity(self)
    334 constant = []
    335 for tout in out:
--> 336     _mask, _constant = self.isconstants(tout.clone().detach())
    337     out_mask.append(_mask)
    338     constant.append(_constant)

AttributeError: 'NestedTensor' object has no attribute 'clone'

To me, it looks like NNI can't support 'Transform'-type layers. First of all, is this observation correct? If yes, is there a way I can bypass this error? Thanks for the help.
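For reference, a minimal sketch of the prune-then-speedup flow that leads to this call (the pruner choice and config below are assumptions for illustration; the thread does not show which pruner or config was actually used):

import torch
from yolort.models import yolov5s
from nni.compression.pytorch.pruning import L1NormPruner
from nni.compression.pytorch.speedup import ModelSpeedup

# Build a yolort model; this wrapper still contains the transform attribute.
model = yolov5s(pretrained=True)
model.eval()

# Illustrative pruning config -- the actual pruner/config are not shown in the thread.
config_list = [{'sparsity': 0.8, 'op_types': ['Conv2d']}]
pruner = L1NormPruner(model, config_list)
_, masks = pruner.compress()
pruner._unwrap_model()  # restore the original modules before running speedup

dummy_input = torch.randn(1, 3, 640, 640)
# This is the call that fails while updating the mask for the transform module.
ModelSpeedup(model, dummy_input, masks).speedup_model()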


zhiqwang commented 1 year ago

Hi @chandan-wiai, could you try testing the main branch of yolort? We cleaned up the implementation of NestedTensor in https://github.com/zhiqwang/yolov5-rt-stack/pull/482.

chandan-wiai commented 1 year ago

Hi @chandan-wiai, could you try testing the main branch of yolort? We cleaned up the implementation of NestedTensor in https://github.com/zhiqwang/yolov5-rt-stack/pull/482.

I actually installed yolort with pip install yolort. Now that the main branch has changed, how do I get the latest changes into my package? Is there a new version I should update yolort to?
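As a quick check before and after reinstalling, the installed version can be printed with the standard library (a generic sketch, nothing yolort-specific):

from importlib.metadata import version  # standard library, Python 3.8+

# Prints the currently installed yolort version for comparison.
print(version("yolort"))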

chandan-wiai commented 1 year ago

If I do this, would it overwrite the existing version with the updated 'main' branch?

Or from Source

# clone yolort repository locally
git clone https://github.com/zhiqwang/yolov5-rt-stack.git
cd yolov5-rt-stack
# install in editable mode
pip install -e .

zhiqwang commented 1 year ago

Yep, and you can pip uninstall yolort first.

chandan-wiai commented 1 year ago

@zhiqwang I tried with the main branch, but it still gives the same error.

zhiqwang commented 1 year ago

Thanks @chandan-wiai, got it. I guess that's caused by the dynamic shapes in YOLOTransform; maybe we should remove this module from the model builder. We provide a method to load the vanilla YOLO module as follows:

from yolort.models import YOLO

model = YOLO.load_from_yolov5(
    checkpoint_path,
    score_thresh=score_thresh,
    nms_thresh=nms_thresh,
    version="r6.0",
)

model = model.eval()
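If speedup is then retried on a model built this way, the masks have to be regenerated against it, since the mask keys logged earlier (prefixed with network.model.) will not match this module's names. A hedged sketch, reusing the illustrative pruner from earlier and the model built just above:

import torch
from nni.compression.pytorch.pruning import L1NormPruner
from nni.compression.pytorch.speedup import ModelSpeedup

# `model` is the transform-free module built above; the pruner/config remain
# illustrative assumptions, not something prescribed in this thread.
pruner = L1NormPruner(model, [{'sparsity': 0.8, 'op_types': ['Conv2d']}])
_, masks = pruner.compress()
pruner._unwrap_model()  # restore the original modules before running speedup
ModelSpeedup(model, torch.randn(1, 3, 640, 640), masks).speedup_model()
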
chandan-wiai commented 1 year ago

we provide a method to load the vanilla YOLO module as follows:

from yolort.models import YOLO

model = YOLO.load_from_yolov5(
    checkpoint_path,
    score_thresh=score_thresh,
    nms_thresh=nms_thresh,
    version="r6.0",
)

model = model.eval()

Does this mean I can take my checkpoint trained with yolort.models.YOLOv5, load it as shown above, and the resulting model object won't have the 'transform' module?

zhiqwang commented 1 year ago

Does this mean I can take my checkpoint trained with yolort.models.YOLOv5, load it as shown above, and the resulting model object won't have the 'transform' module?

Hi @chandan-wiai, I guess not in this scenario. There are no parameters or buffers in the YOLOTransform module, so theoretically it should be easy. Maybe we should build the model as follows from this API:

import torch

from yolort.models.yolo import yolov5_darknet_pan_s_r60  # aka yolov5s

model = yolov5_darknet_pan_s_r60()  # we do not specify pretrained=True, i.e. we do not load the default weights
model.load_state_dict(torch.load('checkpoint_from_yolort.pt'))
model.eval()
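One practical caveat (an assumption, not something stated in the thread): if the checkpoint was saved from the YOLOv5 wrapper, its state-dict keys may carry a prefix such as model., which would need stripping before load_state_dict succeeds:

# Hypothetical prefix handling -- only needed if the checkpoint keys are
# prefixed (e.g. 'model.backbone. ...'); inspect the checkpoint keys first.
state_dict = torch.load('checkpoint_from_yolort.pt')
state_dict = {
    (k[len('model.'):] if k.startswith('model.') else k): v
    for k, v in state_dict.items()
}
model.load_state_dict(state_dict)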

We can also discuss this at https://github.com/zhiqwang/yolov5-rt-stack/issues/484 so as not to disturb more people with questions not related to nni.

Lijiaoa commented 1 year ago

@chandan-wiai @zhiqwang Thanks for raising this issue and discussing it in such detail here. Has the problem been resolved? Could you close the issue? Looking forward to your reply.

chandan-wiai commented 1 year ago

Yes @Lijiaoa, this issue can be closed here.