microsoft / nni

An open source AutoML toolkit to automate the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
https://nni.readthedocs.io
MIT License
14.06k stars 1.82k forks

TypeError: Invalid shape (64, 64, 1, 1) for image data #5555

Closed: gkrisp98 closed this issue 1 year ago

gkrisp98 commented 1 year ago

Environment: VScode

Hi, I am trying to prune a Face detector with this architecture:

EXTD(
  (base): ModuleList(
    (0): Sequential(
      (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): PReLU(num_parameters=1)
    )
    (1): InvertedResidual_dwc(
      (conv): Sequential(
        (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=64)
        (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): PReLU(num_parameters=1)
        (3): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (2): InvertedResidual_dwc(
      (conv): Sequential(
        (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): PReLU(num_parameters=1)
        (3): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=128)
        (4): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (5): PReLU(num_parameters=1)
        (6): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (7): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (3): InvertedResidual_dwc(
      (conv): Sequential(
        (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): PReLU(num_parameters=1)
        (3): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=128)
        (4): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (5): PReLU(num_parameters=1)
        (6): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (7): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (4): InvertedResidual_dwc(
      (conv): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): PReLU(num_parameters=1)
        (3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256)
        (4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (5): PReLU(num_parameters=1)
        (6): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (7): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (5): InvertedResidual_dwc(
      (conv): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): PReLU(num_parameters=1)
        (3): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=256)
        (4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (5): PReLU(num_parameters=1)
        (6): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (7): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
  )
  (upfeat): ModuleList(
    (0): Sequential(
      (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=64, bias=False)
      (1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (3): ReLU()
    )
    (1): Sequential(
      (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=64, bias=False)
      (1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (3): ReLU()
    )
    (2): Sequential(
      (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=64, bias=False)
      (1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (3): ReLU()
    )
    (3): Sequential(
      (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=64, bias=False)
      (1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (3): ReLU()
    )
    (4): Sequential(
      (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=64, bias=False)
      (1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (3): ReLU()
    )
  )
  (loc): ModuleList(
    (0): Conv2d(64, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): Conv2d(64, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (2): Conv2d(64, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): Conv2d(64, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (4): Conv2d(64, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (5): Conv2d(64, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  )
  (conf): ModuleList(
    (0): Conv2d(64, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): Conv2d(64, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (2): Conv2d(64, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): Conv2d(64, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (4): Conv2d(64, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (5): Conv2d(64, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  )
  (softmax): Softmax(dim=-1)
)

I am using this config_list :

config_list = [{
    'sparsity_per_layer' : 0.2,
    'op_types' : ['Conv2d'],
}, {
    'exclude' : True,
    'op_names' : ['loc.0', 'loc.1', 'loc.2', 'loc.3', 'loc.4', 'loc.5',
                  'conf.0', 'conf.1', 'conf.2', 'conf.3', 'conf.4', 'conf.5',
                  ]
}]

When I apply the pruner and try to visualize the mask, I get the following error:

sparsity: 0.8125
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[5], line 7
      4     mask = mask['weight'].detach().cpu().numpy()
      6 print("sparsity: {}".format(mask.sum() / mask.size))
----> 7 plt.imshow(mask)

File ~/anaconda3/envs/gpu/lib/python3.9/site-packages/matplotlib/pyplot.py:2695, in imshow(X, cmap, norm, aspect, interpolation, alpha, vmin, vmax, origin, extent, interpolation_stage, filternorm, filterrad, resample, url, data, **kwargs)
   2689 @_copy_docstring_and_deprecators(Axes.imshow)
   2690 def imshow(
   2691         X, cmap=None, norm=None, *, aspect=None, interpolation=None,
   2692         alpha=None, vmin=None, vmax=None, origin=None, extent=None,
   2693         interpolation_stage=None, filternorm=True, filterrad=4.0,
   2694         resample=None, url=None, data=None, **kwargs):
-> 2695     __ret = gca().imshow(
   2696         X, cmap=cmap, norm=norm, aspect=aspect,
   2697         interpolation=interpolation, alpha=alpha, vmin=vmin,
   2698         vmax=vmax, origin=origin, extent=extent,
   2699         interpolation_stage=interpolation_stage,
   2700         filternorm=filternorm, filterrad=filterrad, resample=resample,
   2701         url=url, **({"data": data} if data is not None else {}),
   2702         **kwargs)
   2703     sci(__ret)
   2704     return __ret
...
    716     # - otherwise casting wraps extreme values, hiding outliers and
    717     # making reliable interpretation impossible.
    718     high = 255 if np.issubdtype(self._A.dtype, np.integer) else 1

TypeError: Invalid shape (64, 64, 1, 1) for image data

The code I used is this:

from nni.compression.pytorch.pruning import L1NormPruner
pruner = L1NormPruner(model, config_list)
import matplotlib.pyplot as plt

for _, mask in masks.items():
    mask = mask['weight'].detach().cpu().numpy()

print("sparsity: {}".format(mask.sum() / mask.size))
plt.imshow(mask)

It is also worth noting that even though I set 'sparsity_per_layer' : 0.2, the code above prints sparsity: 0.8125 when I try to visualize the masks. Do you know why this happens and how I can fix it?

J-shang commented 1 year ago

Hello @gkrisp98, in an NNI mask, 0 means masked (pruned) and 1 means not masked (kept), so the sparsity is 1 - mask.sum() / mask.size.
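
A minimal sketch of that calculation, assuming the `masks` dictionary has the layout used in the question (layer name mapped to a dict with a `'weight'` entry). NumPy arrays stand in for the torch tensors, and `layer_sparsities` and the toy layer names are hypothetical, not part of NNI:

```python
import numpy as np

def layer_sparsities(masks):
    """Return {layer_name: fraction of weights masked out (mask value 0)}."""
    result = {}
    for name, mask in masks.items():
        weight_mask = np.asarray(mask["weight"])
        # sum / size is the fraction KEPT; sparsity is its complement.
        result[name] = 1.0 - weight_mask.sum() / weight_mask.size
    return result

# Toy masks: 12 of 64 filters zeroed in one layer, nothing pruned in the other.
pruned = np.ones((64, 64, 1, 1))
pruned[:12] = 0
masks = {
    "layer_a": {"weight": pruned},
    "layer_b": {"weight": np.ones((64, 4, 3, 3))},
}

for name, sparsity in layer_sparsities(masks).items():
    print(f"{name}: sparsity = {sparsity:.4f}")
```

Reporting one value per layer also avoids the situation in the question, where the `print` sits outside the loop and only the last mask is measured.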

gkrisp98 commented 1 year ago

Hi @J-shang, thanks for your response. So does my model have 80% or 20% sparsity? Is the error in my code for calculating sparsity, or in my definition of sparsity? Also, do you have any idea why I am getting the TypeError: Invalid shape (64, 64, 1, 1) for image data?

J-shang commented 1 year ago

Hello @gkrisp98, the mask you printed has 18.75% sparsity (1 - 0.8125, i.e. 18.75% of its weights are masked). This is because you set 20% sparsity, and NNI generates a mask as close as possible to the sparsity ratio you set without exceeding it.
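
The printed kept fraction of 0.8125 corresponds to a sparsity of 1 - 0.8125 = 0.1875, and that figure can be reproduced by hand. The sketch below assumes the pruner removes whole output filters and rounds the filter count down so that the target is never exceeded. `achieved_sparsity` is a hypothetical helper, not an NNI function:

```python
import math

def achieved_sparsity(num_filters, target_sparsity):
    """Sparsity reached when only whole filters can be pruned, rounding down."""
    pruned_filters = math.floor(num_filters * target_sparsity)
    return pruned_filters / num_filters

# 64 output filters with a 0.2 target: 12.8 rounds down to 12 filters.
sparsity = achieved_sparsity(64, 0.2)
print("sparsity:", sparsity)           # 12 / 64
print("kept fraction:", 1 - sparsity)  # the value the question's code printed
```

Under this assumption a layer reaches exactly 20% only when 20% of its filter count is a whole number.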

The TypeError occurs because the mask is not image data: it is a 4-D tensor with the same shape as the Conv2d weight, here (64, 64, 1, 1), so you should not pass it to imshow directly.
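
If a picture of the mask is still wanted, one option is to flatten it to 2-D first, since `imshow` expects an array of shape (M, N), (M, N, 3) or (M, N, 4). A minimal sketch, assuming a conv mask laid out as (out_channels, in_channels, kH, kW). `mask_to_2d` is a hypothetical helper name, and a NumPy array stands in for the real mask:

```python
import numpy as np

def mask_to_2d(mask):
    """Flatten a conv mask (out, in, kH, kW) into one row per output filter."""
    mask = np.asarray(mask)
    return mask.reshape(mask.shape[0], -1)

mask = np.ones((64, 64, 1, 1))
mask[:12] = 0  # pretend the first 12 filters were pruned
flat = mask_to_2d(mask)
print(flat.shape)
```

The result can then be passed to `plt.imshow(flat, cmap="gray", aspect="auto")`, where fully pruned filters appear as dark rows.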

gkrisp98 commented 1 year ago

Thanks again for your time. I am also facing another problem when I try to unwrap the model and apply the pruning with this code:

pruner._unwrap_model()

from nni.compression.pytorch.speedup import ModelSpeedup
model.eval()
ModelSpeedup(model, torch.rand(1,3,28,28), masks).speedup_model()

The output and the error I am getting is this:

/m2/user/Projects/EXTD_Pytorch-master2/layers/functions/prior_box.py:51: TracerWarning: torch.Tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  output = torch.Tensor(mean).view(-1, 4)
/m2/user/Projects/EXTD_Pytorch-master2/layers/functions/prior_box.py:51: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  output = torch.Tensor(mean).view(-1, 4)
/m2/user/Projects/EXTD_Pytorch-master2/EXTD_64.py:186: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  self.priors = Variable(self.priorbox.forward(), volatile=True)
zero
zero
zero
zero
zero
zero
zero
zero
zero
zero
zero
zero
zero
zero
zero
zero
[2023-05-11 09:58:58] start to speedup the model
/home/user/anaconda3/envs/gpu/lib/python3.9/site-packages/torch/jit/_trace.py:992: TracerWarning: Output nr 1. of the traced function does not match the corresponding output of the Python function. Detailed error:
Tensor-likes are not close!

Mismatched elements: 240 / 60000 (0.4%)
Greatest absolute difference: 12.010849952697754 at index (6, 1, 4, 4) (up to 1e-05 allowed)
Greatest relative difference: 1.0 at index (0, 1, 0, 0) (up to 1e-05 allowed)
  _check_trace(
zero
zero
zero
zero
zero
zero
zero
zero
zero
zero
zero
zero
zero
zero
zero
zero
[2023-05-11 09:58:59] infer module masks...
[2023-05-11 09:58:59] Update mask for base.0.0
[2023-05-11 09:58:59] Update mask for base.0.1
[2023-05-11 09:58:59] Update mask for base.0.2
[2023-05-11 09:58:59] Update mask for base.1.conv.0
[2023-05-11 09:58:59] Update mask for base.1.conv.1
[2023-05-11 09:58:59] Update mask for base.1.conv.2
[2023-05-11 09:58:59] Update mask for base.1.conv.3
[2023-05-11 09:58:59] Update mask for base.1.conv.4
...
[2023-05-11 09:59:00] Update mask for .aten::view.299
[2023-05-11 09:59:00] WARNING: throw some args away when calling the function "view"
[2023-05-11 09:59:00] WARNING: throw some args away when calling the function "view"
[2023-05-11 09:59:00] Update mask for .aten::max.267
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[6], line 5
      3 from nni.compression.pytorch.speedup import ModelSpeedup
      4 model.eval()
----> 5 ModelSpeedup(model, torch.rand(1,3,28,28), masks).speedup_model()

File ~/anaconda3/envs/gpu/lib/python3.9/site-packages/nni/compression/pytorch/speedup/compressor.py:546, in ModelSpeedup.speedup_model(self)
    543 fix_mask_conflict(self.masks, self.bound_model, self.dummy_input)
    545 _logger.info("infer module masks...")
--> 546 self.infer_modules_masks()
    547 _logger.info('resolve the mask conflict')
    549 # load the original stat dict before replace the model

File ~/anaconda3/envs/gpu/lib/python3.9/site-packages/nni/compression/pytorch/speedup/compressor.py:383, in ModelSpeedup.infer_modules_masks(self)
    381 curnode = visit_queue.get()
    382 # forward mask inference for curnode
--> 383 self.update_direct_sparsity(curnode)
    384 successors = self.torch_graph.find_successors(curnode.unique_name)
    385 for successor in successors:

File ~/anaconda3/envs/gpu/lib/python3.9/site-packages/nni/compression/pytorch/speedup/compressor.py:257, in ModelSpeedup.update_direct_sparsity(self, node)
    252 _auto_infer.input_debugname = input_debugname
    253 # update the mask tensor and the internal output of the submodules
    254 # after manually unpack the tuple/list of tensors, the number of the outputs
...
    258     node.outputs) == 1, 'The number of the output should be one after the Tuple unpacked manually'
    260 out_debugname = node.outputs[0]
    261 # update the output mask into self.masks

AssertionError: The number of the output should be one after the Tuple unpacked manually

Despite the error, I believe the pruning is done. Do you know why I am getting this error, and whether the pruning has actually been applied?

Lijiaoa commented 1 year ago

The latest question has been moved to a new issue, #5568, so could you close this issue? Thanks. @gkrisp98

gkrisp98 commented 1 year ago

Hi again @Lijiaoa, I am trying to understand why the model does not reach the sparsity I give as input. I understand what you are saying, that the 'sparsity' value acts as an upper threshold, but am I not telling the pruner to prune 20% of the filters? Why is it unable to do so?