open-mmlab / mmdeploy

OpenMMLab Model Deployment Framework
https://mmdeploy.readthedocs.io/en/latest/
Apache License 2.0

Tensor size mismatch #500

Closed saffie91 closed 2 years ago

saffie91 commented 2 years ago

Hello, I am trying to convert a CascadeNet .pth model to ONNX and I am getting the same error:

root@e2e4892b0ebf:~/workspace/mmdeploy# python ./tools/deploy.py configs/mmdet/detection/detection_onnxruntime_dynamic.py cascade_mask_rcnn_hrnetv2p_w32_20e.py General.Model.table.detection.v2.pth demo.png [2022-05-19 16:21:23.549] [mmdeploy] [info] Register 'DirectoryModel' 2022-05-19 16:21:23,618 - mmdeploy - INFO - torch2onnx start. [2022-05-19 16:21:24.866] [mmdeploy] [info] Register 'DirectoryModel' /opt/conda/lib/python3.7/site-packages/mmdet/models/builder.py:53: UserWarning: train_cfg and test_cfg is deprecated, please specify them in model 'please specify them in model', UserWarning) load checkpoint from local path: General.Model.table.detection.v2.pth /opt/conda/lib/python3.7/site-packages/mmdet/datasets/utils.py:69: UserWarning: "ImageToTensor" pipeline is replaced by "DefaultFormatBundle" for batch inference. It is recommended to manually replace it in the test data pipeline in your config file. 'data pipeline in your config file.', UserWarning) 2022-05-19 16:21:32,430 - mmdeploy - WARNING - DeprecationWarning: get_onnx_config will be deprecated in the future. 2022-05-19:16:21:32,mmdeploy WARNING [utils.py:92] DeprecationWarning: get_onnx_config will be deprecated in the future. /root/workspace/mmdeploy/mmdeploy/core/optimizers/function_marker.py:158: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! ys_shape = tuple(int(s) for s in ys.shape) /opt/conda/lib/python3.7/site-packages/torch/nn/functional.py:3455: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details. 
"See the documentation of nn.Upsample for details.".format(mode) /opt/conda/lib/python3.7/site-packages/mmdet/models/dense_heads/anchor_head.py:123: UserWarning: DeprecationWarning: anchor_generator is deprecated, please use "prior_generator" instead warnings.warn('DeprecationWarning: anchor_generator is deprecated, ' /opt/conda/lib/python3.7/site-packages/mmdet/core/anchor/anchor_generator.py:333: UserWarning: grid_anchors would be deprecated soon. Please use grid_priors warnings.warn('grid_anchors would be deprecated soon. ' /opt/conda/lib/python3.7/site-packages/mmdet/core/anchor/anchor_generator.py:370: UserWarning: single_level_grid_anchors would be deprecated soon. Please use single_level_grid_priors 'single_level_grid_anchors would be deprecated soon. ' /root/workspace/mmdeploy/mmdeploy/codebase/mmdet/models/dense_heads/rpn_head.py:78: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! assert cls_score.size()[-2:] == bbox_pred.size()[-2:] /root/workspace/mmdeploy/mmdeploy/pytorch/functions/topk.py:28: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect. k = torch.tensor(k, device=input.device, dtype=torch.long) /root/workspace/mmdeploy/mmdeploy/pytorch/functions/topk.py:33: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! 
return ctx.origin_func(input, k, dim=dim, largest=largest, sorted=sorted) /opt/conda/lib/python3.7/site-packages/mmdet/core/bbox/coder/legacy_delta_xywh_bbox_coder.py:77: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! assert pred_bboxes.size(0) == bboxes.size(0) 2022-05-19:16:21:35,root ERROR [utils.py:43] The size of tensor a (4) must match the size of tensor b (4432) at non-singleton dimension 2 Traceback (most recent call last): File "/root/workspace/mmdeploy/mmdeploy/utils/utils.py", line 38, in target_wrapper result = target(*args, kwargs) File "/root/workspace/mmdeploy/mmdeploy/apis/pytorch2onnx.py", line 113, in torch2onnx output_file=output_file) File "/root/workspace/mmdeploy/mmdeploy/apis/pytorch2onnx.py", line 55, in torch2onnx_impl verbose=verbose) File "/opt/conda/lib/python3.7/site-packages/torch/onnx/init.py", line 276, in export custom_opsets, enable_onnx_checker, use_external_data_format) File "/opt/conda/lib/python3.7/site-packages/torch/onnx/utils.py", line 94, in export use_external_data_format=use_external_data_format) File "/opt/conda/lib/python3.7/site-packages/torch/onnx/utils.py", line 698, in _export dynamic_axes=dynamic_axes) File "/opt/conda/lib/python3.7/site-packages/torch/onnx/utils.py", line 456, in _model_to_graph use_new_jit_passes) File "/opt/conda/lib/python3.7/site-packages/torch/onnx/utils.py", line 417, in _create_jit_graph graph, torch_out = _trace_and_get_graph_from_model(model, args) File "/opt/conda/lib/python3.7/site-packages/torch/onnx/utils.py", line 377, in _trace_and_get_graph_from_model torch.jit._get_trace_graph(model, args, strict=False, _force_outplace=False, _return_inputs_states=True) File "/opt/conda/lib/python3.7/site-packages/torch/jit/_trace.py", line 1139, in _get_trace_graph outs = 
ONNXTracedModule(f, strict, _force_outplace, return_inputs, _return_inputs_states)(*args, *kwargs) File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl result = self.forward(input, kwargs) File "/opt/conda/lib/python3.7/site-packages/torch/jit/_trace.py", line 130, in forward self._force_outplace, File "/opt/conda/lib/python3.7/site-packages/torch/jit/_trace.py", line 116, in wrapper outs.append(self.inner(trace_inputs)) File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 887, in _call_impl result = self._slow_forward(input, kwargs) File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 860, in _slow_forward result = self.forward(*input, *kwargs) File "/root/workspace/mmdeploy/mmdeploy/core/rewriters/rewriter_utils.py", line 371, in wrapper return self.func(self, args, kwargs) File "/root/workspace/mmdeploy/mmdeploy/codebase/mmdet/models/detectors/base.py", line 69, in base_detectorforward return forward_impl(ctx, self, img, img_metas=img_metas, kwargs) File "/root/workspace/mmdeploy/mmdeploy/core/optimizers/function_marker.py", line 261, in g rets = f(args, kwargs) File "/root/workspace/mmdeploy/mmdeploy/codebase/mmdet/models/detectors/base.py", line 28, in __forward_impl return self.simple_test(img, img_metas, kwargs) File "/root/workspace/mmdeploy/mmdeploy/core/rewriters/rewriter_utils.py", line 371, in wrapper return self.func(self, args, kwargs) File "/root/workspace/mmdeploy/mmdeploy/codebase/mmdet/models/detectors/two_stage.py", line 58, in two_stage_detectorsimpletest proposals, = self.rpn_head.simple_test_rpn(x, img_metas) File "/opt/conda/lib/python3.7/site-packages/mmdet/models/dense_heads/dense_test_mixins.py", line 130, in simple_test_rpn proposal_list = self.get_bboxes(rpn_outs, img_metas=img_metas) File "/root/workspace/mmdeploy/mmdeploy/core/rewriters/rewriter_utils.py", line 371, in wrapper return self.func(self, args, **kwargs) File 
"/root/workspace/mmdeploy/mmdeploy/codebase/mmdet/models/dense_heads/rpn_head.py", line 122, in rpn_headget_bboxes max_shape=img_metas[0]['img_shape']) File "/opt/conda/lib/python3.7/site-packages/mmdet/core/bbox/coder/legacy_delta_xywh_bbox_coder.py", line 79, in decode self.stds, max_shape, wh_ratio_clip) File "/opt/conda/lib/python3.7/site-packages/mmcv/utils/parrots_jit.py", line 22, in wrapper_inner return func(*args, *kargs) File "/opt/conda/lib/python3.7/site-packages/mmdet/core/bbox/coder/legacy_delta_xywh_bbox_coder.py", line 181, in legacy_delta2bbox denorm_deltas = deltas stds + means RuntimeError: The size of tensor a (4) must match the size of tensor b (4432) at non-singleton dimension 2 2022-05-19 16:21:36,282 - mmdeploy - ERROR - torch2onnx failed.

Does anyone know what the problem might be? Thanks in advance!

saffie91 commented 2 years ago

Also this is my env:

2022-05-19 17:05:42,004 - mmdeploy - INFO - **Environmental information** 2022-05-19 17:05:43,280 - mmdeploy - INFO - sys.platform: linux 2022-05-19 17:05:43,280 - mmdeploy - INFO - Python: 3.7.13 (default, Mar 29 2022, 02:18:16) [GCC 7.5.0] 2022-05-19 17:05:43,280 - mmdeploy - INFO - CUDA available: False 2022-05-19 17:05:43,280 - mmdeploy - INFO - GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0 2022-05-19 17:05:43,280 - mmdeploy - INFO - PyTorch: 1.8.0+cpu 2022-05-19 17:05:43,280 - mmdeploy - INFO - PyTorch compiling details: PyTorch built with:

2022-05-19 17:05:43,280 - mmdeploy - INFO - TorchVision: 0.9.0+cpu 2022-05-19 17:05:43,280 - mmdeploy - INFO - OpenCV: 4.5.3-openvino 2022-05-19 17:05:43,281 - mmdeploy - INFO - MMCV: 1.4.0 2022-05-19 17:05:43,281 - mmdeploy - INFO - MMCV Compiler: GCC 7.3 2022-05-19 17:05:43,281 - mmdeploy - INFO - MMCV CUDA Compiler: not available 2022-05-19 17:05:43,281 - mmdeploy - INFO - MMDeploy: 0.4.0+69111a6 2022-05-19 17:05:43,281 - mmdeploy - INFO -

2022-05-19 17:05:43,281 - mmdeploy - INFO - **Backend information** [2022-05-19 17:05:43.444] [mmdeploy] [info] Register 'DirectoryModel' 2022-05-19 17:05:43,511 - mmdeploy - INFO - onnxruntime: 1.8.1 ops_is_avaliable : True 2022-05-19 17:05:43,512 - mmdeploy - INFO - tensorrt: None ops_is_avaliable : False 2022-05-19 17:05:43,513 - mmdeploy - INFO - ncnn: 1.0.20220519 ops_is_avaliable : True 2022-05-19 17:05:43,514 - mmdeploy - INFO - pplnn_is_avaliable: False 2022-05-19 17:05:43,515 - mmdeploy - INFO - openvino_is_avaliable: True 2022-05-19 17:05:43,515 - mmdeploy - INFO -

2022-05-19 17:05:43,516 - mmdeploy - INFO - **Codebase information** 2022-05-19 17:05:43,519 - mmdeploy - INFO - mmdet: 2.20.0 2022-05-19 17:05:43,519 - mmdeploy - INFO - mmseg: 0.21.1 2022-05-19 17:05:43,519 - mmdeploy - INFO - mmcls: 0.19.0 2022-05-19 17:05:43,519 - mmdeploy - INFO - mmocr: 0.4.1 2022-05-19 17:05:43,519 - mmdeploy - INFO - mmedit: 0.14.0 2022-05-19 17:05:43,519 - mmdeploy - INFO - mmdet3d: None 2022-05-19 17:05:43,519 - mmdeploy - INFO - mmpose: 0.25.1

RunningLeon commented 2 years ago

@saffie91 Hi,

  1. Since it's Cascade Mask R-CNN, you should use the deploy config configs/mmdet/instance-seg/instance-seg_onnxruntime_dynamic.py.
  2. Maybe you should try with a standard config and checkpoint: a model from mmdet.
  3. You could try to rewrite legacy_delta2bbox by referring to here. BTW, LegacyDeltaXYWHBBoxCoder is meant for mmdet 1.x, as its docstring suggests, so it is better not to use this config.

saffie91 commented 2 years ago

Hey @RunningLeon ,

Thank you so much for responding so quickly.

1- Yes, this should have been done; however, it did not fix the problem.
2- This model and config work together. However, I am trying to convert a certain pretrained model to ONNX.

The model I'm trying to deploy is a model that was trained on v1 and converted to v2 here:

https://github.com/iiLaurens/CascadeTabNet

This is the config file for it: https://github.com/iiLaurens/CascadeTabNet/blob/mmdet2x/Config/cascade_mask_rcnn_hrnetv2p_w32_20e.py

The model runs inference properly in PyTorch with this config file.

3- How can I rewrite the config file so that the deployment works on mmdeploy?

RunningLeon commented 2 years ago

Hi,

  1. maybe you could try to replace LegacyDeltaXYWHBBoxCoder and LegacyAnchorGenerator in your config with DeltaXYWHBBoxCoder and AnchorGenerator.
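
A minimal sketch of that replacement, assuming the config is available as plain nested dicts and lists. `replace_types` and `RENAMES` are hypothetical names used only for illustration. The legacy and current classes may not behave identically, so it is worth re-checking accuracy in PyTorch after the swap.

```python
import copy

# Legacy (mmdet 1.x style) component names mapped to their current counterparts.
RENAMES = {
    "LegacyDeltaXYWHBBoxCoder": "DeltaXYWHBBoxCoder",
    "LegacyAnchorGenerator": "AnchorGenerator",
}


def replace_types(cfg, renames=RENAMES):
    """Return a copy of a nested dict/list config with `type` values renamed."""
    cfg = copy.deepcopy(cfg)  # leave the caller's config untouched

    def visit(node):
        if isinstance(node, dict):
            type_name = node.get("type")
            if isinstance(type_name, str) and type_name in renames:
                node["type"] = renames[type_name]
            for value in node.values():
                visit(value)
        elif isinstance(node, (list, tuple)):
            for item in node:
                visit(item)

    visit(cfg)
    return cfg


# Toy fragment shaped like an RPN head config; the values are illustrative only.
rpn_head = dict(
    type="RPNHead",
    anchor_generator=dict(type="LegacyAnchorGenerator", scales=[8]),
    bbox_coder=dict(type="LegacyDeltaXYWHBBoxCoder"),
)
updated = replace_types(rpn_head)
print(updated["anchor_generator"]["type"], updated["bbox_coder"]["type"])
```

The helper only renames `type` entries; any constructor arguments that differ between the legacy and current classes would still need to be adjusted by hand.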

saffie91 commented 2 years ago

Hi,

I've done that and also I have changed the nms configs to the newer ones and the conversion finished!

However, I'm not sure if it's done correctly. When I test the model, I get this:

[array([[[0., 0., 0., 0., 0.]]], dtype=float32), array([[0]], dtype=int64), array([[[[0.43053702, 0.4700691 , 0.4150678 , 0.43690863, 0.41104928, 0.43695334, 0.38587412, 0.4188225 , 0.38462797, 0.41543972, 0.3846365 , 0.4148415 , 0.38547826, 0.41506618, 0.38623217, 0.41657138, 0.38640213, 0.41864142, 0.38612762, 0.4211859 , 0.38812378, 0.4222332 , 0.402057 , 0.42669973, 0.41785023, 0.43111444, 0.45813686, 0.44016734], [0.44209534, 0.4738578 , 0.44360596, 0.44261935, 0.42809704, 0.44258595, 0.4076243 , 0.42793846, 0.3967991 , 0.42342475, 0.3970493 , 0.42324722, 0.39799762, 0.42374483, 0.3993029 , 0.42429757, 0.40135008, 0.42528364, 0.4039778 , 0.42648715, 0.4079728 , 0.4263706 , 0.42059427, 0.43405983, 0.42350683, 0.4299352 , 0.46647525, 0.4577396 ], [0.4302987 , 0.4587028 , 0.41853014, 0.41144058, 0.39825395, 0.4076334 , 0.39672142, 0.40306062, 0.38737032, 0.39649394, 0.3863439 , 0.3971483 , 0.3851831 , 0.39834756, 0.38405788, 0.3990221 , 0.3830788 , 0.40070623, 0.3815433 , 0.4021379 , 0.38467398, 0.40481853, 0.39861095, 0.40930966, 0.40668666, 0.41653967, 0.45532364, 0.43083206], [0.42065603, 0.4560346 , 0.39559767, 0.41677675, 0.38260972, 0.40951148, 0.37653583, 0.40284836, 0.36720324, 0.3978488 , 0.36758447, 0.39808536, 0.36840796, 0.39848912, 0.36806303, 0.3984903 , 0.36851323, 0.39989537, 0.36874425, 0.40089208, 0.37854296, 0.39968947, 0.3874395 , 0.4140844 , 0.40182298, 0.40799013, 0.45109445, 0.44170228], [0.43873337, 0.4586644 , 0.43411183, 0.41505966, 0.40423074, 0.39335752, 0.4040061 , 0.38825387, 0.39588368, 0.38111305, 0.39573485, 0.38177913, 0.3959187 , 0.3817221 , 0.39660674, 0.3813737 , 0.39722985, 0.38279518, 0.39651197, 0.38501891, 0.39664936, 0.3906386 , 0.42136687, 0.4097179 , 0.41233146, 0.4056385 , 0.46056247, 0.43147376], [0.41343468, 0.4526695 , 0.39658284, 0.40998784, 0.3737275 , 0.38139465, 0.37548608, 0.38904914, 0.37324995, 0.38897192, 0.37361705, 0.38935906, 0.37434062, 0.39023015, 0.37527868, 0.39111102, 0.3770796 , 0.39476195, 
0.37896574, 0.3984926 , 0.38988066, 0.39890766, 0.42007476, 0.42961583, 0.41067445, 0.40686163, 0.45419708, 0.44438943], [0.4434961 , 0.46559736, 0.43611524, 0.41380578, 0.41075397, 0.39381504, 0.41819835, 0.39758503, 0.4121369 , 0.39514557, 0.4115594 , 0.39581487, 0.41147497, 0.39655614, 0.41162825, 0.3981285 , 0.41181102, 0.40077162, 0.41178143, 0.40271497, 0.4100336 , 0.40599573, 0.43663707, 0.42346355, 0.42113286, 0.4123433 , 0.47183996, 0.42968658], [0.42528182, 0.46645984, 0.40015122, 0.4123261 , 0.37577307, 0.38797274, 0.38401383, 0.4001119 , 0.37836564, 0.40194607, 0.3797058 , 0.401863 , 0.3824429 , 0.402637 , 0.3851903 , 0.40341803, 0.3888183 , 0.40588036, 0.3924897 , 0.4090727 , 0.3994843 , 0.40633625, 0.43075362, 0.43334433, 0.40874237, 0.4031781 , 0.45659328, 0.4385049 ], [0.43991032, 0.46357113, 0.43255132, 0.41797486, 0.40901184, 0.3956195 , 0.41712037, 0.40202644, 0.41065347, 0.39544633, 0.41024506, 0.39561358, 0.41063237, 0.39673507, 0.41147164, 0.3979584 , 0.4114287 , 0.39918151, 0.41094682, 0.40167874, 0.40907627, 0.4039473 , 0.4362448 , 0.4229809 , 0.42125627, 0.40792137, 0.47408196, 0.4279567 ], [0.4200793 , 0.46275064, 0.39630157, 0.4072114 , 0.3770283 , 0.3859141 , 0.3805274 , 0.39483398, 0.3734706 , 0.39554018, 0.3745681 , 0.39536214, 0.37668562, 0.3955894 , 0.3778028 , 0.3949384 , 0.3798424 , 0.39591482, 0.3824863 , 0.39915097, 0.39058974, 0.3999275 , 0.4238182 , 0.42698842, 0.3982598 , 0.39298695, 0.45265928, 0.43963242], [0.4396397 , 0.46386918, 0.43260634, 0.4181186 , 0.4089864 , 0.39506626, 0.41717747, 0.40078443, 0.41053236, 0.39443678, 0.41014445, 0.39449123, 0.41038096, 0.3953959 , 0.4111784 , 0.39620823, 0.41097492, 0.3973531 , 0.41019797, 0.4000786 , 0.40813887, 0.4029677 , 0.43592575, 0.4224925 , 0.42091554, 0.40727898, 0.47401205, 0.42757756], [0.42027894, 0.463217 , 0.3971635 , 0.4075599 , 0.37756422, 0.3853629 , 0.3801511 , 0.39346725, 0.37327752, 0.3933347 , 0.3742109 , 0.3931288 , 0.3759963 , 0.3934408 , 0.37711298, 0.3925638 
, 0.37900358, 0.39365387, 0.38186797, 0.3972711 , 0.39066714, 0.39827466, 0.42372075, 0.42528683, 0.39878535, 0.39308283, 0.45338893, 0.43885657], [0.44011343, 0.4647376 , 0.4324568 , 0.41987193, 0.4086898 , 0.39509514, 0.417529 , 0.40006906, 0.4110484 , 0.39433935, 0.41035298, 0.39395574, 0.41026866, 0.3942053 , 0.41091746, 0.3941866 , 0.4106838 , 0.39570028, 0.40951788, 0.39882582, 0.40756848, 0.4027217 , 0.43440303, 0.4223231 , 0.41983187, 0.4081893 , 0.47405043, 0.4271984 ], [0.42117268, 0.46394873, 0.39921716, 0.40895024, 0.37991506, 0.38572738, 0.38094065, 0.39208087, 0.3737614 , 0.39179224, 0.3745659 , 0.3914541 , 0.37618375, 0.39150602, 0.37725496, 0.39084288, 0.3789647 , 0.39207977, 0.3818766 , 0.39598745, 0.39136052, 0.39708287, 0.42360532, 0.4239833 , 0.3996306 , 0.3943961 , 0.45361286, 0.43874386], [0.44082576, 0.4658231 , 0.4319387 , 0.42152828, 0.4082632 , 0.39568448, 0.41874945, 0.40125507, 0.41184402, 0.3955915 , 0.41109744, 0.3954389 , 0.41072732, 0.39555943, 0.41101673, 0.39506117, 0.41066116, 0.39671463, 0.4088887 , 0.40026993, 0.4067995 , 0.40431207, 0.4328788 , 0.42317235, 0.4174986 , 0.41071516, 0.47412333, 0.42803836], [0.4222479 , 0.4644515 , 0.40196902, 0.41020244, 0.3831696 , 0.3872384 , 0.38333052, 0.39277253, 0.37557346, 0.3927607 , 0.37633753, 0.39282763, 0.3773275 , 0.39263678, 0.3780648 , 0.3919568 , 0.3806212 , 0.3936278 , 0.3832228 , 0.39731914, 0.39280483, 0.39857495, 0.4250641 , 0.42488846, 0.40129578, 0.39737934, 0.45335278, 0.4398947 ], [0.4413778 , 0.4669084 , 0.43103763, 0.42267904, 0.40851268, 0.39768213, 0.419419 , 0.40283093, 0.41312554, 0.3981627 , 0.4123555 , 0.3981188 , 0.41205615, 0.3982947 , 0.4120875 , 0.39817786, 0.41142592, 0.40018323, 0.40886235, 0.4036648 , 0.40566656, 0.40645915, 0.43143445, 0.42354748, 0.41544166, 0.4118936 , 0.47379574, 0.42934576], [0.42315125, 0.4652614 , 0.40693766, 0.4123187 , 0.38785508, 0.3896643 , 0.38597313, 0.39586398, 0.37743646, 0.39594102, 0.37805644, 0.39608654, 0.3787824 , 
0.39592692, 0.3797702 , 0.3952343 , 0.3824421 , 0.39671075, 0.38590258, 0.40044987, 0.39448515, 0.4009332 , 0.42799628, 0.42643213, 0.40363833, 0.40070522, 0.45399827, 0.44177252], [0.44189963, 0.46830648, 0.43110877, 0.42422664, 0.41007614, 0.40200883, 0.4220768 , 0.40680495, 0.41689748, 0.4028059 , 0.4161693 , 0.40279132, 0.4154674 , 0.4030197 , 0.41550606, 0.40417907, 0.41416168, 0.4063627 , 0.41142324, 0.40956247, 0.4078787 , 0.4090755 , 0.43360707, 0.42373836, 0.41489303, 0.4141786 , 0.47352394, 0.43012318], [0.42418858, 0.46545744, 0.41147918, 0.4154184 , 0.39398837, 0.39392367, 0.39100572, 0.4017722 , 0.38280806, 0.40236318, 0.38308764, 0.40209705, 0.38335508, 0.4019069 , 0.38442332, 0.40163118, 0.3870733 , 0.40279236, 0.39102143, 0.40669045, 0.39767694, 0.40446383, 0.432429 , 0.42778316, 0.4074597 , 0.40376383, 0.45520723, 0.44204238], [0.4429057 , 0.46475002, 0.42737392, 0.42013645, 0.40770373, 0.40226024, 0.4220102 , 0.41274375, 0.41968757, 0.40756077, 0.4183885 , 0.4070564 , 0.41747764, 0.40705466, 0.41691312, 0.4082733 , 0.41617 , 0.4113238 , 0.414703 , 0.41340172, 0.4110765 , 0.4129337 , 0.4369214 , 0.4243974 , 0.4189285 , 0.42028597, 0.48141283, 0.43414468], [0.4271208 , 0.4639762 , 0.41201276, 0.41197255, 0.40065396, 0.3928807 , 0.39977735, 0.4053381 , 0.39434087, 0.40768617, 0.39422137, 0.4069395 , 0.39449418, 0.40665835, 0.3956161 , 0.40617466, 0.3977267 , 0.40673906, 0.40044802, 0.4094853 , 0.40467545, 0.40689725, 0.43629628, 0.42755085, 0.4178558 , 0.41099712, 0.45740685, 0.4449437 ], [0.44984245, 0.46064055, 0.43502566, 0.41993743, 0.4274264 , 0.41334605, 0.44929475, 0.42107707, 0.4466961 , 0.41716367, 0.44600582, 0.41718113, 0.44483238, 0.41659346, 0.44437397, 0.4168585 , 0.4447103 , 0.41994387, 0.44517648, 0.42390755, 0.4483833 , 0.4272697 , 0.46127883, 0.43002945, 0.44629622, 0.43182904, 0.49496695, 0.45268297], [0.42850918, 0.4554308 , 0.421017 , 0.419477 , 0.4285014 , 0.41185567, 0.4351617 , 0.42747372, 0.4337274 , 0.4323771 , 0.43340868, 
0.43235767, 0.43298757, 0.43166393, 0.43312913, 0.43012086, 0.43381023, 0.43008578, 0.43370017, 0.4311161 , 0.43951198, 0.4300037 , 0.455173 , 0.44331676, 0.43868065, 0.42327932, 0.47718072, 0.46580184], [0.46336353, 0.46240237, 0.45341238, 0.4332088 , 0.46105406, 0.42042693, 0.47385064, 0.4254709 , 0.4716801 , 0.41766837, 0.47099736, 0.41652724, 0.47027552, 0.4149143 , 0.4691958 , 0.41390246, 0.46802196, 0.41693026, 0.46634203, 0.42233264, 0.46587396, 0.42537874, 0.47442618, 0.43929437, 0.45262167, 0.43912277, 0.4900883 , 0.45286947], [0.43623856, 0.4625839 , 0.42942655, 0.42450035, 0.43476185, 0.4241773 , 0.44828337, 0.4467256 , 0.44797283, 0.44984537, 0.44698244, 0.44997782, 0.4460364 , 0.450812 , 0.44504783, 0.45089895, 0.4438187 , 0.45093742, 0.44198462, 0.45079875, 0.4426927 , 0.44619527, 0.46962032, 0.45636037, 0.444387 , 0.43536487, 0.46026775, 0.46073496], [0.46444383, 0.47092888, 0.44844925, 0.4409516 , 0.45800817, 0.43082446, 0.4656028 , 0.4382884 , 0.46919543, 0.43478352, 0.4693422 , 0.4348265 , 0.46885028, 0.43383434, 0.46819925, 0.43338937, 0.4675993 , 0.4332553 , 0.4661112 , 0.4317334 , 0.46034384, 0.43001804, 0.45962685, 0.44850853, 0.44911578, 0.44356734, 0.46794695, 0.46802536], [0.44065768, 0.4486454 , 0.4249637 , 0.41945988, 0.41755125, 0.42266935, 0.43344092, 0.42740327, 0.43670997, 0.42959383, 0.4367106 , 0.42919338, 0.43702576, 0.42827934, 0.43733945, 0.42689237, 0.43709642, 0.42525196, 0.43553528, 0.423496 , 0.43255526, 0.41984192, 0.440408 , 0.41674933, 0.43221936, 0.4109208 , 0.4466709 , 0.44581306]]]], dtype=float32)]

as my output. I don't think it's working right, and I don't understand the 28x28 array in the output. It's just supposed to have an Nx5 array for the bounding boxes and a mask of 800x800. Was there a conversion mistake?

Another thing I noticed is that the dynamic and static versions give the same output.

Thanks in advance. Appreciate the quick replies.

RunningLeon commented 2 years ago

Hi, have you tested the changed config with PyTorch? Is the performance OK?

saffie91 commented 2 years ago

@RunningLeon hi, yes it runs perfectly on pytorch with this config file.

RunningLeon commented 2 years ago

@saffie91 Hi, by default export_postprocess_mask=False, so grid_sample is not exported to ONNX, which means the mask has shape N x 28 x 28. If you set export_postprocess_mask=True, you get masks of shape N x H x W.

https://github.com/open-mmlab/mmdeploy/blob/de3f18fbb28ebec67d2382085c52d766056c1657/configs/mmdet/_base_/base_instance-seg_static.py#L4
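
As a rough sketch, the override might look like the following in a deploy config. The nesting under `codebase_config` and `post_processing` is an assumption based on the base config linked above and may differ between MMDeploy versions.

```python
# Hypothetical deploy-config fragment (MMDeploy configs are Python files).
# Assumption: the flag lives under codebase_config -> post_processing.
codebase_config = dict(
    post_processing=dict(
        # False (default): masks stay at the mask head's N x 28 x 28 resolution.
        # True: mask post-processing (grid_sample) is exported, giving N x H x W masks.
        export_postprocess_mask=True,
    )
)
```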

saffie91 commented 2 years ago

@RunningLeon hey, I have done the conversion again using export_postprocess_mask as true, however I am getting this error when I try to infer from the new onnx model:

Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from newonnxmodel.onnx failed:Fatal error: grid_sampler is not a registered function/op

Should I be using dynamic or static for this conversion?

Also I keep getting this warning every time: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!

RunningLeon commented 2 years ago

@saffie91 Hi, grid_sample is a custom op for onnxruntime. You should build custom ops by referring to https://mmdeploy.readthedocs.io/en/latest/backends/onnxruntime.html#build-custom-ops

saffie91 commented 2 years ago

@RunningLeon Hi, I have tried doing this and did not get any errors:

[ 90%] Linking CXX shared library ../../../lib/libmmdeploy_onnxruntime_ops.so
[ 91%] Building CXX object csrc/net/ort/CMakeFiles/mmdeploy_ort_net.dir/ort_net.cpp.o
[ 91%] Built target mmdeploy_onnxruntime_ops
[ 92%] Linking CXX shared library ../../../lib/libmmdeploy_ort_net.so
[ 92%] Built target mmdeploy_ort_net
Consolidate compiler generated dependencies of target mmdeploy_python
[ 93%] Linking CXX shared module ../../../lib/mmdeploy_python.cpython-37m-x86_64-linux-gnu.so
[100%] Built target mmdeploy_python

Afterwards I converted the model again but I keep getting the same error.

RunningLeon commented 2 years ago

@saffie91 Hi, please post how you load the ONNX model and run inference with onnxruntime. If you are running your own Python script, you have to load the custom op library with session_options.register_custom_ops_library(ort_custom_op_path), as in the following:

https://github.com/open-mmlab/mmdeploy/blob/d16720b1271ce899a72da3b74ec820ee7f8973ff/mmdeploy/backend/onnxruntime/wrapper.py#L41-L51
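
A minimal sketch of such a script, assuming the model file and the custom op library built earlier in this thread. `create_session` is a hypothetical helper, and the example paths in the trailing comment are assumptions that depend on your build directory.

```python
import os


def create_session(onnx_path, custom_ops_path):
    """Build an ONNX Runtime session with the MMDeploy custom ops registered."""
    for path in (onnx_path, custom_ops_path):
        if not os.path.exists(path):
            raise FileNotFoundError(path)

    # Imported lazily so the path checks above work without onnxruntime installed.
    import onnxruntime as ort

    session_options = ort.SessionOptions()
    # Register the library before creating the session; otherwise loading the
    # model fails with "grid_sampler is not a registered function/op".
    session_options.register_custom_ops_library(custom_ops_path)
    return ort.InferenceSession(
        onnx_path, session_options, providers=["CPUExecutionProvider"]
    )


# Example call (paths are assumptions):
# session = create_session("newonnxmodel.onnx", "lib/libmmdeploy_onnxruntime_ops.so")
```

The same `session_options` object has to be passed to `InferenceSession`; registering the library on a separate options object has no effect on the session.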

saffie91 commented 2 years ago

@RunningLeon Hi, Yes I have made it work using that as session options.

However I am encountering a new error now:

RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Add node. Name:'Add_88' Status Message: /onnxruntime_src/onnxruntime/core/providers/cpu/math/element_wise_ops.h:505 void onnxruntime::BroadcastIterator::Append(ptrdiff_t, ptrdiff_t) axis == 1 || axis == largest was false. Attempting to broadcast an axis by a dimension other than 1. 585 by 586

Does this mean there was an error converting?
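
For context, the message describes the usual broadcasting rule for element-wise ops such as Add: with shapes aligned from the right, two dimensions are compatible only if they are equal or one of them is 1. A small pure-Python sketch of that rule follows; the function name and the example shapes are made up for illustration, and only the 585 and 586 come from the error above.

```python
def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError if incompatible."""
    result = []
    # Walk both shapes from the trailing dimension, padding the shorter with 1s.
    for i in range(1, max(len(shape_a), len(shape_b)) + 1):
        dim_a = shape_a[-i] if i <= len(shape_a) else 1
        dim_b = shape_b[-i] if i <= len(shape_b) else 1
        if dim_a != dim_b and dim_a != 1 and dim_b != 1:
            raise ValueError(f"cannot broadcast {dim_a} by {dim_b}")
        # A size-1 dimension stretches to match the other operand.
        result.append(dim_b if dim_a == 1 else dim_a)
    return tuple(reversed(result))


print(broadcast_shape((1, 256, 585), (1, 1, 585)))  # compatible shapes
try:
    broadcast_shape((1, 256, 585), (1, 256, 586))  # 585 vs 586 cannot broadcast
except ValueError as err:
    print(err)
```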

RunningLeon commented 2 years ago

@saffie91 Hi, that is possible, since you used the dynamic-shape deploy config.

  1. You could inspect the ONNX graph with Netron and figure out where the Add_881 operation comes from in the original Python code, then change it if possible. This may take a long time to debug.
  2. You could use the static config configs/mmdet/instance-seg/instance-seg_onnxruntime_static.py if input images can be preprocessed to a fixed shape.
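
For the second option, here is a sketch of one way to bring arbitrary images to a fixed input size: scale while keeping the aspect ratio, then pad. The 800x800 target and the helper name `fit_to_fixed_shape` are assumptions for illustration; the actual preprocessing has to match the test pipeline in the model config.

```python
def fit_to_fixed_shape(height, width, target_h=800, target_w=800):
    """Compute resize and padding so an image fills exactly target_h x target_w."""
    # Largest scale at which the resized image still fits inside the target.
    scale = min(target_h / height, target_w / width)
    new_h = min(target_h, max(1, round(height * scale)))
    new_w = min(target_w, max(1, round(width * scale)))
    # Pad on the bottom and right so predicted boxes only need dividing by `scale`.
    pad_bottom = target_h - new_h
    pad_right = target_w - new_w
    return scale, (new_h, new_w), (pad_bottom, pad_right)


scale, resized, padding = fit_to_fixed_shape(1200, 1600)
print(scale, resized, padding)
```

Padding only on the bottom and right keeps the image origin fixed, so boxes predicted on the padded input map back to the original image by dividing by `scale`.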

saffie91 commented 2 years ago

@RunningLeon Hi, The static model works :) thanks for all your help!

saffie91 commented 2 years ago

@RunningLeon Hi again,

Is there a way to quantize/prune this ONNX model? I am getting similar errors because of the custom op.

RunningLeon commented 2 years ago

@saffie91 Hi, quantization or pruning won't remove the custom op grid_sample. If you do not want to include grid_sample in your onnx model, you can set export_postprocess_mask=False in the config as mentioned before.

@saffie91 Hi, by default export_postprocess_mask=False, the grid_sample is not exported to ONNX, which means shape of mask is nx28x28. If you set export_postprocess_mask=True, then you would get mask in shape of N x H x W.

https://github.com/open-mmlab/mmdeploy/blob/de3f18fbb28ebec67d2382085c52d766056c1657/configs/mmdet/_base_/base_instance-seg_static.py#L4

saffie91 commented 2 years ago

@RunningLeon Hello, what about pytorch_half_pixel? I am getting this error when trying to convert this ONNX model into a TensorFlow model:

RuntimeError: Resize coordinate_transformation_mode=pytorch_half_pixel is not supported in Tensorflow.

Is there an opset version or a config variable I can change to fix this issue?

RunningLeon commented 2 years ago

@saffie91 Hi, we do not support onnx2tf in this repo. But the version converter from onnx may be helpful for you: https://github.com/onnx/onnx/blob/main/docs/VersionConverter.md.

saffie91 commented 2 years ago

@RunningLeon thanks, I will give it a try.

RunningLeon commented 2 years ago

@saffie91 Hi, if the issue is solved, you could kindly close it. Thanks.