Closed IECCLES4 closed 10 months ago
Hi, is it because the `nms_pre_score` tensor has a zero dim in its shape? Maybe you can change `input_img` when calling `deploy.py` to make sure this tensor has no zero dim.
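For context on the suggestion above, here is a minimal, hypothetical snippet (plain PyTorch, not mmdeploy code; the shape is made up for illustration) showing what happens when `max()` reduces over a zero-size dimension:

```python
import torch

# A made-up score tensor whose last dimension has zero size,
# e.g. what remains after slicing away the only class channel.
nms_pre_score = torch.rand(1, 1000, 0)

try:
    max_scores, _ = nms_pre_score.max(-1)
except IndexError as err:
    # PyTorch refuses to reduce over a zero-size dimension,
    # which matches the error reported in this issue.
    print(err)
```

This is why the advice is to check the exported tensor shapes rather than the model weights themselves.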
Hi, thank you for the suggestion. Just double-checking that I understood this correctly: do you mean the `nms_pre` in `ssd300.py`? I checked it and its default value is 1000, so I don't think that needs changing, but I could be wrong.
This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 5 days if the stale label is not removed or if there is no further response.
This issue is closed because it has been stale for 5 days. Please open a new issue if you have similar issues or you have any new updates now.
I am having the same issue. Has anybody figured this out in the meantime?
Sorry for the late reply, but if I remember correctly it was due to the latest version of MMDeploy not supporting a single class, or something along those lines.
Hi @RunningLeon, I am facing an issue when exporting the RetinaNet model from mmdetection (https://github.com/open-mmlab/mmdetection/blob/cfd5d3a985b0249de009b67d04f37263e11cdf3d/configs/_base_/models/retinanet_r50_fpn.py) for a single-class case.
The error message is attached below:
│ xxx/lib/python3.10/site-packages/mmdeploy/codebase/mmdet/models │
│ /detectors/single_stage.py:85 in single_stage_detector__forward │
│ │
│ 82 │ # set the metainfo │
│ 83 │ data_samples = _set_metainfo(data_samples, img_shape) │
│ 84 │ │
│ ❱ 85 │ return __forward_impl(self, batch_inputs, data_samples=data_samples) │
│ 86 │
│ │
│ xxx/lib/python3.10/site-packages/mmdeploy/core/optimizers/funct │
│ ion_marker.py:266 in g │
│ │
│ 263 │ │ │ args = mark_tensors(args, func, func_id, 'input', ctx, attrs, │
│ 264 │ │ │ │ │ │ │ │ is_inspect, args_level) │
│ 265 │ │ │ │
│ ❱ 266 │ │ │ rets = f(*args, **kwargs) │
│ 267 │ │ │ │
│ 268 │ │ │ ctx = Context(output_names) │
│ 269 │ │ │ func_ret = mark_tensors(rets, func, func_id, 'output', ctx, attrs, │
│ │
│ xxx/lib/python3.10/site-packages/mmdeploy/codebase/mmdet/models │
│ /detectors/single_stage.py:23 in __forward_impl │
│ │
│ 20 │ """ │
│ 21 │ x = self.extract_feat(batch_inputs) │
│ 22 │ │
│ ❱ 23 │ output = self.bbox_head.predict(x, data_samples, rescale=False) │
│ 24 │ return output │
│ 25 │
│ 26 │
│ │
│ xxx/lib/python3.10/site-packages/mmdet/models/dense_heads/base_ │
│ dense_head.py:197 in predict │
│ │
│ 194 │ │ │
│ 195 │ │ outs = self(x) │
│ 196 │ │ │
│ ❱ 197 │ │ predictions = self.predict_by_feat( │
│ 198 │ │ │ *outs, batch_img_metas=batch_img_metas, rescale=rescale) │
│ 199 │ │ return predictions │
│ 200 │
│ │
│ xxx/lib/python3.10/site-packages/mmdeploy/codebase/mmdet/models │
│ /dense_heads/base_dense_head.py:145 in base_dense_head__predict_by_feat │
│ │
│ 142 │ │ │ if self.use_sigmoid_cls: │
│ 143 │ │ │ │ max_scores, _ = nms_pre_score.max(-1) │
│ 144 │ │ │ else: │
│ ❱ 145 │ │ │ │ max_scores, _ = nms_pre_score[..., :-1].max(-1) │
│ 146 │ │ │ _, topk_inds = max_scores.topk(pre_topk) │
│ 147 │ │ │ bbox_pred, scores, score_factors = gather_topk( │
│ 148 │ │ │ │ bbox_pred, │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
IndexError: max(): Expected reduction dim 2 to have non-zero size.
I modified the classification loss to use Cross Entropy (`type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0`), so that the effective bbox_head config would be as follows:
```python
bbox_head=dict(
    type="RetinaHead",
    num_classes=80,
    in_channels=256,
    stacked_convs=4,
    feat_channels=256,
    anchor_generator=dict(
        type="AnchorGenerator",
        octave_base_scale=4,
        scales_per_octave=3,
        ratios=[0.5, 1.0, 2.0],
        strides=[8, 16, 32, 64, 128],
    ),
    bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[1.0, 1.0, 1.0, 1.0]),
    loss_cls=dict(type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
    loss_bbox=dict(type="L1Loss", loss_weight=1.0),
)
```
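For reference, here is a plain-Python sketch of the channel logic in mmdet's AnchorHead constructor (this is only a hand-written illustration of that logic, not the actual class): with `use_sigmoid=False`, an extra background channel is appended after the object classes.

```python
def cls_out_channels(num_classes: int, use_sigmoid_cls: bool) -> int:
    # With sigmoid-based classification there is one channel per class;
    # with softmax-based CrossEntropyLoss (use_sigmoid=False) an extra
    # background channel is added.
    return num_classes if use_sigmoid_cls else num_classes + 1

print(cls_out_channels(80, False))  # 81: the config above
print(cls_out_channels(1, False))   # 2: the single-class case in this issue
```

In the single-class, softmax case the head therefore emits only two channels, which leaves very little room for further slicing downstream.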
To export the model to ONNX, I called the export function from https://github.com/open-mmlab/mmdeploy/blob/bc75c9d6c8940aa03d0e1e5b5962bd930478ba77/mmdeploy/apis/onnx/export.py. Based on my understanding, before `torch.onnx.export` is invoked, the model is patched with modified child modules; in this particular case, `predict_by_feat()` is replaced with `base_dense_head__predict_by_feat()` in https://github.com/open-mmlab/mmdeploy/blob/bc75c9d6c8940aa03d0e1e5b5962bd930478ba77/mmdeploy/codebase/mmdet/models/dense_heads/base_dense_head.py#L26-L27.
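The patching step can be pictured with a pure-Python sketch. This is only the general idea of swapping a bound method before export, not mmdeploy's actual rewriter machinery; all names here are made up:

```python
class Head:
    """Stand-in for a dense head with an eager-mode predict method."""

    def predict_by_feat(self):
        return "eager"


def deploy_predict_by_feat(self):
    # Export-friendly replacement, analogous in spirit to
    # base_dense_head__predict_by_feat in mmdeploy.
    return "export-friendly"


original = Head.predict_by_feat
Head.predict_by_feat = deploy_predict_by_feat  # patch before torch.onnx.export
print(Head().predict_by_feat())                # export-friendly
Head.predict_by_feat = original                # restore after export
print(Head().predict_by_feat())                # eager
```

The key consequence for this issue is that the traced code path is the rewritten one, so any shape assumptions baked into the rewrite apply during export even if eager-mode inference works fine.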
After reviewing the code in https://github.com/open-mmlab/mmdeploy/blob/bc75c9d6c8940aa03d0e1e5b5962bd930478ba77/mmdeploy/codebase/mmdet/models/dense_heads/base_dense_head.py#L26-L27, I noticed three parts involving the `use_sigmoid` flag configured in the CrossEntropyLoss, namely:
1. In the constructor of RetinaHead (via AnchorHead): https://github.com/open-mmlab/mmdetection/blob/cfd5d3a985b0249de009b67d04f37263e11cdf3d/mmdet/models/dense_heads/anchor_head.py#L73-L78
2. In `base_dense_head__predict_by_feat`, there is a first slicing of the scores. I presume this is to exclude the background class (index `num_classes`): https://github.com/open-mmlab/mmdeploy/blob/bc75c9d6c8940aa03d0e1e5b5962bd930478ba77/mmdeploy/codebase/mmdet/models/dense_heads/base_dense_head.py#L113-L117
3. This is the confusing part: there is a second round of slicing when computing the `max_scores`: https://github.com/open-mmlab/mmdeploy/blob/bc75c9d6c8940aa03d0e1e5b5962bd930478ba77/mmdeploy/codebase/mmdet/models/dense_heads/base_dense_head.py#L141-L146

I hope you could explain the reasoning behind this, as it appears that the last object class is excluded when computing the `max_scores`. Thank you!
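To make the suspicion concrete, here is a hypothetical toy reproduction (plain PyTorch; the shapes and slicing are my own reconstruction of the two steps described above, not mmdeploy code) for the single-class, `use_sigmoid=False` case:

```python
import torch

num_classes = 1                                # single-class model
scores = torch.rand(1, 1000, num_classes + 1)  # object class + background channel

# First slicing (as described for base_dense_head.py#L113-L117):
# drop the background channel, leaving a single class channel.
scores = scores[..., :num_classes]

# Second slicing (as described for #L141-L146): dropping the last channel
# again leaves a zero-size dimension, so max() raises the IndexError
# seen in the traceback above.
nms_pre_score = scores
try:
    max_scores, _ = nms_pre_score[..., :-1].max(-1)
except IndexError as err:
    print(err)
```

With two or more classes the second slice still leaves at least one channel, which would explain why only single-class models hit this error.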
Describe the bug
When I try to convert an SSD model trained with 1 class, I get the error:

```
IndexError: max(): Expected reduction dim 2 to have non-zero size.
11/07 15:44:25 - mmengine - ERROR - /home/dtl-admin/dev/railsight/mmdeploy/mmdeploy/apis/core/pipeline_manager.py - pop_mp_output - 80 - mmdeploy.apis.pytorch2onnx.torch2onnx with Call id: 0 failed. exit.
```

This only happens with SSD; other models I have trained convert without errors. I have seen another bug report on this, but the answer was just a workaround and not an actual solution.

Reproduction

```
python tools/deploy.py configs/mmdet/detection/detection_tensorrt-fp16_static-320x320.py /home/dtl-admin/dev/railsight/mmdetection/configs/ssd/ssd300_fp16_coco_2.py /home/dtl-admin/dev/railsight/mmdetection/checkpoints/2023-08-14_TPE_Trained_Model-290b0e8e.pth /home/dtl-admin/dev/railsight/mmdeploy/data/image-148.png --work-dir mmdeploy_model/ssd_fp16 --device cuda --dump-info
```
Environment
Error traceback