[Bug] DyHead conversion to TensorRT fails in mmdeploy 1.2

Checklist

[X] 1. I have searched related issues but cannot get the expected help.
[X] 2. I have read the FAQ documentation but cannot get the expected help.
[X] 3. The bug has not been fixed in the latest version.

Describe the bug

Converting the DyHead checkpoint (.pth) to TensorRT fails with the error below: No importer registered for op: Xor. Attempting to import as plugin.

With mmdeploy 0.x and mmdet 2.x, the same conversion succeeded.

Reproduction

python3 tools/deploy.py /workdir/mmdeploy-workdir/detection_onnx_static_1024x1024.py /workdir/atss_swin-l-p4-w12_fpn_dyhead_mstrain_2x_coco_original.yaml /workdir/swin_large_patch4_window12_384_22k.pth /workdir/mmdeploy-workdir/test-deploy-img-1024.jpg --work-dir /workdir/mmdeploy-workdir --device cuda --dump-info

The contents of detection_onnx_static_1024x1024.py:

codebase_config = dict(
    type='mmdet', 
    task='ObjectDetection', 
    model_type='end2end',
    post_processing=dict(
        score_threshold=0.05,
        confidence_threshold=0.05,  # for YOLOv3
        iou_threshold=0.6,
        max_output_boxes_per_class=200,
        pre_top_k=5000,
        keep_top_k=100,
        background_label_id=-1,
    )
)

onnx_config = dict(
    type='onnx',
    export_params=True,
    keep_initializers_as_inputs=False,
    opset_version=11,
    save_file='dyhead_swin_1024.onnx',
    input_names=['input'],
    output_names=['dets', 'labels'],
    input_shape=[1024, 1024],
    optimize=True,
)

backend_config = dict(
    type='tensorrt',
    common_config=dict(fp16_mode=False, max_workspace_size=8 << 30),
    model_inputs=[
        dict(
            input_shapes=dict(
                input=dict(
                    min_shape=[1, 3, 1024, 1024],
                    opt_shape=[1, 3, 1024, 1024],
                    max_shape=[1, 3, 1024, 1024],
                )
            )
        )
    ]
)
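As a quick sanity check on the deploy config above, the following sketch verifies that the TensorRT profile is actually static (min, opt, and max shapes agree) and that its spatial size matches `input_shape` in `onnx_config`. The helper `check_static_profile` is hypothetical and not part of mmdeploy; the dicts are trimmed copies of the config above. The spatial comparison is order-insensitive, so it makes no assumption about whether `input_shape` is width-first or height-first.

```python
# Trimmed copies of the dicts from the deploy config above.
onnx_config = dict(input_names=['input'], input_shape=[1024, 1024])
backend_config = dict(
    type='tensorrt',
    common_config=dict(fp16_mode=False, max_workspace_size=8 << 30),
    model_inputs=[
        dict(input_shapes=dict(input=dict(
            min_shape=[1, 3, 1024, 1024],
            opt_shape=[1, 3, 1024, 1024],
            max_shape=[1, 3, 1024, 1024])))
    ],
)


def check_static_profile(onnx_cfg, backend_cfg):
    """Hypothetical helper: return a list of problems (empty if none)."""
    problems = []
    expected = sorted(onnx_cfg['input_shape'])
    for model_input in backend_cfg['model_inputs']:
        for name, shapes in model_input['input_shapes'].items():
            if name not in onnx_cfg['input_names']:
                problems.append(f'{name}: not listed in input_names')
            if not (shapes['min_shape'] == shapes['opt_shape']
                    == shapes['max_shape']):
                problems.append(f'{name}: min/opt/max shapes differ')
            # Compare the last two (spatial) dims without assuming H/W order.
            if sorted(shapes['opt_shape'][-2:]) != expected:
                problems.append(f'{name}: spatial size != input_shape')
    return problems


problems = check_static_profile(onnx_config, backend_config)
# 8 << 30 bytes expressed in GiB.
workspace_gib = backend_config['common_config']['max_workspace_size'] / 2**30
print(problems, workspace_gib)
```

An empty list means the config is internally consistent, which suggests the failure reported here comes from the exported graph rather than from the shape settings.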

The contents of /workdir/atss_swin-l-p4-w12_fpn_dyhead_mstrain_2x_coco_original.yaml:

auto_scale_lr:
  base_batch_size: 16
  enable: false
data_root: /mnt/data/auto-train/exp/64995bf62b2712d9636e9510
dataset_type: CocoDataset
default_scope: mmdet
env_cfg:
  cudnn_benchmark: false
  dist_cfg:
    backend: nccl
  mp_cfg:
    mp_start_method: fork
    opencv_num_threads: 0
load_from: null
log_level: INFO
log_processor:
  by_epoch: true
  type: LogProcessor
  window_size: 50
model:
  backbone:
    attn_drop_rate: 0.0
    convert_weights: true
    depths:
    - 2
    - 2
    - 18
    - 2
    drop_path_rate: 0.2
    drop_rate: 0.0
    embed_dims: 192
    init_cfg:
      checkpoint: https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_large_patch4_window12_384_22k.pth
      type: Pretrained
    mlp_ratio: 4
    num_heads:
    - 6
    - 12
    - 24
    - 48
    out_indices:
    - 1
    - 2
    - 3
    patch_norm: true
    pretrain_img_size: 384
    qk_scale: null
    qkv_bias: true
    type: SwinTransformer
    window_size: 12
    with_cp: false
  bbox_head:
    anchor_generator:
      center_offset: 0.5
      octave_base_scale: 8
      ratios:
      - 1.0
      scales_per_octave: 1
      strides:
      - 8
      - 16
      - 32
      - 64
      - 128
      type: AnchorGenerator
    bbox_coder:
      target_means:
      - 0.0
      - 0.0
      - 0.0
      - 0.0
      target_stds:
      - 0.1
      - 0.1
      - 0.2
      - 0.2
      type: DeltaXYWHBBoxCoder
    feat_channels: 256
    in_channels: 256
    loss_bbox:
      loss_weight: 2.0
      type: GIoULoss
    loss_centerness:
      loss_weight: 1.0
      type: CrossEntropyLoss
      use_sigmoid: true
    loss_cls:
      alpha: 0.25
      gamma: 2.0
      loss_weight: 1.0
      type: FocalLoss
      use_sigmoid: true
    num_classes: 80
    pred_kernel_size: 1
    stacked_convs: 0
    type: ATSSHead
  data_preprocessor:
    bgr_to_rgb: true
    mean:
    - 123.675
    - 116.28
    - 103.53
    pad_size_divisor: 128
    std:
    - 58.395
    - 57.12
    - 57.375
    type: DetDataPreprocessor
  neck:
  - add_extra_convs: on_output
    in_channels:
    - 384
    - 768
    - 1536
    num_outs: 5
    out_channels: 256
    start_level: 0
    type: FPN
  - in_channels: 256
    num_blocks: 6
    out_channels: 256
    type: DyHead
    zero_init_offset: false
  test_cfg:
    max_per_img: 100
    min_bbox_size: 0
    nms:
      iou_threshold: 0.6
      type: nms
    nms_pre: 1000
    score_thr: 0.05
  train_cfg:
    allowed_border: -1
    assigner:
      topk: 9
      type: ATSSAssigner
    debug: false
    pos_weight: -1
  type: ATSS
optim_wrapper:
  clip_grad: null
  optimizer:
    betas:
    - 0.9
    - 0.999
    lr: 5.0e-05
    type: AdamW
    weight_decay: 0.05
  paramwise_cfg:
    custom_keys:
      absolute_pos_embed:
        decay_mult: 0
      norm:
        decay_mult: 0
      relative_position_bias_table:
        decay_mult: 0
  type: OptimWrapper
param_scheduler:
- begin: 0
  by_epoch: false
  end: 500
  start_factor: 0.001
  type: LinearLR
- begin: 0
  by_epoch: true
  end: 20
  gamma: 0.1
  milestones:
  - 16
  - 19
  type: MultiStepLR
resume: false
test_cfg:
  type: TestLoop
test_dataloader:
  batch_size: 1
  dataset:
    ann_file: /mnt/data/auto-train/exp/64995bf62b2712d9636e9510/val/labels/coco.json
    data_prefix:
      img: /mnt/data/auto-train/exp/64995bf62b2712d9636e9510/val/images
    data_root: /mnt/data/auto-train/exp/64995bf62b2712d9636e9510
    filter_cfg:
      filter_empty_gt: false
    pipeline:
    - type: LoadImageFromFile
    - backend: pillow
      keep_ratio: true
      scale: !!python/tuple
      - 1024
      - 1024
      type: Resize
    - type: LoadAnnotations
      with_bbox: true
    - meta_keys:
      - img_id
      - img_path
      - ori_shape
      - img_shape
      - scale_factor
      type: PackDetInputs
    test_mode: true
    type: CocoDataset
  drop_last: false
  num_workers: 2
  persistent_workers: true
  sampler:
    shuffle: false
    type: DefaultSampler
test_evaluator:
  ann_file: /mnt/data/auto-train/exp/64995bf62b2712d9636e9510/val/labels/coco.json
  format_only: false
  metric: bbox
  type: CocoMetric
train_cfg:
  max_epochs: 20
  type: EpochBasedTrainLoop
  val_interval: 1
train_dataloader:
  batch_sampler:
    type: AspectRatioBatchSampler
  batch_size: 1
  dataset:
    dataset:
      ann_file: /mnt/data/auto-train/exp/64995bf62b2712d9636e9510/train/labels/coco.json
      data_prefix:
        img: /mnt/data/auto-train/exp/64995bf62b2712d9636e9510/train/images
      data_root: /mnt/data/auto-train/exp/64995bf62b2712d9636e9510
      filter_cfg:
        filter_empty_gt: false
        min_size: 32
      pipeline:
      - type: LoadImageFromFile
      - type: LoadAnnotations
        with_bbox: true
      - backend: pillow
        keep_ratio: true
        scale:
        - !!python/tuple
          - 1024
          - 896
        - !!python/tuple
          - 1024
          - 1024
        type: RandomResize
      - prob: 0.5
        type: RandomFlip
      - max_rotate_degree: 15
        max_shear_degree: 2
        max_translate_ratio: 0.1
        scaling_ratio_range:
        - 0.9
        - 1.2
        type: RandomAffine
      - bbox_params:
          filter_lost_elements: true
          format: pascal_voc
          label_fields:
          - gt_bboxes_labels
          - gt_ignore_flags
          min_visibility: 0
          type: BboxParams
        skip_img_without_anno: false
        transforms:
        - brightness_limit:
          - -0.05
          - 0.05
          contrast_limit:
          - -0.05
          - 0.05
          p: 0.05
          type: RandomBrightnessContrast
        - p: 0.01
          type: GaussNoise
        type: Albu
      - type: PackDetInputs
      type: CocoDataset
    times: 2
    type: RepeatDataset
  num_workers: 2
  persistent_workers: true
  sampler:
    shuffle: true
    type: DefaultSampler
val_cfg:
  type: ValLoop
val_dataloader:
  batch_size: 1
  dataset:
    ann_file: /mnt/data/auto-train/exp/64995bf62b2712d9636e9510/val/labels/coco.json
    data_prefix:
      img: /mnt/data/auto-train/exp/64995bf62b2712d9636e9510/val/images
    data_root: /mnt/data/auto-train/exp/64995bf62b2712d9636e9510
    filter_cfg:
      filter_empty_gt: false
    pipeline:
    - type: LoadImageFromFile
    - backend: pillow
      keep_ratio: true
      scale: !!python/tuple
      - 1024
      - 1024
      type: Resize
    - type: LoadAnnotations
      with_bbox: true
    - meta_keys:
      - img_id
      - img_path
      - ori_shape
      - img_shape
      - scale_factor
      type: PackDetInputs
    test_mode: true
    type: CocoDataset
  drop_last: false
  num_workers: 2
  persistent_workers: true
  sampler:
    shuffle: false
    type: DefaultSampler
val_evaluator:
  ann_file: /mnt/data/auto-train/exp/64995bf62b2712d9636e9510/val/labels/coco.json
  format_only: false
  metric: bbox
  type: CocoMetric
vis_backends:
- type: LocalVisBackend
visualizer:
  name: visualizer
  type: DetLocalVisualizer
  vis_backends:
  - type: LocalVisBackend
work_dir: /mnt/data/auto-train/exp/64995bf62b2712d9636e9510/run
test_pipeline:
- type: LoadImageFromFile
- backend: pillow
  keep_ratio: true
  scale: !!python/tuple
  - 1024
  - 1024
  type: Resize
- type: LoadAnnotations
  with_bbox: true
- meta_keys:
  - img_id
  - img_path
  - ori_shape
  - img_shape
  - scale_factor
  type: PackDetInputs

/workdir/swin_large_patch4_window12_384_22k.pth is https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_large_patch4_window12_384_22k.pth

/workdir/mmdeploy-workdir/test-deploy-img-1024.jpg is any random image of size 1024x1024.

Environment

07/06 02:18:59 - mmengine - INFO - TorchVision: 0.14.0
07/06 02:18:59 - mmengine - INFO - OpenCV: 4.8.0
07/06 02:18:59 - mmengine - INFO - MMEngine: 0.7.4
07/06 02:18:59 - mmengine - INFO - MMCV: 2.0.0rc4
07/06 02:18:59 - mmengine - INFO - MMCV Compiler: GCC 9.3
07/06 02:18:59 - mmengine - INFO - MMCV CUDA Compiler: 11.6
07/06 02:18:59 - mmengine - INFO - MMDeploy: 1.2.0+ae381c8
07/06 02:18:59 - mmengine - INFO -

07/06 02:18:59 - mmengine - INFO - **********Backend information**********
07/06 02:18:59 - mmengine - INFO - tensorrt:    8.2.4.2
07/06 02:18:59 - mmengine - INFO - tensorrt custom ops: Available
07/06 02:18:59 - mmengine - INFO - ONNXRuntime: None
07/06 02:18:59 - mmengine - INFO - ONNXRuntime-gpu:     1.8.1
07/06 02:18:59 - mmengine - INFO - ONNXRuntime custom ops:      Available
07/06 02:18:59 - mmengine - INFO - pplnn:       None
07/06 02:18:59 - mmengine - INFO - ncnn:        None
07/06 02:18:59 - mmengine - INFO - snpe:        None
07/06 02:18:59 - mmengine - INFO - openvino:    None
07/06 02:18:59 - mmengine - INFO - torchscript: 1.13.0
07/06 02:18:59 - mmengine - INFO - torchscript custom ops:      NotAvailable
07/06 02:18:59 - mmengine - INFO - rknn-toolkit:        None
07/06 02:18:59 - mmengine - INFO - rknn-toolkit2:       None
07/06 02:18:59 - mmengine - INFO - ascend:      None
07/06 02:18:59 - mmengine - INFO - coreml:      None
07/06 02:18:59 - mmengine - INFO - tvm: None
07/06 02:18:59 - mmengine - INFO - vacc:        None
07/06 02:19:00 - mmengine - INFO - 

07/06 02:19:00 - mmengine - INFO - **********Codebase information**********
07/06 02:19:00 - mmengine - INFO - mmdet:       3.1.0
07/06 02:19:00 - mmengine - INFO - mmseg:       None
07/06 02:19:00 - mmengine - INFO - mmpretrain:  None
07/06 02:19:00 - mmengine - INFO - mmocr:       None
07/06 02:19:00 - mmengine - INFO - mmagic:      None
07/06 02:19:00 - mmengine - INFO - mmdet3d:     None
07/06 02:19:00 - mmengine - INFO - mmpose:      None
07/06 02:19:00 - mmengine - INFO - mmrotate:    None
07/06 02:19:00 - mmengine - INFO - mmaction:    None
07/06 02:19:00 - mmengine - INFO - mmrazor:     None
07/06 02:19:00 - mmengine - INFO - mmyolo:      None
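To compare this environment with the mmdeploy 0.x / mmdet 2.x setup where the conversion worked, it can help to reduce the report to a name-to-version mapping. Below is a minimal standard-library sketch that uses a few lines copied from the log above; `parse_env_report` is a hypothetical helper, not an mmdeploy API.

```python
import re

# A few lines copied verbatim from the environment report above.
ENV_LOG = '''\
07/06 02:18:59 - mmengine - INFO - MMCV: 2.0.0rc4
07/06 02:18:59 - mmengine - INFO - MMDeploy: 1.2.0+ae381c8
07/06 02:18:59 - mmengine - INFO - tensorrt:    8.2.4.2
07/06 02:18:59 - mmengine - INFO - ONNXRuntime: None
07/06 02:18:59 - mmengine - INFO - torchscript: 1.13.0
07/06 02:19:00 - mmengine - INFO - mmdet:       3.1.0
'''

# "<name>: <value>" after the mmengine prefix; banner lines with '*' are skipped.
LINE = re.compile(r'- mmengine - INFO - ([^:*]+):\s*(\S.*)$')


def parse_env_report(log):
    """Hypothetical helper: map component name -> version, dropping 'None'."""
    versions = {}
    for line in log.splitlines():
        match = LINE.search(line)
        if match and match.group(2).strip() != 'None':
            versions[match.group(1).strip()] = match.group(2).strip()
    return versions


versions = parse_env_report(ENV_LOG)
print(versions)
```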

Error traceback

07/06 02:12:51 - mmengine - WARNING - DeprecationWarning: get_onnx_config will be deprecated in the future. 
07/06 02:12:51 - mmengine - INFO - Export PyTorch model to ONNX: /workdir/mmdeploy-workdir/dyhead_swin_1024.onnx.
07/06 02:12:51 - mmengine - WARNING - Can not find mmdet.models.utils.transformer.PatchMerging.forward, function rewrite will not be applied
/root/workspace/mmdeploy/mmdeploy/codebase/mmdet/models/detectors/single_stage.py:84: TracerWarning: Iterating over a tensor might cause the trace to be incorrect. Passing a tensor of different shape won't change the number of iterations executed (and might lead to errors or silently give incorrect results).
...                                                                                                   
/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py:1178: UserWarning: The shape inference of mmdeploy::TRTBatchedNMS type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (Triggered internally at /opt/conda/conda-bld/pytorch_1666642991888/work/torch/csrc/jit/passes/onnx/shape_type_inference.cpp:1884.)
  _C._jit_pass_onnx_graph_shape_type_inference(
07/06 02:13:37 - mmengine - INFO - Execute onnx optimize passes.                                                                                   
07/06 02:13:46 - mmengine - INFO - Finish pipeline mmdeploy.apis.pytorch2onnx.torch2onnx                                                           
07/06 02:13:48 - mmengine - INFO - Start pipeline mmdeploy.apis.utils.utils.to_backend in subprocess                                               
07/06 02:13:48 - mmengine - INFO - Successfully loaded tensorrt plugins from /root/workspace/mmdeploy/mmdeploy/lib/libmmdeploy_tensorrt_ops.so     
[07/06/2023-02:13:48] [TRT] [I] [MemUsageChange] Init CUDA: CPU +457, GPU +0, now: CPU 548, GPU 511 (MiB)                                          
[07/06/2023-02:13:48] [TRT] [I] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 548 MiB, GPU 511 MiB                             
[07/06/2023-02:13:48] [TRT] [I] [MemUsageSnapshot] End constructing builder kernel library: CPU 702 MiB, GPU 555 MiB                               
[libprotobuf WARNING /home/jenkins/agent/workspace/OSS/OSS_L0_MergeRequest/oss/build/third_party.protobuf/src/third_party.protobuf/src/google/protobuf/io/coded_stream.cc:604] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING /home/jenkins/agent/workspace/OSS/OSS_L0_MergeRequest/oss/build/third_party.protobuf/src/third_party.protobuf/src/google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 909753679
[07/06/2023-02:13:49] [TRT] [I] ----------------------------------------------------------------                                                   
[07/06/2023-02:13:49] [TRT] [I] Input filename:   /workdir/mmdeploy-workdir/dyhead_swin_1024.onnx                                                  
[07/06/2023-02:13:49] [TRT] [I] ONNX IR version:  0.0.6                                                                                            
[07/06/2023-02:13:49] [TRT] [I] Opset version:    11                                                                                               
[07/06/2023-02:13:49] [TRT] [I] Producer name:    pytorch                                                                                          
[07/06/2023-02:13:49] [TRT] [I] Producer version: 1.13.0                                                                                           
[07/06/2023-02:13:49] [TRT] [I] Domain:                                                                                                            
[07/06/2023-02:13:49] [TRT] [I] Model version:    0                                                                                                
[07/06/2023-02:13:49] [TRT] [I] Doc string:                                                                                                        
[07/06/2023-02:13:49] [TRT] [I] ----------------------------------------------------------------                                                   
[libprotobuf WARNING /home/jenkins/agent/workspace/OSS/OSS_L0_MergeRequest/oss/build/third_party.protobuf/src/third_party.protobuf/src/google/protobuf/io/coded_stream.cc:604] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING /home/jenkins/agent/workspace/OSS/OSS_L0_MergeRequest/oss/build/third_party.protobuf/src/third_party.protobuf/src/google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 909753679
[07/06/2023-02:13:49] [TRT] [W] parsers/onnx/onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[07/06/2023-02:13:49] [TRT] [I] No importer registered for op: Xor. Attempting to import as plugin.
[07/06/2023-02:13:49] [TRT] [I] Searching for plugin: Xor, plugin_version: 1, plugin_namespace: 
[07/06/2023-02:13:49] [TRT] [E] parsers/onnx/ModelImporter.cpp:780: While parsing node number 41 [Xor -> "/backbone/stages.0/blocks.0/attn/Xor_output_0"]:
[07/06/2023-02:13:49] [TRT] [E] parsers/onnx/ModelImporter.cpp:781: --- Begin node ---
[07/06/2023-02:13:49] [TRT] [E] parsers/onnx/ModelImporter.cpp:782: input: "/backbone/stages.0/blocks.0/attn/Constant_5_output_0"
input: "/backbone/stages.0/blocks.0/attn/Constant_5_output_0"
output: "/backbone/stages.0/blocks.0/attn/Xor_output_0"
name: "/backbone/stages.0/blocks.0/attn/Xor"
op_type: "Xor"

[07/06/2023-02:13:49] [TRT] [E] parsers/onnx/ModelImporter.cpp:783: --- End node ---
[07/06/2023-02:13:49] [TRT] [E] parsers/onnx/ModelImporter.cpp:785: ERROR: parsers/onnx/builtin_op_importers.cpp:4870 In function importFallbackPluginImporter:
[8] Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"
Process Process-3:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/root/workspace/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 107, in __call__
    ret = func(*args, **kwargs)
  File "/root/workspace/mmdeploy/mmdeploy/apis/utils/utils.py", line 98, in to_backend
    return backend_mgr.to_backend(
  File "/root/workspace/mmdeploy/mmdeploy/backend/tensorrt/backend_manager.py", line 127, in to_backend
    onnx2tensorrt(
  File "/root/workspace/mmdeploy/mmdeploy/backend/tensorrt/onnx2tensorrt.py", line 79, in onnx2tensorrt
    from_onnx(
  File "/root/workspace/mmdeploy/mmdeploy/backend/tensorrt/utils.py", line 185, in from_onnx
    raise RuntimeError(f'Failed to parse onnx, {error_msgs}')
RuntimeError: Failed to parse onnx, In node 41 (importFallbackPluginImporter): UNSUPPORTED_NODE: Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"

07/06 02:13:49 - mmengine - ERROR - /root/workspace/mmdeploy/mmdeploy/apis/core/pipeline_manager.py - pop_mp_output - 80 - `mmdeploy.apis.utils.utils.to_backend` with Call id: 1 failed. exit.
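The failing node can be pulled out of the TensorRT parser output mechanically. The sketch below uses only the standard library and a few lines copied from the traceback above (timestamps trimmed); `parse_failed_node` is a hypothetical helper, not an mmdeploy or TensorRT API. It also checks whether both inputs of the `Xor` node are the same tensor. They are, and since x XOR x is always False, the node's output is constant. Whether folding it away before the TensorRT step is a viable workaround is an assumption drawn from the log and has not been confirmed.

```python
import re

# Lines copied from the error traceback above (timestamps trimmed).
LOG = '''\
[TRT] [I] No importer registered for op: Xor. Attempting to import as plugin.
[TRT] [E] parsers/onnx/ModelImporter.cpp:780: While parsing node number 41 [Xor -> "/backbone/stages.0/blocks.0/attn/Xor_output_0"]:
[TRT] [E] parsers/onnx/ModelImporter.cpp:782: input: "/backbone/stages.0/blocks.0/attn/Constant_5_output_0"
input: "/backbone/stages.0/blocks.0/attn/Constant_5_output_0"
output: "/backbone/stages.0/blocks.0/attn/Xor_output_0"
name: "/backbone/stages.0/blocks.0/attn/Xor"
op_type: "Xor"
'''


def parse_failed_node(log):
    """Hypothetical helper: extract the node dump printed by the ONNX parser."""
    node = {
        'inputs': re.findall(r'input: "([^"]+)"', log),
        'outputs': re.findall(r'output: "([^"]+)"', log),
    }
    for key in ('name', 'op_type'):
        match = re.search(rf'^{key}: "([^"]+)"', log, flags=re.MULTILINE)
        node[key] = match.group(1) if match else None
    index = re.search(r'While parsing node number (\d+)', log)
    node['index'] = int(index.group(1)) if index else None
    return node


node = parse_failed_node(LOG)
# True when every input name is identical, i.e. the node computes x XOR x.
same_inputs = len(set(node['inputs'])) == 1
print(node['index'], node['op_type'], node['name'], same_inputs)

# XOR of any boolean with itself is False, so such a node is all-False.
assert all((x ^ x) is False for x in (True, False))
```

The node name places the `Xor` inside the first attention block of the Swin backbone (`/backbone/stages.0/blocks.0/attn`), which narrows down where to look in the exported graph.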
