open-mmlab / mmdeploy

OpenMMLab Model Deployment Framework
https://mmdeploy.readthedocs.io/en/latest/
Apache License 2.0

[Bug] file to convert registry models into onnx #2217

Closed gmk11 closed 1 year ago

gmk11 commented 1 year ago


Describe the bug

Hi, I wrote a model (PCT) based on mmpose using the registry. I trained it and I am able to run PyTorch inference. I want to convert it to ONNX, but I get this error: 'PCT is not in the models registry'. Can somebody help me? I used my initial model config, which worked for inference, training and testing in PyTorch.

Reproduction

I ran this script: python tools/deploy.py configs/mmpose/pose-detection_onnxruntime_static.py my_config_file.py pct_weight.pth demo/resources/human-pose.jpg --work-dir mmdeploy_models/mmpose/ort --device cpu --show

where the config file is

_base_ = ['./coco.py']

log_level = 'INFO'
load_from = None
resume_from = None

dist_params = dict(backend='nccl')

workflow = [('train', 1)]  # [('train', 1), ('val', 1)] the runner will calculate va
find_unused_parameters = False
checkpoint_config = dict(interval=1, create_symlink=False)
evaluation = dict(interval=1, metric='mAP', save_best='AP')

optimizer_config = dict(grad_clip=None)

lr_config = dict(
    policy='CosineAnnealing',  # test with cosine d
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.001,
    min_lr_ratio=1e-5)
total_epochs = 300
log_config = dict(
    interval=4682,  # 4682 to log once per epoch
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])
visualizer = dict(vis_backends=[
    dict(type='LocalVisBackend'),
    dict(type='TensorboardVisBackend'),
])

optimizer = dict(type='AdamW', lr=4.4e-4, betas=(0.9, 0.999), weight_decay=8e-6)
channel_cfg = dict(
    num_output_channels=17,
    dataset_joints=17,
    dataset_channel=[
        [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16],
    ],
    inference_channel=[
        0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
    ])

data_cfg = dict(
    image_size=[128, 128],  # size of model input resolution
    heatmap_size=[32, 32],
    num_output_channels=channel_cfg['num_output_channels'],
    num_joints=channel_cfg['dataset_joints'],
    dataset_channel=channel_cfg['dataset_channel'],
    inference_channel=channel_cfg['inference_channel'],
    soft_nms=False,
    nms_thr=1.0,
    oks_thr=0.9,
    vis_thr=0.2,
    use_gt_bbox=True,
    det_bbox_thr=0.0,
    bbox_file='/work1/gitlab-runner-docker-data/datasets/data_coco/data/coco/person_detection_results/COCO_val2017_detections_AP_H_56_person.json',
)

# model settings

model = dict(
    type='PCT',
    pretrained='/work1/gitlab-runner-docker-data/models/PCT/my_mobileone/last_heatmap_128/best.pth',  # '/work1/gitlab-runner-docker-data/models/PCT/mobileones4/final_heatmap/best.pth'
    backbone=dict(type='MobileOne', variant='s0', stage='classifier'),
    keypoint_head=dict(
        type='PCT_Head',
        stage_pct='classifier',
        in_channels=1024,
        image_size=data_cfg['image_size'],
        num_joints=channel_cfg['num_output_channels'],
        loss_keypoint=dict(
            type='Classifer_loss', token_loss=1.0, joint_loss=1.0),
        cls_head=dict(
            conv_num_blocks=2,
            conv_channels=256,
            dilation=1,
            num_blocks=4,
            hidden_dim=64,
            token_inter_dim=64,
            hidden_inter_dim=256,
            dropout=0.0),
        tokenizer=dict(
            guide_ratio=0.0,
            ckpt='/work1/gitlab-runner-docker-data/models/PCT/my_mobileone/no_extra_final_tokenizer/best_AP_epoch_12.pth',
            encoder=dict(
                drop_rate=0.2,
                num_blocks=4,
                hidden_dim=512,
                token_inter_dim=64,
                hidden_inter_dim=512,
                dropout=0.0,
            ),
            decoder=dict(
                num_blocks=1,
                hidden_dim=32,
                token_inter_dim=64,
                hidden_inter_dim=64,
                dropout=0.0,
            ),
            codebook=dict(
                token_num=34,
                token_dim=512,
                token_class_num=2048,
                ema_decay=0.9,
            ),
            loss_keypoint=dict(
                type='Tokenizer_loss',
                joint_loss_w=1.0,
                e_loss_w=15.0,
                beta=0.05,
            ))),
    test_cfg=dict(
        flip_test=True,
        dataset_name='COCO'))


train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='TopDownGetBboxCenterScale', padding=1.25),
    dict(type='TopDownRandomShiftBboxCenter', shift_factor=0.16, prob=0.3),
    dict(type='TopDownRandomFlip', flip_prob=0.5),
    dict(
        type='TopDownHalfBodyTransform',
        num_joints_half_body=8,
        prob_half_body=0.3),
    dict(
        type='TopDownGetRandomScaleRotation', rot_factor=40, scale_factor=0.5),
    dict(type='TopDownAffine'),
    dict(
        type='Albumentation',
        transforms=[
            dict(
                type='ColorJitter',
                brightness=0.4,
                contrast=0.4,
                saturation=0.4,
                hue=0.2,
                p=1.0),
            dict(
                type='GridDropout',
                unit_size_min=10,
                unit_size_max=40,
                random_offset=True,
                p=0.5),
        ]),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(
        type='Collect',
        keys=['img', 'joints_3d', 'joints_3d_visible'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale',
            'rotation', 'bbox_score', 'flip_pairs'
        ]),
]

val_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='TopDownGetBboxCenterScale', padding=1.12),
    dict(type='TopDownAffine'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(
        type='Collect',
        keys=['img'],
        meta_keys=[
            'image_file', 'center', 'scale', 'rotation', 'bbox_score',
            'flip_pairs'
        ]),
]

test_pipeline = val_pipeline

data_root = '/work1/gitlab-runner-docker-data/datasets/data_coco/data/coco/'
data = dict(
    samples_per_gpu=32,
    workers_per_gpu=0,
    val_dataloader=dict(samples_per_gpu=32),
    test_dataloader=dict(samples_per_gpu=32),
    train=dict(
        type='TopDownCocoDataset',
        ann_file=f'{data_root}/annotations/person_keypoints_train2017.json',
        img_prefix=f'{data_root}/images/train2017/',
        data_cfg=data_cfg,
        pipeline=train_pipeline,
        dataset_info={{_base_.dataset_info}}),
    val=dict(
        type='TopDownCocoDataset',
        ann_file=f'{data_root}/annotations/person_keypoints_val2017.json',
        img_prefix=f'{data_root}/images/val2017/',
        data_cfg=data_cfg,
        pipeline=val_pipeline,
        dataset_info={{_base_.dataset_info}}),
    test=dict(
        type='TopDownCocoDataset',
        ann_file=f'{data_root}/annotations/person_keypoints_val2017.json',
        img_prefix=f'{data_root}/images/val2017/',
        data_cfg=data_cfg,
        pipeline=val_pipeline,
        dataset_info={{_base_.dataset_info}})
)

fp16 = dict(loss_scale='dynamic')

Environment

**ENVIRONMENT**
![env](https://github.com/open-mmlab/mmdeploy/assets/98350724/09be2f60-7d15-45c1-bd04-c01f43dfaf70)

Here is my environment. These versions of mmpose and mmcv are required for my model to work.

Error traceback

/opt/conda/lib/python3.9/site-packages/mmcv/__init__.py:20: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details.
  warnings.warn(
/opt/conda/lib/python3.9/site-packages/mmcv/__init__.py:20: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details.
  warnings.warn(
/opt/conda/lib/python3.9/site-packages/mmcv/__init__.py:20: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details.
  warnings.warn(
2023-06-27 12:16:07,697 - mmdeploy - INFO - Start pipeline mmdeploy.apis.pytorch2onnx.torch2onnx in subprocess
Process Process-2:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.9/site-packages/mmdeploy/apis/core/pipeline_manager.py", line 107, in __call__
    ret = func(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/mmdeploy/apis/pytorch2onnx.py", line 64, in torch2onnx
    torch_model = task_processor.init_pytorch_model(model_checkpoint)
  File "/opt/conda/lib/python3.9/site-packages/mmdeploy/codebase/mmpose/deploy/pose_detection.py", line 126, in init_pytorch_model
    model = init_pose_model(self.model_cfg, model_checkpoint, self.device)
  File "/opt/conda/lib/python3.9/site-packages/mmpose/apis/inference.py", line 43, in init_pose_model
    model = build_posenet(config.model)
  File "/opt/conda/lib/python3.9/site-packages/mmpose/models/builder.py", line 39, in build_posenet
    return POSENETS.build(cfg)
  File "/opt/conda/lib/python3.9/site-packages/mmcv/utils/registry.py", line 237, in build
    return self.build_func(*args, **kwargs, registry=self)
  File "/opt/conda/lib/python3.9/site-packages/mmcv/cnn/builder.py", line 27, in build_model_from_cfg
    return build_from_cfg(cfg, registry, default_args)
  File "/opt/conda/lib/python3.9/site-packages/mmcv/utils/registry.py", line 61, in build_from_cfg
    raise KeyError(
KeyError: 'PCT is not in the models registry'
2023-06-27 12:16:09,069 - mmdeploy - ERROR - `mmdeploy.apis.pytorch2onnx.torch2onnx` with Call id: 0 failed. exit.
RunningLeon commented 1 year ago

@gmk11 hi,

  1. Make sure your PCT class is registered with @POSENETS.register_module() (a minimal sketch is shown below this list).
  2. Alternatively, you could import your PCT class and create the model manually in this file:

    File "/opt/conda/lib/python3.9/site-packages/mmdeploy/codebase/mmpose/deploy/pose_detection.py", line 126, in init_pytorch_model model = init_pose_model(self.model_cfg, model_checkpoint, self.device)

gmk11 commented 1 year ago

Thanks, I fixed the previous error by importing my code in the same directory as the 'deploy.py' script I ran, and the PCT model is now found in the registry. But now I get this error: "RuntimeError: Only tuples, lists and Variables supported as JIT inputs/outputs. Dictionaries and strings are also accepted but their usage is not recommended. But got ndarray"

I tried to solve this by making sure my model returns a dictionary containing tensors, but it didn't work. I checked online and in the other issues and have been working on it for 3 days, but I still don't know how to fix it. Could you help me?

RunningLeon commented 1 year ago

@gmk11 hi, for a custom model you may need to support torch2onnx by yourself. Clearly, your pct_head returns a result that is not of type torch.Tensor, List[torch.Tensor] or Tuple[torch.Tensor], so you may need to add rewriting functions in mmdeploy. You can refer to https://github.com/open-mmlab/mmdeploy/blob/6cd29e2152d6935bde2f9252b47170724bac20ac/mmdeploy/codebase/mmpose/models/heads/mspn_head.py#L10 and https://mmdeploy.readthedocs.io/en/latest/07-developer-guide/support_new_model.html
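
For reference, a rough sketch of what such a rewriter could look like, modelled on mspn_head.py and assuming mmdeploy 0.x (as in your traceback). The func_name import path and the inference_model method name are assumptions and must match your actual PCT_Head code. The file would go under mmdeploy/codebase/mmpose/models/heads/ and be imported from that package's __init__.py so the rewriter gets registered.

```python
# e.g. mmdeploy/codebase/mmpose/models/heads/pct_head.py (hypothetical file)
from mmdeploy.core import FUNCTION_REWRITER


@FUNCTION_REWRITER.register_rewriter(
    func_name='models.pct_head.PCT_Head.inference_model')  # assumed import path
def pct_head__inference_model(ctx, self, x, flip_pairs=None):
    """Rewritten inference for ONNX export.

    Skip the numpy-based keypoint decoding so that only torch.Tensor objects
    reach the tracer; decoding is then done outside the exported graph.
    """
    output = self.forward(x)  # assumed to return a Tensor or tuple of Tensors
    return output
```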

gmk11 commented 1 year ago

@RunningLeon

> hi, for a custom model you may need to support torch2onnx by yourself. Clearly, your pct_head returns a result that is not of type torch.Tensor, List[torch.Tensor] or Tuple[torch.Tensor], so you may need to add rewriting functions in mmdeploy. You can refer to https://github.com/open-mmlab/mmdeploy/blob/6cd29e2152d6935bde2f9252b47170724bac20ac/mmdeploy/codebase/mmpose/models/heads/mspn_head.py#L10 and https://mmdeploy.readthedocs.io/en/latest/07-developer-guide/support_new_model.html

I read it and tried to follow the steps, but I don't understand: where do I have to write the REWRITER function? In my model implementation? In an independent mmdeploy script? I tried to write it in mmdeploy/mmdeploy/codebase/mmpose/models/heads but it doesn't work (I also imported it in the __init__.py file).

gmk11 commented 1 year ago

@RunningLeon ????????????

RunningLeon commented 1 year ago

@gmk11 hi, please fork mmdeploy, commit your changes, and push them to a branch so we can see what you've changed.

github-actions[bot] commented 1 year ago

This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 5 days if the stale label is not removed or if there is no further response.

github-actions[bot] commented 1 year ago

This issue is closed because it has been stale for 5 days. Please open a new issue if you have similar issues or you have any new updates now.