Prerequisite
💬 Describe the reimplementation questions
Hi, I saw that you report a 4.86 ms TRT-FP16 latency on the DOTA dataset with RTMDet-s, and the README says "The inference speed here is measured on an NVIDIA 2080Ti GPU with TensorRT 8.4.3, cuDNN 8.2.0, FP16, batch size=1, and with NMS". I tested on the same 2080Ti with TensorRT 8.2.3, cuDNN 8, FP16, batch size=1, and with NMS, but I only get 24 ms latency.
The following are my test config and TRT-FP16 conversion settings:
```python
test_pipeline = [
    dict(backend_args=None, type='LoadImageFromFile'),
    dict(scale=(1024, 1024), type='YOLOv5KeepRatioResize'),
    dict(
        allow_scale_up=False,
        pad_val=dict(img=114),
        scale=(1024, 1024),
        type='LetterResize'),
    dict(
        meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
                   'scale_factor', 'pad_param'),
        type='mmdet.PackDetInputs')
]

simplify = True
fp16 = True
register_all_modules()
backend = MMYOLOBackend('tensorrt8')
postprocess_cfg = ConfigDict(
    pre_top_k=1000,
    keep_top_k=100,
    iou_threshold=0.65,
    score_threshold=0.1)
output_names = ['num_dets', 'boxes', 'scores', 'labels']
baseModel = build_model_from_cfg(config_path, model_path, device)
```
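To separate the raw engine time from my Python pipeline, I could also time the engine with trtexec (the engine filename below is a placeholder for my converted FP16 engine, not a file from the repo):

```shell
# Measure pure engine latency, excluding Python pre/post-processing.
# rtmdet_s.engine is a placeholder name for my converted engine.
trtexec --loadEngine=rtmdet_s.engine --warmUp=500 --avgRuns=100
```

If trtexec reports a latency close to 4.86 ms, the gap would be in my Python-side pipeline rather than the engine itself.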
The TRT conversion steps follow projects/easydeploy/export_onnx, and this is my test code:
```python
img_h, img_w = input_image.shape[0], input_image.shape[1]
# get model class name
```

I also followed demo/img_demo.py for inference, and I do not include the time spent reading images, so I don't know why my latency is so slow. Can you give me some advice, or release your code for testing the TRT-FP16 latency?
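My timing loop is roughly the sketch below (the callable stands in for the real engine invocation, and the names here are my placeholders, not the repo's API; with the GPU engine I would add CUDA synchronization where the comments indicate):

```python
import time

def measure_latency(infer, n_warmup=50, n_runs=200):
    """Return average per-call latency in ms for the callable `infer`."""
    # Warmup: the first calls include CUDA context init and kernel selection,
    # so they must be excluded from the measurement.
    for _ in range(n_warmup):
        infer()
    # With a real GPU engine, call torch.cuda.synchronize() here
    # before starting the clock...
    start = time.perf_counter()
    for _ in range(n_runs):
        infer()
    # ...and synchronize() again here, so asynchronous kernel launches
    # are fully counted in the elapsed time.
    return (time.perf_counter() - start) / n_runs * 1e3

# Placeholder inference call; in my test this would be the TRT engine execution.
latency_ms = measure_latency(lambda: sum(range(1000)))
print(f"{latency_ms:.3f} ms")
```

Without the warmup and the explicit synchronization, the measured number can include one-time setup cost or miss queued GPU work entirely.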
Environment
CUDA 11.1, cuDNN 8, TensorRT 8.2.3
Expected results
No response
Additional information
No response