open-mmlab / mmdeploy

OpenMMLab Model Deployment Framework
https://mmdeploy.readthedocs.io/en/latest/
Apache License 2.0
2.75k stars 631 forks

[Bug] High GPU memory usage when running a TensorRT model on Jetson #2415

Closed 564192234 closed 1 year ago

564192234 commented 1 year ago

Checklist

Describe the bug

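I am using the official mmdet 2D hand detection model (ssdlite_mobilenetv2_scratch_600e_onehand-4f9f8686_20220523.pth with ssdlite_mobilenetv2_scratch_600e_onehand.py). On a Jetson I converted it to TensorRT format with detection_tensorrt_static-300x300.py; the resulting end2end.engine file is only about 13 MB.

I want to run real-time detection on a camera feed. I first ran inference with a modified version of inference_model from the Model Converter inference API, and after startup it used about 1.2 GB of GPU memory. I then switched to a modified version of the inference SDK example, which uses about 1 GB of GPU memory. Inference currently keeps up with the camera's 30 fps frame rate, and accuracy is high.

For a model smaller than 15 MB, the CPU and GPU memory usage seems rather high. Did I do something wrong somewhere? Is there a good way to reduce GPU memory and CPU usage?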

Reproduction

# Modified from inference_model; uses roughly 1-2 GB of GPU memory once running.
import cv2
import mmcv
import numpy as np
import torch
from typing import Any, Sequence, Union
from mmdeploy.utils import get_input_shape, load_config
from mmdeploy.apis.utils import build_task_processor

# Open the camera at 640x480
cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

deploy_cfg = "mmdeploy/configs/mmdet/detection/detection_tensorrt_static-300x300.py"
model_cfg = "mmdet_hand/ssdlite_mobilenetv2_scratch_600e_onehand.py"
backend_files = ["mmdet_hand/trt_hand/end2end.engine"]

deploy_cfg, model_cfg = load_config(deploy_cfg, model_cfg)
task_processor = build_task_processor(model_cfg, deploy_cfg, "cuda:0")
model = task_processor.init_backend_model(backend_files)
input_shape = get_input_shape(deploy_cfg)

# Set up the display window
cv2.namedWindow("video")
cv2.resizeWindow("video", 640, 480)

while True:
    # Read a frame from the camera; stop if the read fails
    ok, frame = cap.read()
    if not ok:
        break
    model_inputs, _ = task_processor.create_input(frame, input_shape)
    with torch.no_grad():
        hands_result = task_processor.run_inference(model, model_inputs)[0]

    # Press 'q' to quit
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

# Modified from the inference SDK example; uses roughly 1 GB of GPU memory once running.
import cv2
import time
import numpy as np
from mmdeploy_python import Detector

# Open the camera at 640x480
cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

detector = Detector("trt_hand", "cuda", 0)

while True:
    # Read a frame from the camera; stop if the read fails
    ok, frame = cap.read()
    if not ok:
        break

    hands_result, _, _ = detector([frame])[0]

    # Press 'q' to quit
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

Environment

2023-09-08 15:45:31,383 - mmdeploy - INFO - 

2023-09-08 15:45:31,383 - mmdeploy - INFO - **********Environmental information**********
2023-09-08 15:45:39,912 - mmdeploy - INFO - sys.platform: linux
2023-09-08 15:45:39,913 - mmdeploy - INFO - Python: 3.6.7 | packaged by conda-forge | (default, Feb 24 2019, 02:17:42) [GCC 7.3.0]
2023-09-08 15:45:39,914 - mmdeploy - INFO - CUDA available: True
2023-09-08 15:45:39,914 - mmdeploy - INFO - GPU 0: Xavier
2023-09-08 15:45:39,914 - mmdeploy - INFO - CUDA_HOME: /usr/local/cuda
2023-09-08 15:45:39,915 - mmdeploy - INFO - NVCC: Cuda compilation tools, release 10.2, V10.2.89
2023-09-08 15:45:39,915 - mmdeploy - INFO - GCC: gcc (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
2023-09-08 15:45:39,915 - mmdeploy - INFO - PyTorch: 1.10.0
2023-09-08 15:45:39,915 - mmdeploy - INFO - PyTorch compiling details: PyTorch built with:
  - GCC 7.5
  - C++ Version: 201402
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: NO AVX
  - CUDA Runtime 10.2
  - NVCC architecture flags: -gencode;arch=compute_53,code=sm_53;-gencode;arch=compute_62,code=sm_62;-gencode;arch=compute_72,code=sm_72
  - CuDNN 8.0
  - Build settings: BLAS_INFO=open, BUILD_TYPE=Release, CUDA_VERSION=10.2, CUDNN_VERSION=8.0.0, CXX_COMPILER=/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -DMISSING_ARM_VST1 -DMISSING_ARM_VLD1 -Wno-stringop-overflow, FORCE_FALLBACK_CUDA_MPI=1, LAPACK_INFO=open, TORCH_VERSION=1.10.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EIGEN_FOR_BLAS=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=OFF, USE_MKLDNN=OFF, USE_MPI=ON, USE_NCCL=0, USE_NNPACK=ON, USE_OPENMP=ON, 

2023-09-08 15:45:39,916 - mmdeploy - INFO - TorchVision: 0.11.1
2023-09-08 15:45:39,916 - mmdeploy - INFO - OpenCV: 4.8.0
2023-09-08 15:45:39,916 - mmdeploy - INFO - MMCV: 1.4.0
2023-09-08 15:45:39,916 - mmdeploy - INFO - MMCV Compiler: GCC 7.5
2023-09-08 15:45:39,916 - mmdeploy - INFO - MMCV CUDA Compiler: 10.2
2023-09-08 15:45:39,917 - mmdeploy - INFO - MMDeploy: 0.5.0+4335313
2023-09-08 15:45:39,917 - mmdeploy - INFO - 

2023-09-08 15:45:39,917 - mmdeploy - INFO - **********Backend information**********
2023-09-08 15:45:43,430 - mmdeploy - INFO - onnxruntime: None   ops_is_avaliable : False
2023-09-08 15:45:43,601 - mmdeploy - INFO - tensorrt: 7.1.3.0   ops_is_avaliable : True
2023-09-08 15:45:43,706 - mmdeploy - INFO - ncnn: None  ops_is_avaliable : False
2023-09-08 15:45:43,712 - mmdeploy - INFO - pplnn_is_avaliable: False
2023-09-08 15:45:43,717 - mmdeploy - INFO - openvino_is_avaliable: False
2023-09-08 15:45:43,718 - mmdeploy - INFO - 

2023-09-08 15:45:43,718 - mmdeploy - INFO - **********Codebase information**********
2023-09-08 15:45:43,725 - mmdeploy - INFO - mmdet:  2.19.0
2023-09-08 15:45:43,725 - mmdeploy - INFO - mmseg:  None
2023-09-08 15:45:43,726 - mmdeploy - INFO - mmcls:  None
2023-09-08 15:45:43,726 - mmdeploy - INFO - mmocr:  None
2023-09-08 15:45:43,726 - mmdeploy - INFO - mmedit: None
2023-09-08 15:45:43,727 - mmdeploy - INFO - mmdet3d:    None
2023-09-08 15:45:43,727 - mmdeploy - INFO - mmpose: None
2023-09-08 15:45:43,727 - mmdeploy - INFO - mmrotate:   None

Error traceback

No response

irexyc commented 1 year ago

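On Jetson, system memory and GPU memory are counted together, aren't they?

What does the usage look like if you drop the OpenCV display and only run inference?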

import cv2
import time
import numpy as np
from mmdeploy_python import Detector

detector = Detector("/path/to/model", "cuda", 0)
img = cv2.imread("/path/to/img")

while True:
  result = detector(img)
564192234 commented 1 year ago

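import cv2
import time
import numpy as np
from mmdeploy_python import Detector

detector = Detector("/path/to/model", "cuda", 0)
img = cv2.imread("/path/to/img")

while True:
    result = detector(img)

I must have misread earlier. In my earlier runs I had also commented out all of the cv2 display calls. Running the code you posted, CPU memory went up by 1.20 GB and GPU memory by 0.45 GB, about the same as before. end2end.engine is only 13 MB. Is this normal? Is there any way to reduce the usage?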

irexyc commented 1 year ago

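Don't Jetson devices use unified memory? How did you see how much the CPU and GPU each went up?

You can test the model with trtexec and see how much memory it adds during the test. If the number is about the same, that is probably just how it is.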
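The trtexec comparison suggested above could be scripted roughly as follows. This is a minimal sketch under assumptions, not an MMDeploy or TensorRT tool: it assumes a Linux system with a readable /proc/meminfo, that trtexec is on PATH, and that the engine needs the MMDeploy TensorRT plugin library to deserialize (the plugin path in the usage note is hypothetical). The helper names parse_meminfo_kb, build_trtexec_cmd, and peak_memory_drop_mb are made up for illustration. Because CPU and GPU share one physical memory pool on Jetson, the sketch tracks system-wide MemAvailable rather than separate CPU and GPU figures.

```python
import subprocess
import time


def parse_meminfo_kb(text):
    """Parse /proc/meminfo-style text into a {field: number} dict.

    Most fields are in kB; plain counters (e.g. HugePages_Total) are kept as-is.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def mem_available_kb(path="/proc/meminfo"):
    """Read the system-wide MemAvailable value in kB."""
    with open(path) as f:
        return parse_meminfo_kb(f.read())["MemAvailable"]


def build_trtexec_cmd(engine_path, plugin_path=None):
    """Build a trtexec command that loads a serialized engine.

    plugin_path is optional and is an assumption about the setup: engines
    that use custom ops need the matching plugin library loaded.
    """
    cmd = ["trtexec", f"--loadEngine={engine_path}"]
    if plugin_path:
        cmd.append(f"--plugins={plugin_path}")
    return cmd


def peak_memory_drop_mb(cmd, read_available_kb=mem_available_kb, interval_s=0.2):
    """Run cmd and return the largest drop in available memory (MB) while it runs."""
    baseline = read_available_kb()
    lowest = baseline
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    while proc.poll() is None:
        lowest = min(lowest, read_available_kb())
        time.sleep(interval_s)
    return (baseline - lowest) / 1024.0
```

A call such as peak_memory_drop_mb(build_trtexec_cmd("mmdet_hand/trt_hand/end2end.engine", "/path/to/libmmdeploy_tensorrt_ops.so")) would give a rough baseline for TensorRT alone. If that number is close to what the SDK loop adds, most of the footprint likely comes from the CUDA and TensorRT runtimes rather than from the 13 MB engine itself.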

564192234 commented 1 year ago

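I checked with jtop (screenshots: IMG20230908174543(1), IMG20230908174559(1)). If this counts as normal, I will leave it as it is for now.

By the way, I also wanted to ask: with the Model Converter API, CPU memory went up by 3 GB and GPU memory by 1 GB, almost double the SDK inference. Is that normal too? So to use the least resources I should run inference with the SDK, is that right?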
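To see where the extra memory in each path comes from, the increase could be recorded stage by stage (after imports, after loading the model, after the first inference). The sketch below is a hypothetical helper, not part of MMDeploy; the names read_used_mb and MemoryCheckpoints are made up for illustration. It assumes Linux with /proc/meminfo and reads system-wide usage, since on a shared-memory device GPU allocations may not show up in the process's own resident size. One plausible but unconfirmed explanation for the gap is that the Model Converter path also loads PyTorch and the mmdet Python stack, which the SDK path does not import.

```python
def read_used_mb(path="/proc/meminfo"):
    """System-wide used memory in MB, computed as MemTotal - MemAvailable."""
    fields = {}
    with open(path) as f:
        for line in f:
            name, _, rest = line.partition(":")
            parts = rest.split()
            if parts and parts[0].isdigit():
                fields[name.strip()] = int(parts[0])
    return (fields["MemTotal"] - fields["MemAvailable"]) / 1024.0


class MemoryCheckpoints:
    """Record how much memory usage grows between labelled stages."""

    def __init__(self, reader=read_used_mb):
        self._reader = reader
        self._last = reader()
        self.deltas = []

    def mark(self, label):
        """Store the change in usage since the previous mark (or construction)."""
        now = self._reader()
        self.deltas.append((label, now - self._last))
        self._last = now

    def report(self):
        """Return one line per stage, e.g. 'load model: +450.0 MB'."""
        return "\n".join(f"{label}: {delta:+.1f} MB" for label, delta in self.deltas)


# Intended use (commented out because it needs the actual runtime installed):
# cp = MemoryCheckpoints()
# from mmdeploy_python import Detector
# cp.mark("import SDK")
# detector = Detector("trt_hand", "cuda", 0)
# cp.mark("load model")
# print(cp.report())
```

Since the readings are system-wide, other processes starting or stopping during the run will skew them, so the numbers are only a rough guide.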

github-actions[bot] commented 1 year ago

This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 5 days if the stale label is not removed or if there is no further response.

github-actions[bot] commented 1 year ago

This issue is closed because it has been stale for 5 days. Please open a new issue if you have similar issues or you have any new updates now.

HEIseYOUmolc commented 9 months ago

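from mmdeploy_python import Detector

@irexyc @564192234 I installed mmdeploy 1.3 by following the official tutorial, but the mmdeploy_runtime package was not installed with it. How can I install this package? My device is a Jetson Nano, version 4.6.1. How did you install this package on a Jetson Nano?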

AZong76 commented 7 months ago

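from mmdeploy_python import Detector

@irexyc @564192234 I installed mmdeploy 1.3 by following the official tutorial, but the mmdeploy_runtime package was not installed with it. How can I install this package? My device is a Jetson Nano, version 4.6.1. How did you install this package on a Jetson Nano?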

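Hello, did you manage to solve this? I am planning to run real-time inference for a segmentation task on a Jetson and cannot find where to install mmdeploy_runtime. If you have a solution, I would appreciate a reply.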
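For readers who hit the same question: the thread mixes two names for the SDK Python module. The 0.x releases used earlier in this issue import mmdeploy_python, while MMDeploy 1.x names the module mmdeploy_runtime. The sketch below only checks which of the two is importable in the current environment; find_sdk_module is a hypothetical helper name, and it does not install anything. As an assumption to verify against the official Jetson build guide, on Jetson the module is usually built from source with the MMDEPLOY_BUILD_SDK_PYTHON_API CMake option instead of being installed from a prebuilt wheel.

```python
import importlib


def find_sdk_module(candidates=("mmdeploy_runtime", "mmdeploy_python")):
    """Return (name, module) for the first importable candidate, else (None, None)."""
    for name in candidates:
        try:
            return name, importlib.import_module(name)
        except ImportError:
            continue
    return None, None


if __name__ == "__main__":
    name, module = find_sdk_module()
    if module is None:
        print("No MMDeploy SDK Python module found on sys.path")
    else:
        print(f"Found {name}; Detector available: {hasattr(module, 'Detector')}")
```

If neither name is found after a source build, the directory containing the compiled module may simply be missing from PYTHONPATH. The call convention can also differ between versions, so code written for one module may need small changes for the other.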