open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark
https://mmdetection.readthedocs.io
Apache License 2.0

Inference is extremely slow on Jetson Xavier NX #8198

Closed YanisLax closed 1 year ago

YanisLax commented 2 years ago

Hi everyone, I'm currently trying to run a very basic script on my Jetson Xavier NX to do object detection on a video with MMDetection. Whatever model I test, it takes about 1 second on average to infer a single frame (0.7 s for the best one I checked), which is extremely slow and far below the inference speed advertised on the mmdet website (~50 fps).

I also tested the MMDetection demo scripts (video_demo.py and video_gpuacc.py) and tried converting my mmdet model to a TensorRT model (fp16 and int8 tested), but I still get approximately the same results.

I really don't know what I'm missing...

Please note that I previously worked with YOLOv3 on Darknet and had no problem like this. My code is shown below.

Environment

Python: 3.8.10 (default, Mar 15 2022, 12:22:08) [GCC 9.4.0]
CUDA available: True
GPU 0: Xavier
CUDA_HOME: /usr/local/cuda-11.4
NVCC: Cuda compilation tools, release 11.4, V11.4.166
GCC: aarch64-linux-gnu-gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
PyTorch: 1.11.0
PyTorch compiling details: PyTorch built with:
  - GCC 9.4
  - C++ Version: 201402
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: NO AVX
  - CUDA Runtime 11.4
  - NVCC architecture flags: -gencode;arch=compute_72,code=sm_72;-gencode;arch=compute_87,code=sm_87
  - CuDNN 8.3.2
  - Build settings: BLAS_INFO=open, BUILD_TYPE=Release, CUDA_VERSION=11.4, CUDNN_VERSION=8.3.2, CXX_COMPILER=/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, FORCE_FALLBACK_CUDA_MPI=1, LAPACK_INFO=open, TORCH_VERSION=1.11.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EIGEN_FOR_BLAS=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=OFF, USE_MKLDNN=OFF, USE_MPI=ON, USE_NCCL=0, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

TorchVision: 0.11.1
OpenCV: 4.5.5
MMCV: 1.5.2
MMCV Compiler: GCC 9.4
MMCV CUDA Compiler: 11.4
MMDetection: 2.25.0+ca11860

My code

from mmdet.apis import init_detector, inference_detector
import mmcv
import cv2

config_file = 'configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py'
checkpoint_file = 'checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth'
model = init_detector(config_file, checkpoint_file, device='cuda:0')
# wrap_fp16_model(model)

def main():
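    video_reader = mmcv.VideoReader("/home/thalesgroup/Thales/medias/video.mp4")
    # Create the resizable display window once, not on every frame.
    cv2.namedWindow('Processed video', cv2.WINDOW_NORMAL)

    for frame in mmcv.track_iter_progress(video_reader):
        # Run the detector on the current frame.
        result = inference_detector(model, frame)
        # Draw the detections onto the frame.
        frame = model.show_result(frame, result)
        # Show the annotated frame, waiting 1 ms for key events.
        mmcv.imshow(frame, 'Processed video', 1)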

if __name__ == '__main__':
    main()

Any help or ideas are welcome, thanks!

jbwang1997 commented 2 years ago

Hello @YanisLax. Does the inference time cover only the inference_detector call, or both calls (inference and show_result) in one iteration over a frame?

YanisLax commented 2 years ago

Hello, thanks for your help. After printing the time between instructions, I found the following values:

inference_detector() time: 0.7s
show_result() time: 0.47s

Do you have any idea what factors could cause this huge latency?
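For reference, here is a minimal sketch of how per-stage timings like these could be collected more carefully. It is not the script used for the numbers above. `StageTimer` is a hypothetical helper, not part of MMDetection or MMCV. It skips the first few calls (which often include one-off start-up costs) and accepts an optional `sync` callback, because CUDA work is launched asynchronously and wall-clock timing can otherwise attribute GPU time to the wrong stage.

```python
import time
from collections import defaultdict


class StageTimer:
    """Hypothetical helper: average wall-clock time per named stage."""

    def __init__(self, sync=None, warmup=0):
        # sync: optional zero-argument callable run before and after each
        # timed call (for GPU code, e.g. torch.cuda.synchronize).
        self.sync = sync if sync is not None else (lambda: None)
        # warmup: number of initial calls per stage to leave out of the stats.
        self.warmup = warmup
        self.samples = defaultdict(list)
        self._calls = defaultdict(int)

    def run(self, name, fn, *args, **kwargs):
        """Call fn(*args, **kwargs), record its duration, return its result."""
        self.sync()
        start = time.perf_counter()
        out = fn(*args, **kwargs)
        self.sync()
        elapsed = time.perf_counter() - start
        self._calls[name] += 1
        if self._calls[name] > self.warmup:
            self.samples[name].append(elapsed)
        return out

    def report(self):
        """Return {stage name: mean seconds per call} for recorded stages."""
        return {n: sum(v) / len(v) for n, v in self.samples.items() if v}


# Assumed usage inside the frame loop of the script above:
#   timer = StageTimer(sync=torch.cuda.synchronize, warmup=5)
#   result = timer.run('inference', inference_detector, model, frame)
#   frame = timer.run('draw', model.show_result, frame, result)
#   ...
#   print(timer.report())
```

With the figures reported above, one iteration costs 0.7 s + 0.47 s = 1.17 s, so the loop runs at under 1 fps, and roughly 40% of that time is spent drawing rather than in the detector.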

github-actions[bot] commented 1 year ago

This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 5 days if the stale label is not removed or if there is no further response.

github-actions[bot] commented 1 year ago

This issue is closed because it has been stale for 5 days. Please open a new issue if you have similar issues or you have any new updates now.