Depth Anything TensorRT CLI
===========================

[![python](https://img.shields.io/badge/python-3.10.12-green)](https://www.python.org/downloads/release/python-31012/) [![cuda](https://img.shields.io/badge/cuda-11.8-green)](https://developer.nvidia.com/cuda-downloads) [![trt](https://img.shields.io/badge/TRT-10.0-green)](https://developer.nvidia.com/tensorrt) [![mit](https://img.shields.io/badge/license-MIT-blue)](https://github.com/spacewalk01/depth-anything-tensorrt/blob/main/LICENSE)

Depth estimation is the task of measuring the distance of each pixel relative to the camera. This repo provides a TensorRT implementation of the [Depth-Anything](https://depth-anything.github.io/) depth estimation model (V1 and V2) in both C++ and Python, enabling efficient real-time inference.

*Depth-Anything-V1 demo*

*Depth-Anything-V2 demo*

## News

## ⏱️ Performance

The inference time includes the pre-processing and post-processing stages:

| Device  | Model            | Model Input (WxH) | Image Resolution (WxH) | Inference Time (ms) |
|---------|------------------|-------------------|------------------------|---------------------|
| RTX4090 | Depth-Anything-S | 518x518           | 1280x720               | 3                   |
| RTX4090 | Depth-Anything-B | 518x518           | 1280x720               | 6                   |
| RTX4090 | Depth-Anything-L | 518x518           | 1280x720               | 12                  |

> [!NOTE]
> Inference was conducted using FP16 precision, with a warm-up period of 10 frames. The reported time corresponds to the last inference.
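This measurement protocol is easy to reproduce. In the sketch below, `run_pipeline` is a hypothetical callable standing in for one full pre-process/inference/post-process pass over a frame:

```python
import time

def benchmark(run_pipeline, frame, warmup: int = 10) -> float:
    """Run `warmup` passes, then time a single final inference (in ms)."""
    for _ in range(warmup):
        run_pipeline(frame)  # warm-up: CUDA context init, clocks, caches
    start = time.perf_counter()
    run_pipeline(frame)      # this last pass is the reported measurement
    return (time.perf_counter() - start) * 1000.0
```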

## 🚀 Quick Start

### C++

Example:

```bash
# infer image
depth-anything-tensorrt.exe -model depth_anything_vitb14.engine -input test.jpg
# infer folder (images/videos)
depth-anything-tensorrt.exe -model depth_anything_vitb14.engine -input data # folder containing videos/images
# infer video
depth-anything-tensorrt.exe -model depth_anything_vitb14.engine -input test.mp4 # the video path
# specify output location
depth-anything-tensorrt.exe -model depth_anything_vitb14.engine -input test.mp4 -output result # rendered depth maps will go into the "result" directory
# display progress on one line rather than multiple
depth-anything-tensorrt.exe -model depth_anything_vitb14.engine -input test.mp4 -one-line
# modify the prefix of generated files (default: "depth_")
depth-anything-tensorrt.exe -model depth_anything_vitb14.engine -input test.mp4 -prefix "depthify_" # the rendered depth map will be named "depthify_test.mp4"
# show a before/after preview (may slow down performance)
depth-anything-tensorrt.exe -preview -model depth_anything_vitb14.engine -input test.mp4
# modify the fps of the footage (does not interpolate; speeds up or slows down the footage if the original video has a different fps)
depth-anything-tensorrt.exe -model depth_anything_vitb14.engine -input test.mp4 -fps 60
# reuse an existing engine file if one is found
depth-anything-tensorrt.exe -model depth_anything_vitb14.onnx -input test.mp4 -find-engine
```

### Python

```bash
cd depth-anything-tensorrt/python

# infer image
python trt_infer.py --engine <path to trt engine> --img <single-img> --outdir <outdir> [--grayscale]
```
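For readers curious what such a script does under the hood, here is a minimal sketch of a single engine-based inference pass using the TensorRT name-based tensor API (8.6+/10) together with `cuda-python`. The tensor ordering, the simple `[0, 1]` scaling, and the float32 output dtype are assumptions made for brevity; `trt_infer.py` in this repo is the authoritative version.

```python
import cv2
import numpy as np
import tensorrt as trt
from cuda import cudart

def infer_depth(engine_path: str, image_path: str) -> np.ndarray:
    # Deserialize the engine and create an execution context.
    logger = trt.Logger(trt.Logger.WARNING)
    with open(engine_path, "rb") as f:
        engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
    context = engine.create_execution_context()

    # Assume one input tensor (index 0) and one output tensor (index 1).
    inp, out = engine.get_tensor_name(0), engine.get_tensor_name(1)
    in_shape = context.get_tensor_shape(inp)    # e.g. (1, 3, 518, 518)
    out_shape = context.get_tensor_shape(out)   # e.g. (1, 518, 518)

    # Pre-process: resize to the model input and scale to [0, 1] (simplified;
    # the real script applies the model's own normalization).
    h, w = in_shape[2], in_shape[3]
    img = cv2.resize(cv2.imread(image_path), (w, h)).astype(np.float32) / 255.0
    host_in = np.ascontiguousarray(img.transpose(2, 0, 1)[None])  # HWC -> NCHW
    host_out = np.empty(tuple(out_shape), dtype=np.float32)

    # Allocate device buffers and upload the input.
    _, d_in = cudart.cudaMalloc(host_in.nbytes)
    _, d_out = cudart.cudaMalloc(host_out.nbytes)
    cudart.cudaMemcpy(d_in, host_in.ctypes.data, host_in.nbytes,
                      cudart.cudaMemcpyKind.cudaMemcpyHostToDevice)

    # Bind tensor addresses by name and run asynchronously on a stream.
    context.set_tensor_address(inp, d_in)
    context.set_tensor_address(out, d_out)
    _, stream = cudart.cudaStreamCreate()
    context.execute_async_v3(stream)
    cudart.cudaStreamSynchronize(stream)

    # Download the result and release device resources.
    cudart.cudaMemcpy(host_out.ctypes.data, d_out, host_out.nbytes,
                      cudart.cudaMemcpyKind.cudaMemcpyDeviceToHost)
    cudart.cudaFree(d_in)
    cudart.cudaFree(d_out)
    cudart.cudaStreamDestroy(stream)
    return host_out  # relative depth map
```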

## 🛠️ Build

### C++

Refer to our [docs/INSTALL.md](docs/INSTALL.md) for C++ environment installation.

### Python

```bash
cd <tensorrt installation path>/python
pip install cuda-python
pip install tensorrt-8.6.0-cp310-none-win_amd64.whl
pip install opencv-python
```

## 🤖 Model Preparation

### Depth-Anything-V1

Perform the following steps to create an ONNX model:

1. Download the pretrained model and install Depth-Anything:

   ```bash
   git clone https://github.com/LiheYoung/Depth-Anything
   cd Depth-Anything
   pip install -r requirements.txt
   ```

2. Copy `dpt.py` from this repo's `depth_anything_v1` folder into `<Depth-Anything>/depth_anything`, and copy `export_v1.py` from this repo's `depth_anything_v1` folder into `<Depth-Anything>`.

3. Export the model to ONNX format using `export_v1.py` (a stripped-down sketch of the export call is shown after these steps). You will get an ONNX file named `depth_anything_vit{}14.onnx`, such as `depth_anything_vitb14.onnx`. Note that the CPU build of torch is sufficient for export, as the model does not need to run on a GPU at this stage.

   ```bash
   conda create -n depth-anything python=3.8
   conda activate depth-anything
   pip install torch torchvision
   pip install opencv-python
   pip install onnx
   cd Depth-Anything
   python export_v1.py --encoder vitb --load_from depth_anything_vitb14.pth --image_shape 3 518 518
   ```
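For reference, the heart of such an export script is a single `torch.onnx.export` call. The sketch below is illustrative only: `model` stands in for the already-loaded Depth-Anything network, and the actual loading and wrapper logic lives in `export_v1.py` / `export_v2.py`.

```python
import torch

def export_onnx(model: torch.nn.Module, onnx_path: str, size: int = 518) -> None:
    """Illustrative ONNX export; `model` is the loaded network in eval mode."""
    model.eval()
    # Dummy NCHW input; height and width must be divisible by the 14-px patch.
    dummy = torch.randn(1, 3, size, size)
    torch.onnx.export(
        model,
        dummy,
        onnx_path,
        input_names=["input"],
        output_names=["output"],
        opset_version=17,
    )
```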

### Depth-Anything-V2

1. Clone Depth-Anything-V2:

   ```bash
   git clone https://github.com/DepthAnything/Depth-Anything-V2.git
   cd Depth-Anything-V2
   pip install -r requirements.txt
   ```

2. Download the pretrained models from the readme and put them in the `checkpoints` folder.

3. Copy `dpt.py` from this repo's `depth_anything_v2` folder into `<Depth-Anything-V2>/depth_anything_v2`, and copy `export_v2.py` from this repo's `depth_anything_v2` folder into `<Depth-Anything-V2>`.

4. Run the following to export the model:

   ```bash
   conda create -n depth-anything python=3.8
   conda activate depth-anything
   pip install torch torchvision
   pip install opencv-python
   pip install onnx
   cd Depth-Anything-V2
   python export_v2.py --encoder vitb --input-size 518
   ```

> [!TIP]
> The width and height of the model input should be divisible by 14, the patch size (e.g. 518 = 14 × 37).
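The CLI in the Quick Start consumes a serialized TensorRT engine, so the exported ONNX file still has to be converted (the C++ CLI can also do this itself via `-find-engine`). Below is a minimal sketch using the TensorRT Python builder API, with file names matching the export steps above; running `trtexec --onnx=depth_anything_vitb14.onnx --saveEngine=depth_anything_vitb14.engine --fp16` achieves the same from the command line.

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
# Explicit-batch flag: required on TensorRT 8.x, deprecated but harmless on 10.
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

# Parse the exported ONNX model into the network definition.
with open("depth_anything_vitb14.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise SystemExit("failed to parse the ONNX model")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # matches the FP16 benchmark above

# Build and save the serialized engine.
serialized = builder.build_serialized_network(network, config)
with open("depth_anything_vitb14.engine", "wb") as f:
    f.write(serialized)
```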

## 👏 Acknowledgement

This project is based on the following projects: