YOLO v4 + ROS2 Humble (Foxy) + CUDA11 + cuDNN (FP16)

Ar-Ray-code commented 3 years ago

Hello. I am a college student in Japan and a fan of darknet_ros.

I've been wanting to make the ROS2 + YOLO v4 implementation happen for a long time, and I'm happy to report that I was able to implement it.

Main changes (my commits -> foxy)

Support for YOLO v4 : Switched the submodule to the master branch of AlexeyAB/darknet.
Removed IPL : Switched from IPL to cv::Mat for OpenCV4 support.
cuDNN : Added support for cuDNN and FP16.

Requirements

ROS2 Foxy
OpenCV4 ($ sudo apt install ros-foxy-vision-opencv)
CUDA 10 or 11 (tested with CUDA 11.3)
cuDNN 8 (Optional)

Installation

$ source /opt/ros/foxy/setup.bash
$ mkdir -p ~/ros2_ws/src
$ cd ~/ros2_ws/src
$ git clone --recursive  https://github.com/Ar-Ray-code/darknet_ros_yolov4.git
$ darknet_ros_yolov4/darknet_ros/rm_darknet_CMakeLists.sh
$ cd ~/ros2_ws
$ colcon build --symlink-install

Demo

Connect your webcam to your PC.

Terminal

$ source /opt/ros/foxy/setup.bash
$ source ~/ros2_ws/install/local_setup.bash
$ ros2 launch darknet_ros demo-v4-tiny.launch.py


Performance

Using YOLO v4 consumes a lot of GPU memory and lowers the frame rate, so you need to pay attention to your PC specs.

Test Machine

Component	Spec
CPU	Ryzen 7 2700X (3.7 GHz, 16 threads)
RAM	16GB DDR4
GPU	NVIDIA GeForce RTX 2080 Ti (GDDR6 11GB)
Driver	460.32.03

Performance

YOLO v3 : 67 fps (62 ~ 72 fps), uses 1781MB of VRAM
YOLO v4 : 29 fps (27 ~ 30.5 fps), uses 3963MB of VRAM
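To put the two measurements side by side, the ratios can be worked out with a few lines of shell. This is only a sketch of the arithmetic on the average figures reported above; the variable names and the `ratio` helper are made up for illustration and are not part of darknet_ros.

```shell
#!/bin/sh
# Average figures copied from the measurements above (RTX 2080 Ti test machine).
v3_fps=67
v3_vram_mb=1781
v4_fps=29
v4_vram_mb=3963

# ratio A B: print A / B rounded to two decimals (hypothetical helper).
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

echo "Frame rate, YOLO v3 / YOLO v4: $(ratio "$v3_fps" "$v4_fps")"
echo "VRAM usage, YOLO v4 / YOLO v3: $(ratio "$v4_vram_mb" "$v3_vram_mb")"
```

On this machine, YOLO v4 runs at well under half the frame rate of YOLO v3 and needs more than twice the VRAM.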

Please give it a try. Thank you.

tomlankhorst commented 3 years ago

Exciting work, thank you. We'll try to evaluate your work and come back to this asap.

Ar-Ray-code commented 3 years ago

By supporting cuDNN (FP16), I have succeeded in increasing the speed by 1.3 times. Please see the following report. Also, CPU-only inference is not supported at this stage.

This repository explains it.

English -> https://github.com/Ar-Ray-code/darknet_ros_fp16/wiki/Darknet_ros_FP16-Report-(1.3x-faster)-%F0%9F%94%A5
Japanese -> https://zenn.dev/array/articles/4c82fc8382e62d

tomlankhorst commented 2 years ago

Dear @Ar-Ray-code, first of all, sorry for the slow response. @mbjelonic and I like your contributions. Right now, we're not using Darknet for ROS 2 on our real robots (we're using Noetic, since real-robot development is a bit slower than pure software development). So, for the time being, I propose that the foxy branch become more of a 'community' branch rather than a leggedrobotics-supported branch. On this branch, we can be more flexible and quicker in merging PRs.

Now I only have to see how I can resolve any conflicts between this PR and #337. If you have a suggestion, feel free to let me know.

mbjelonic commented 2 years ago

@tomlankhorst and @Ar-Ray-code maybe we can merge this branch and give @Ar-Ray-code permissions to check PRs to the foxy branch?

Ar-Ray-code commented 2 years ago

Changed CMakeLists.txt to work correctly on CPU. OpenMP is used. https://github.com/leggedrobotics/darknet_ros/pull/319/commits/706dce051f4dacc345dd3ebf1166df34d35e05c6

Ar-Ray-code commented 2 years ago

> Did you devel on top of the master or the foxy branch, @Ar-Ray-code? Could you rebase such that only your changes are included?

I developed this on top of the master branch.

Ar-Ray-code commented 2 years ago

Did the build for the GPU work? If you have any questions, please let me know :)

Ar-Ray-code commented 2 years ago

I plan to support ROS 2 Humble and the Ampere architecture. Are there any plans to create a Humble branch?

wanilly commented 1 year ago

Hello! Your code and advice have helped us a lot. Thank you so much. I ran into an error:

CMake Error at /usr/share/cmake-3.16/Modules/FindCUDA.cmake:707 (message): Specify CUDA_TOOLKIT_ROOT_DIR

Because of this, colcon build fails. How can I solve this problem? Please help!

--> Feb 9 17:18 in Korea : I guess it is a CUDA version problem. I have another issue.
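
For the FindCUDA error quoted above, one thing that may be worth checking is whether CMake can locate the toolkit at all. The sketch below looks for `nvcc` in a few candidate directories and, if it finds one, prints a build command that passes the location through colcon's `--cmake-args`. The candidate paths and the `find_cuda_root` helper are assumptions for illustration, not something this PR provides; adjust them to your installation.

```shell
#!/bin/sh
# find_cuda_root DIR...: print the first DIR that contains an executable bin/nvcc.
# (Hypothetical helper; empty arguments are skipped.)
find_cuda_root() {
    for dir in "$@"; do
        if [ -n "$dir" ] && [ -x "$dir/bin/nvcc" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Candidate locations are assumptions; CUDA may be installed elsewhere.
if cuda_root=$(find_cuda_root "${CUDA_HOME:-}" /usr/local/cuda /usr/lib/cuda); then
    echo "CUDA toolkit found at: $cuda_root"
    echo "Try: colcon build --symlink-install --cmake-args -DCUDA_TOOLKIT_ROOT_DIR=$cuda_root"
else
    echo "nvcc not found in the candidate directories; the CUDA toolkit may not be installed." >&2
fi
```

If no directory is found, the toolkit itself is probably missing or installed in a non-standard place, and setting the variable alone is unlikely to help.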
