ROS1 package for SOTA Computer Vision Models including SAM, Cutie, GroundingDINO, DEVA, VLPart and MaskDINO.
Tested with 480x640 images at 30 Hz on an RTX 3090 Ti.
sam_node publishes a segmentation prompt, which cutie_node uses to track objects. It runs in near real time (~30 Hz).
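For downstream use, here is a minimal sketch of a node that consumes the tracked mask. The topic name `/cutie_node/output/segmentation` is an assumption; check the actual name with `rostopic list`.
```python
#!/usr/bin/env python
# Minimal sketch: subscribe to the tracked segmentation mask from cutie_node.
# NOTE: the topic name below is an assumption; verify it with `rostopic list`.
import rospy
from sensor_msgs.msg import Image
from cv_bridge import CvBridge

bridge = CvBridge()

def mask_cb(msg):
    mask = bridge.imgmsg_to_cv2(msg)  # mask image with per-object labels
    rospy.loginfo("received mask of shape %s", str(mask.shape))

rospy.init_node("mask_listener")
rospy.Subscriber("/cutie_node/output/segmentation", Image, mask_cb, queue_size=1)
rospy.spin()
```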
deva_node re-queries GroundingDINO and SAM at regular intervals, so it can pick up new objects after tracking has started. It runs at ~15 Hz; you can adjust cfg['detection_every'] for performance.
See node_scripts/model_config.py for the model configuration.
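As a hedged sketch of that tuning knob (the exact layout of node_scripts/model_config.py may differ):
```python
# Sketch only: assumes cfg is the DEVA config dict from node_scripts/model_config.py.
# A larger detection_every means GroundingDINO and SAM are queried less often,
# which is faster but slower to pick up objects that enter the scene later.
cfg['detection_every'] = 10  # re-run detection once every 10 frames
```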
This package is built upon the models listed above.
If you want to build this package directly in your workspace, be aware of the Python environment dependencies (Python 3.9 and PyTorch are required to build the package).
```bash
mkdir -p ~/ros/catkin_ws/src && cd ~/ros/catkin_ws/src
git clone https://github.com/ojh6404/deep_vision_ros.git
wstool init
wstool merge -t . deep_vision_ros/deep_vision_ros/rosinstall.noetic
wstool update -t . # jsk-ros-pkg/jsk_visualization for GUI
cd deep_vision_ros/deep_vision_ros && ./prepare.sh
cd ~/ros/catkin_ws && catkin b
```
You can also build the ROS package inside an Anaconda environment:
```bash
sudo apt-get install libxml2-dev libxslt-dev libopenblas-dev libspatialindex-dev freeglut3-dev libsuitesparse-dev libblas-dev liblapack-dev libxcb-cursor0
conda create -n ros-env python=3.9 -y
conda activate ros-env
pip install psutil==5.5.1 empy==3.3.2 rospkg gnupg pycryptodomex catkin-tools wheel cython # for ROS build
mkdir -p ~/ros/catkin_ws/src && cd ~/ros/catkin_ws/src
git clone https://github.com/ojh6404/deep_vision_ros.git
wstool init
wstool merge -t . deep_vision_ros/deep_vision_ros/rosinstall.noetic
wstool update -t . # jsk-ros-pkg/jsk_visualization for GUI
cd deep_vision_ros/deep_vision_ros && ./prepare.sh
cd ~/ros/catkin_ws && catkin b
```
Otherwise, you can build only the deep_vision_ros_utils package (used for the interactive prompt GUI)
```bash
mkdir -p ~/ros/catkin_ws/src && cd ~/ros/catkin_ws/src
git clone https://github.com/ojh6404/deep_vision_ros.git
wstool init
wstool merge -t . deep_vision_ros/deep_vision_ros/rosinstall.noetic
wstool update -t . # jsk-ros-pkg/jsk_visualization for GUI
cd ~/ros/catkin_ws && catkin b deep_vision_ros_utils
```
and build the whole package in a Docker environment:
```bash
source ~/ros/catkin_ws/devel/setup.bash
roscd deep_vision_ros_utils/../deep_vision_ros
docker build -t deep_vision_ros .
```
Please refer to sample_track.launch and deva.launch. For example:
```bash
roslaunch deep_vision_ros sample_track.launch \
    input_image:=/kinect_head/rgb/image_rect_color \
    mode:=prompt \
    model_type:=vit_t \
    device:=cuda:0
```
You need to launch the tracker and the GUI separately because the Docker container has no GUI. Launch the tracker with
```bash
./run_docker -cache -host pr1040 -launch track.launch \
    input_image:=/kinect_head/rgb/image_rect_color \
    mode:=prompt \
    model_type:=vit_t \
    device:=cuda:0
```
where
- `-host` : hostname, like `pr1040` or `localhost`
- `-launch` : launch file name to run
- `-cache` : cache downloaded checkpoints

and launch the rqt GUI on your GUI machine with
```bash
roslaunch deep_vision_ros_utils sam_gui.launch
```
Run open-vocabulary detection and tracking with deva_node by
```bash
roslaunch deep_vision_ros deva.launch \
    input_image:=/kinect_head/rgb/image_rect_color \
    model_type:=vit_t \
    device:=cuda:0
```
or
```bash
./run_docker -cache -host pr1040 -launch deva.launch \
    input_image:=/kinect_head/rgb/image_rect_color \
    dino_model_type:=swinb \
    sam_model_type:=vit_t \
    device:=cuda:0
```
and use dynamic_reconfigure to set the classes to detect and track:
```bash
rosrun dynamic_reconfigure dynparam set /deva_node classes "cloth; cup; bottle;"
```
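The same parameter can also be set programmatically with the standard dynamic_reconfigure Python client (node and parameter names taken from the command above):
```python
# Set the tracked classes from a Python node instead of the CLI.
import rospy
from dynamic_reconfigure.client import Client

rospy.init_node("deva_class_setter")
client = Client("/deva_node", timeout=10)
client.update_configuration({"classes": "cloth; cup; bottle;"})
```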
Run part-level segmentation with VLPart by
```bash
roslaunch deep_vision_ros vlpart_segment.launch \
    input_image:=/kinect_head/rgb/image_rect_color \
    vocabulary:=custom \
    classes:="cup handle; bottle cap;" \
    device:=cuda:0
```
or
```bash
./run_docker -cache -host pr1040 -launch vlpart_segment.launch \
    input_image:=/kinect_head/rgb/image_rect_color \
    vocabulary:=custom \
    classes:="cup handle; bottle cap;" \
    device:=cuda:0
```