IBVS approach with CNN for UR robotic manipulation

MohamedAssanhaji commented 4 months ago

I've been using ViSP for traditional IBVS with 2D cameras and various robots for pick-and-place tasks in challenging environments for a couple of years now. Following the trend in many papers, such as "An Image-Based Visual Servo Approach with Deep Learning for Robotic Manipulation" by Jingshu Liu and Yuan Li, I'm planning to integrate a CNN into my visual servoing algorithm. I'm currently working with several UR robots and an Intel RealSense D435 depth camera (both compatible with ViSP).

I have a couple of questions:

1) Can a pre-trained CNN be integrated into ViSP algorithms? If so, which of the many available algorithms should I use?

2) Are there any existing or ongoing works that use other DNN-based approaches for feature extraction in visual servoing?

SamFlt commented 4 months ago

Hi,

Concerning the 1st point:

Some detection networks can be used in ViSP, see this tutorial. This includes some versions of YOLO, MobileNet and Faster R-CNN. You can train your own networks in Python following the original repositories' instructions, export them to ONNX and load the ONNX model in ViSP. If you have a 3D model of your object and wish to train a detection network on cheap synthetic data, see this tool. You can also use it to retrieve the object pose as well as depth and normal maps.
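
Whichever detector is used, IBVS works on image features expressed in normalized (metric) image coordinates rather than raw pixels. Below is a minimal sketch, assuming the detector returns a box as (u_min, v_min, width, height) in pixels and the camera has no lens distortion; the function names are hypothetical. In ViSP, the pixel-to-meter step corresponds to what vpPixelMeterConversion does (x = (u - u0) / px, y = (v - v0) / py).

```python
import numpy as np

def pixel_to_meter(u, v, px, py, u0, v0):
    """Convert pixel coordinates (u, v) to normalized image coordinates (x, y).

    Distortion-free pinhole convention (assumption): px, py are the focal
    lengths in pixels and (u0, v0) is the principal point.
    """
    return (u - u0) / px, (v - v0) / py

def bbox_to_points(bbox, px, py, u0, v0):
    """Hypothetical helper: turn a box (u_min, v_min, width, height) into the
    normalized coordinates of its four corners, ordered top-left, top-right,
    bottom-right, bottom-left. Returns a (4, 2) array of (x, y)."""
    u_min, v_min, w, h = bbox
    corners = np.array([
        [u_min, v_min],
        [u_min + w, v_min],
        [u_min + w, v_min + h],
        [u_min, v_min + h],
    ], dtype=float)
    x, y = pixel_to_meter(corners[:, 0], corners[:, 1], px, py, u0, v0)
    return np.stack([x, y], axis=1)
```

For example, with hypothetical intrinsics px = py = 600, u0 = 320, v0 = 240, the box (300, 220, 40, 40) maps to corners at ±20/600 ≈ ±0.033 around the optical axis. Box corners are not attached to fixed points on the object, so they make a rough IBVS feature; keypoints or a full pose are usually more reliable.
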
You can also perform pose estimation using Megapose, see this tutorial. This requires a 3D model of the object/scene (you can use NeRFs to obtain one by following this process). It tends to work best with objects that fully fit into the camera view. Once you have the object pose, you can perform Pose-Based VS. For an example, see [here](https://visp-doc.inria.fr/doxygen/visp-daily/servoAfma6MegaposePBVS_8cpp-example.html).
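
Once a pose estimator such as Megapose gives the object pose in the camera frame, one classical PBVS choice uses the features s = (c*t_c, θu) with s* = 0, which yields v = -λ Rᵀ c*t_c and ω = -λ θu, where R and c*t_c come from the current camera pose expressed in the desired camera frame. ViSP provides feature classes for this (vpFeatureTranslation, vpFeatureThetaU) to use with vpServo; the numpy sketch below only spells out the math and is not ViSP code. It assumes both poses are 4x4 homogeneous matrices and that the remaining rotation is below π.

```python
import numpy as np

def theta_u(R):
    """Axis-angle vector theta*u of a rotation matrix R (assumes theta < pi)."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    # vee(R - R^T) = 2 sin(theta) * u
    vee = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    if theta < 1e-9:
        # Small-angle limit: theta / (2 sin theta) -> 1/2
        return 0.5 * vee
    return theta / (2.0 * np.sin(theta)) * vee

def pbvs_velocity(cMo, cdMo, lam=0.5):
    """PBVS law with s = (c*t_c, theta*u), s* = 0.

    cMo  : 4x4 object pose in the current camera frame (e.g. from Megapose)
    cdMo : 4x4 object pose in the desired camera frame
    lam  : control gain (hypothetical default)
    Returns the 6-vector (vx, vy, vz, wx, wy, wz) in the current camera frame.
    """
    cdMc = cdMo @ np.linalg.inv(cMo)  # current camera pose in the desired frame
    R, t = cdMc[:3, :3], cdMc[:3, 3]
    v = -lam * R.T @ t
    omega = -lam * theta_u(R)
    return np.concatenate([v, omega])
```

With the object 0.6 m in front of the camera and a desired distance of 0.5 m (no rotation), the law commands a positive vz, i.e. the camera moves toward the object.
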
Finally, we are working on a Python API for ViSP that should allow you to use all the different ViSP modules in Python, with interfaces to NumPy arrays, matplotlib, etc. With the Python API, you could use any learning algorithm and library (PyTorch, scikit-learn or others) as long as you convert between its data types and ViSP's. To install the bindings, see the tutorial and the Python-specific documentation.
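
Most of that conversion is plain array manipulation. As an illustration, if a keypoint network outputs one heatmap per keypoint, its output can be brought to NumPy (in PyTorch, typically with `.detach().cpu().numpy()`) and decoded into pixel coordinates. The sketch below assumes a (K, H, W) array at the same resolution as the input image (no rescaling); the function name and the confidence threshold are hypothetical, and it does not call the ViSP bindings.

```python
import numpy as np

def decode_heatmaps(heatmaps, min_score=0.1):
    """Extract one (u, v) pixel location per keypoint heatmap.

    heatmaps  : float array of shape (K, H, W), one map per keypoint
    min_score : hypothetical threshold below which a keypoint is flagged invalid
    Returns (points, scores, valid): points is (K, 2) with columns (u, v),
    scores holds each peak value, valid marks peaks >= min_score.
    """
    K, H, W = heatmaps.shape
    flat = heatmaps.reshape(K, -1)
    idx = np.argmax(flat, axis=1)           # flat index of the peak in each map
    scores = flat[np.arange(K), idx]
    v, u = np.unravel_index(idx, (H, W))    # row -> v (vertical), column -> u
    points = np.stack([u, v], axis=1).astype(float)
    return points, scores, scores >= min_score
```

The resulting pixel coordinates can then be converted to normalized coordinates with the camera intrinsics before being used as visual features.
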

Concerning the 2nd point, most works rely on pose estimation or keypoint extraction, with VS performed as a downstream task. You can find more information in the state-of-the-art chapter of my thesis (shameless promotion :))
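
To make the "downstream task" idea concrete: once a network provides keypoints matched to their desired positions, the servoing step can be the classical IBVS law v = -λ L⁺ (s - s*), where L stacks the 2x6 interaction matrix of each point. The numpy sketch below assumes at least three non-collinear points in normalized coordinates and a depth estimate per point (e.g. read from the D435 depth image); it is not ViSP code, though it is roughly what vpServo computes for vpFeaturePoint features when the interaction matrix is evaluated at the current features.

```python
import numpy as np

def point_interaction_matrix(x, y, Z):
    """2x6 interaction matrix of a normalized image point (x, y) at depth Z."""
    return np.array([
        [-1.0 / Z, 0.0, x / Z, x * y, -(1.0 + x * x), y],
        [0.0, -1.0 / Z, y / Z, 1.0 + y * y, -x * y, -x],
    ])

def ibvs_velocity(s, s_star, Z, lam=0.5):
    """Classical IBVS law v = -lam * pinv(L) @ (s - s*).

    s, s_star : (N, 2) current and desired normalized point coordinates
    Z         : (N,) depths of the current points
    lam       : control gain (hypothetical default)
    Returns the camera velocity (vx, vy, vz, wx, wy, wz).
    """
    L = np.vstack([point_interaction_matrix(x, y, z) for (x, y), z in zip(s, Z)])
    # Row-major flattening gives [x1, y1, x2, y2, ...], matching L's row order
    e = (np.asarray(s, dtype=float) - np.asarray(s_star, dtype=float)).reshape(-1)
    return -lam * np.linalg.pinv(L) @ e
```

For instance, four points forming a square at 0.5 m whose desired image is a larger square produce a pure positive vz: the camera approaches the object until the points spread out to their desired positions.
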

Sam

MohamedAssanhaji commented 4 months ago

Hey Sam,

Thanks for the comprehensive breakdown! It's awesome to see all these possibilities laid out. I'll definitely dive into the tutorials and resources you mentioned (starting today!). Your work looks like a goldmine for me. And hey, no shame in promoting your thesis if it's packed with valuable insights :) Looking forward to exploring more of your work (added you on LinkedIn too XD)

Cheers!

Mohamed

lagadic/visp #1401