Hi @ShrutheeshIR and thanks for your interest.
As explained in our paper^1 (see Section V, first paragraph), the feature extraction stage is performed using a ResNet-50 Convolutional Neural Network model trained on the ImageNet Large-Scale Visual Recognition Challenge and available in the Caffe framework.
The network therefore comes pre-trained; we then fine-tune it^2 on a dataset called "iCub World"^3 that is relevant to our domain.
You can find instructions on how to retrieve all the required components, as well as the weights, at https://github.com/robotology/himrep/tree/master/modules/caffeCoder.
Let me loop in @GiuliaP, the main author of the feature extraction pipeline, who can chime in in case you need more insight.
cc @giuliavezzani
G. Vezzani, U. Pattacini, G. Pasquale, and L. Natale, "Improving Superquadric Modeling and Grasping with Prior on Object Shapes," 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 6875-6882, 2018. doi: 10.1109/icra.2018.8463161
G. Pasquale, C. Ciliberto, F. Odone, L. Rosasco, and L. Natale, "Teaching iCub to recognize objects using deep Convolutional Neural Networks," Proceedings of the 4th Workshop on Machine Learning for Interactive Systems, 32nd International Conference on Machine Learning, vol. 43, pp. 21-25, 2015. http://www.jmlr.org/proceedings/papers/v43/pasquale15
Hello @ShrutheeshIR, thanks for your interest in this work.
The method used for classifying the object shapes, as @pattacini mentioned, is described in Sec. V of the paper and consists of an SVM classifier fed with features extracted from a ResNet-50 trained on ImageNet (@pattacini, we did not use a network fine-tuned on iCubWorld but an off-the-shelf network trained on ImageNet).
The classifier is trained on a set of images representing the objects depicted in Fig. 6 of the paper. The objects are taken in part from the YCB dataset and in part from iCubWorld. The training images were collected by putting each object on a table-top and extracting a surrounding bounding box via color-based segmentation (a few images for each object should suffice). This dataset is not released because, as explained in the paper, the training was carried out on-the-fly, with a human teacher providing the label (shape class) of each object after putting it on the table and pointing to it so that the robot localises it (you can get an idea of how the training works by watching this example video: https://youtu.be/ghUFweqm7W8).
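To make the data-collection step concrete, here is a minimal sketch of a color-based segmentation crop. It assumes OpenCV, NumPy, and a roughly uniform table color; the HSV thresholds and the overall recipe are illustrative assumptions, not the exact routine running on the robot.

```python
# Minimal sketch: crop a training image around an object on a table-top
# using simple color-based segmentation (thresholds are assumed, not the
# exact values used on the robot).
import cv2
import numpy as np

def crop_object(image_bgr, table_hsv_low=(0, 0, 120), table_hsv_high=(180, 60, 255)):
    """Return a bounding-box crop of the largest blob that is not table-colored."""
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    # Pixels matching the (assumed) table color are masked out; the rest
    # are treated as candidate object pixels.
    table_mask = cv2.inRange(hsv, np.array(table_hsv_low), np.array(table_hsv_high))
    object_mask = cv2.bitwise_not(table_mask)
    object_mask = cv2.morphologyEx(object_mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(object_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    # Keep the largest connected component and crop its bounding box.
    x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
    return image_bgr[y:y + h, x:x + w]
```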
If you want to reproduce the ResNet-50 + SVM pipeline, you can therefore either reproduce the exact code or, since all the modules are fairly common, rebuild it using equivalent software components. I provide some pointers for both approaches below.
As for the SVM training set, you could collect a small one using custom objects. If you prefer to use the exact same objects shown in Fig. 6, you should look for datasets that represent them (e.g., on a table-top). For the YCB objects, if you find such a dataset, we can give you feedback, while for the iCubWorld objects (I think there are only three) we can help you check whether there are subsets of the dataset representing them on a table-top (you can start by checking out the dataset website).
To reproduce the exact code of the ResNet-50 + SVM pipeline, you need to install the caffe library and the other dependencies, then install two YARP modules contained in the himrep repository, namely caffeCoder (implementing ResNet-50) and linearClassifierModule (implementing the SVM). The README of the caffeCoder module provides some helper information describing the steps we took at the time to install caffe and then caffeCoder on our system, while for linearClassifierModule you can refer to its README.
NOTES:
To reproduce the pipeline with equivalent software modules instead, you could use a ResNet-50 trained on ImageNet and available in any deep learning framework, such as PyTorch; see the sketch below. For the SVM, you could either use the one implemented in the YARP linearClassifierModule or implement your own, for example in Python. We can also support you here if you have specific questions about how to make the software equivalent, in case something is not defined in the paper or in the available code.
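As an illustration of this second route, here is a minimal sketch assuming torchvision and scikit-learn: an ImageNet-pretrained ResNet-50 with its classification head removed acts as a 2048-d feature extractor, and a linear SVM is trained on the resulting vectors. The file names, labels, and the SVM's C value are placeholders, and the exact feature layer and hyper-parameters should be checked against the paper and the himrep code.

```python
# Minimal sketch: ImageNet-pretrained ResNet-50 features + linear SVM
# (hyper-parameters and file names are placeholders).
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image
from sklearn.svm import LinearSVC

# ResNet-50 pre-trained on ImageNet; replacing the final fully connected
# layer with Identity makes the network output a 2048-d feature vector.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract_feature(image_path):
    """Return the 2048-d ResNet-50 feature of a single (cropped) image."""
    img = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    return backbone(img).squeeze(0).numpy()

# A handful of cropped images per shape class is enough to get started
# (hypothetical file names and labels).
train_paths = ["box_01.jpg", "box_02.jpg", "sphere_01.jpg", "cylinder_01.jpg"]
train_labels = ["box", "box", "sphere", "cylinder"]

X = [extract_feature(p) for p in train_paths]
svm = LinearSVC(C=1.0)  # C is an assumed value; tune it on held-out data
svm.fit(X, train_labels)

print(svm.predict([extract_feature("new_object.jpg")]))  # hypothetical test image
```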
@pattacini and @GiuliaP gave very detailed answers, thank you both! I hope this clarifies your question @ShrutheeshIR 😃
Thank you for your detailed replies @giuliavezzani @GiuliaP and @pattacini
This work is great! I came across your latest work, which improves superquadric fitting using object priors. I was wondering if you could share the weights of the model used to obtain the object priors, trained on the YCB dataset, so that I could test it out right away.
Thanks!