nickswalker opened 4 years ago
Update (Tue 5 2020):
From @JHLee0513's shared results:
Potential future work:
- Fine-tune on the YCB video dataset (256G???)
- Direct regression of pose (PoseCNN, DeepIM → seems to work well; the issue being a different framework)
- Investigation of segmentation networks (take fast models trained on Cityscapes, etc., and fine-tune on YCB video data as well)
- Keypoint matching, since all objects are known?? (or a similar traditional CV approach)
This is better than I expected given that these objects weren't explicitly trained for. The bounding boxes do look funky though, like something is wrong with NMS or they're being drawn with a dimension skewed.
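One plausible cause of boxes drawn with a skewed dimension is mixing up the (x, y, w, h) and (x1, y1, x2, y2) box conventions before drawing; a minimal sketch of that failure mode (the helper name is hypothetical, not from our code):

```python
# Hypothetical sketch of the (x, y, w, h) vs. (x1, y1, x2, y2) mix-up
# that makes drawn boxes look like a dimension is skewed.

def xywh_to_xyxy(box):
    """Convert a (x, y, width, height) box to (x1, y1, x2, y2) corners."""
    x, y, w, h = box
    return (x, y, x + w, y + h)

box = (100, 40, 50, 20)  # a 50x20 box with its top-left corner at (100, 40)

# Correct: convert before handing to a drawer that expects corners.
print(xywh_to_xyxy(box))  # (100, 40, 150, 60)

# Bug: feeding (x, y, w, h) straight to a corner-based drawer puts the
# "far corner" at (50, 20), above and to the left of the top-left corner,
# which renders as a flipped/skewed rectangle.
```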
YCB video data looks promising. If it's easy to work with, go for it. Otherwise, let's work on getting a labeling pipeline and work with fewer images, but from the robot's camera, in the lighting situation we care about, etc. It's okay to overfit as long as we can overfit with a fast turnaround time on site.
If it's possible, we want a single pipeline for both standard objects (we know about them today) and "known" objects (we learn about them during setup days). It's not clear that something like PoseCNN could be reasonably fine-tuned to handle a new object class on a short timescale (even with more time, it seems like it'd be crazy expensive). The hardware requirements are also a barrier.
Would be interested to see how segmentation performs out of the box
"Keypoint matching won't work as well as fine tuning YOLO" seems to be the consensus amongst robocup teams.
@nickswalker Do we have any storage/GPU solution for handling the YCB video dataset? I could try using one of the RSE lab machines, though I'd have to confirm its availability (since the dataset is 265G...)
I think the update from @csemecu is that she'll check if we can use one of the VR capstone's machines as a short term solution. We should discuss more during Monday's meeting
@nickswalker As a followup on categories, should we include categories from both COCO and YCB in the final perception system? For quickly testing out the whole pipeline I will finetune only on YCB for now, to allow inspection based on objects we have.
Most of the COCO classes are irrelevant for us, so no need to include them.
Update (Feb 18): got the model to start training; progress was delayed due to midterms :/ I will keep updating on its training speed, inference once trained, etc. ASAP
Based on what @JHLee0513 has shown, we seem to be well above this bar now. Future work is in making sure we can quickly train in additional classes (labeling pipeline #7) and in connecting 2D and 3D perception (like what's happening for pick and place, and eventually for receptionist #13).
Ah, but there's no code tracked for this anywhere. @JHLee0513 open a branch please.
Branch opened here. The code is currently under heavy modification (and FYI, I'm not too familiar with integrating another repo as a submodule)
Let's discuss how to handle packaging tomorrow
We've put the detection Python blob in as a git submodule and set up a catkin package around it. The code isn't really in a usable state yet because it's unclear how to get any data out over ROS; the model is built in PyTorch and requires Python 3, but rospy is Python 2 only, so we can't just open up publishers
@nickswalker rospy in melodic seems to support python3 (not tested personally, though there are many straightforward blogs/tutorials about it online); would it be possible to set up a publisher as normal if that's the case?
Yes, as long as rospy is working we should be good. Let's test that as soon as we can. We should also check that roslaunch and rosrun respect python3 shebangs and run the code as expected
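As a quick sanity check outside of ROS, direct execution of a script with a python3 shebang (the same mechanism rosrun relies on when launching node executables) can be verified with a throwaway sketch like this (not part of the package):

```python
# Throwaway check that an executable script with a python3 shebang
# actually runs under Python 3 when exec'd directly, with no
# interpreter named on the command line -- what rosrun does.
import os
import stat
import subprocess
import tempfile

SCRIPT = """#!/usr/bin/env python3
import sys
print(sys.version_info.major)
"""

with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(SCRIPT)
    path = f.name
os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)

# Execute the file directly; the kernel resolves the shebang.
major = subprocess.run([path], capture_output=True, text=True).stdout.strip()
print(major)  # "3" if the shebang is respected
os.remove(path)
```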
Input: camera image
Output: bbox detection, or sufficient information such that the object centroid can be estimated
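For the "sufficient information such that the object centroid can be estimated" part, a back-of-envelope sketch of recovering a 3D centroid from a bbox plus an aligned depth image with a pinhole model (function name and intrinsics values are made up for illustration):

```python
import numpy as np

def bbox_centroid_3d(depth, bbox, fx, fy, cx, cy):
    """Estimate a 3D centroid (camera frame, meters) from a depth image
    and a (x1, y1, x2, y2) bounding box via the pinhole camera model.
    depth: HxW array of depths in meters; zeros are treated as invalid."""
    x1, y1, x2, y2 = bbox
    patch = depth[y1:y2, x1:x2]
    valid = patch[patch > 0]
    if valid.size == 0:
        return None
    z = float(np.median(valid))  # robust depth estimate for the object
    u = (x1 + x2) / 2.0          # pixel center of the box
    v = (y1 + y2) / 2.0
    # Back-project the box center into camera coordinates.
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)

# Toy example: made-up intrinsics, flat 1 m depth plane, box centered
# on the principal point -> centroid straight ahead at 1 m.
depth = np.ones((480, 640))
print(bbox_centroid_3d(depth, (310, 230, 330, 250),
                       fx=525.0, fy=525.0, cx=320.0, cy=240.0))
```

This is only sensible when the depth image is registered to the RGB frame; on the robot we'd pull fx/fy/cx/cy from the camera_info topic rather than hard-coding them.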
For our pick and place milestone, it doesn't matter what object we can detect (preferably it's a YCB object in the set of RoboCup items). The goal is to have a working detection pipeline that we can evaluate end-to-end with manipulation.