nickswalker opened 4 years ago
Update (Tue 5 2020):
From @JHLee0513's shared results:
Potential future work:
- Fine-tune on the YCB video dataset (256G???)
- Direct regression of pose (PoseCNN, DeepIM → seems to work well; the issue being a different framework)
- Investigation of segmentation networks (take fast models trained on Cityscapes, etc., and fine-tune on YCB video data as well)
- Keypoint matching, since all objects are known?? (or a similar traditional CV approach)
This is better than I expected given that these objects weren't explicitly trained for. The bounding boxes do look funky though, like something is wrong with NMS or they're being drawn with a dimension skewed.
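One plausible cause of boxes drawn with a skewed dimension is mixing up the (x, y, w, h) and (x1, y1, x2, y2) box conventions before drawing; a minimal sketch of that failure mode (the helper name is hypothetical, not from our code):

```python
# Hypothetical sketch of the (x, y, w, h) vs. (x1, y1, x2, y2) mix-up
# that makes drawn boxes look like a dimension is skewed.

def xywh_to_xyxy(box):
    """Convert a (x, y, width, height) box to (x1, y1, x2, y2) corners."""
    x, y, w, h = box
    return (x, y, x + w, y + h)

box = (100, 40, 50, 20)  # a 50x20 box with its top-left corner at (100, 40)

# Correct: convert before handing to a drawer that expects corners.
print(xywh_to_xyxy(box))  # (100, 40, 150, 60)

# Bug: feeding (x, y, w, h) straight to a corner-based drawer puts the
# "far corner" at (50, 20), above and to the left of the top-left corner,
# which renders as a flipped/skewed rectangle.
```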
YCB video data looks promising. If it's easy to work with, go for it. Otherwise, let's work on getting a labeling pipeline and work with fewer images, but from the robot's camera, in the lighting situation we care about, etc. It's okay to overfit as long as we can overfit with a fast turnaround time on site.
If it's possible, we want a single pipeline for both standard objects (we know about them today) and "known" objects (we learn about them during setup days). It's not clear that something like PoseCNN could be reasonably fine-tuned to handle a new object class on a short timescale (even with more time, it seems like it'd be crazy expensive). The hardware requirements are also a barrier.
Would be interested to see how segmentation performs out of the box
"Keypoint matching won't work as well as fine tuning YOLO" seems to be the consensus amongst robocup teams.
@nickswalker Do we have any storage/GPU solution for handling the YCB video dataset? I could try using one of the RSE lab machines, though I'd have to confirm its availability (since the dataset is 265G...)
I think the update from @csemecu is that she'll check if we can use one of the VR capstone's machines as a short term solution. We should discuss more during Monday's meeting
@nickswalker As a followup on categories, should we include categories from both COCO and YCB in the final perception system? For quickly testing out the whole pipeline I will finetune only on YCB for now, to allow inspection based on objects we have.
Most of the COCO classes are irrelevant for us, so no need to include them.
Update (Feb 18): got the model to start training; progress was delayed due to midterms :/ I will keep updating on its training speed, inference once trained, etc. ASAP
Based on what @JHLee0513 has shown, we seem to be well above this bar now. Future work is in making sure we can quickly train in additional classes (labeling pipeline #7) and in connecting 2D and 3D perception (like what's happening for pick and place, and eventually for receptionist #13).
Ah, but there's no code tracked for this anywhere. @JHLee0513 open a branch please.
Branch opened here. The code is currently under heavy modification (and FYI, I'm not too familiar with integrating another repo as a submodule)
Let's discuss how to handle packaging tomorrow
We've put the detection Python blob in as a git submodule and set up a catkin package around it. The code isn't really in a usable state yet because it's unclear how to get any data out over ROS; the model is built in PyTorch and requires Python 3, but rospy is Python 2 only, so we can't just open up publishers
@nickswalker rospy in melodic seems to support python3 (not tested personally, though there are many straightforward blogs/tutorials about it online); would it be possible to set up a publisher as normal if that's the case?
Yes, as long as rospy is working we should be good. Let's test that as soon as we can. We should also check that roslaunch and rosrun respect python3 shebangs and run the code as expected
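As a quick sanity check outside of ROS, direct execution of a script with a python3 shebang (the same mechanism rosrun relies on when launching node executables) can be verified with a throwaway sketch like this (not part of the package):

```python
# Throwaway check that an executable script with a python3 shebang
# actually runs under Python 3 when exec'd directly, with no
# interpreter named on the command line -- what rosrun does.
import os
import stat
import subprocess
import tempfile

SCRIPT = """#!/usr/bin/env python3
import sys
print(sys.version_info.major)
"""

with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(SCRIPT)
    path = f.name
os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)

# Execute the file directly; the kernel resolves the shebang.
major = subprocess.run([path], capture_output=True, text=True).stdout.strip()
print(major)  # "3" if the shebang is respected
os.remove(path)
```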
Input: camera image
Output: bbox detection, or sufficient information such that the object centroid can be estimated
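For the "sufficient information such that the object centroid can be estimated" part, a back-of-envelope sketch of recovering a 3D centroid from a bbox plus an aligned depth image with a pinhole model (function name and intrinsics values are made up for illustration):

```python
import numpy as np

def bbox_centroid_3d(depth, bbox, fx, fy, cx, cy):
    """Estimate a 3D centroid (camera frame, meters) from a depth image
    and a (x1, y1, x2, y2) bounding box via the pinhole camera model.
    depth: HxW array of depths in meters; zeros are treated as invalid."""
    x1, y1, x2, y2 = bbox
    patch = depth[y1:y2, x1:x2]
    valid = patch[patch > 0]
    if valid.size == 0:
        return None
    z = float(np.median(valid))  # robust depth estimate for the object
    u = (x1 + x2) / 2.0          # pixel center of the box
    v = (y1 + y2) / 2.0
    # Back-project the box center into camera coordinates.
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)

# Toy example: made-up intrinsics, flat 1 m depth plane, box centered
# on the principal point -> centroid straight ahead at 1 m.
depth = np.ones((480, 640))
print(bbox_centroid_3d(depth, (310, 230, 330, 250),
                       fx=525.0, fy=525.0, cx=320.0, cy=240.0))
```

This is only sensible when the depth image is registered to the RGB frame; on the robot we'd pull fx/fy/cx/cy from the camera_info topic rather than hard-coding them.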
For our pick and place milestone, it doesn't matter what object we can detect (preferably it's a YCB object in the set of RoboCup items). The goal is to have a working detection pipeline that we can evaluate end-to-end with manipulation.