tue-robotics / tue_robocup

RoboCup challenge implementations
https://github.com/orgs/tue-robotics/projects/2

Research object recognition #395

Closed reinzor closed 5 years ago

reinzor commented 7 years ago

(issue edited by @LoyVanBeek) Our performance at RoboCup in Japan was .... The feeling is that we do OK when testing, but not during actual challenges. Before we start making wild guesses, we need to measure our current performance and be able to compare it.

There are several tracks to work on:

- Other implementations
- Improvements on the current implementation (i.e. inception_v3 retrained on RoboCup data)
- (Tools to) measure object recognition performance
- Measure object recognition performance
- Different sensors

@alberth @JosjaG @LoyVanBeek @blumenkindC3

reinzor commented 7 years ago

Current implementation: https://github.com/tue-robotics/image_recognition

MatthijsBurgh commented 7 years ago

Might be useful: a ROS wrapper for darknet, https://github.com/leggedrobotics/darknet_ros

reinzor commented 7 years ago

APC vision https://github.com/andyzeng/apc-vision-toolbox#deep-learning-fcn-ros-package

reinzor commented 7 years ago

Useful for selecting a 3D sensor: https://roscon.ros.org/2017/presentations/ROSCon%202017%203D%20Vision%20Technology.pdf

LoyVanBeek commented 6 years ago

From @alberth: object_recognition_research.txt

LoyVanBeek commented 6 years ago

Popular methods/systems for classification:

- Best team: Homer@UniKoblenz, using SURF with some additional smart(-sounding) tricks.

LoyVanBeek commented 6 years ago

Weights of pre-trained nets: https://github.com/fchollet/keras/tree/master/keras/applications
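
For reference, a minimal sketch of loading one of those pre-trained networks via keras.applications (the test image path is a placeholder; the weights download on first use):

```python
# Sketch: classify one image with a pre-trained Inception_v3 from keras.applications.
from keras.applications.inception_v3 import InceptionV3, preprocess_input, decode_predictions
from keras.preprocessing import image
import numpy as np

model = InceptionV3(weights='imagenet')      # expects 299x299 RGB input

img = image.load_img('some_object.jpg', target_size=(299, 299))  # placeholder path
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
print(decode_predictions(model.predict(x), top=3))  # top-3 ImageNet classes
```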

LoyVanBeek commented 6 years ago

Another suspicion is that the recognition is spoiled by bad segmentation. So, how can segmentation be improved? Doing image segmentation from scratch on RGB-D images is a waste, because we already do background subtraction in ED with the world model.

But the boundaries of that segmentation are not the exact borders of the objects and could thus be improved. There are several methods to do this:

A name that keeps popping up is gPb (globalized probability of boundary).

We could use the segmentation from ED as a base for so-called 'weak supervision', or as a 'seed' to start the segmentation from (see the sketch below).

For semantic segmentation, the competition to look at is based on the PASCAL VOC2012 dataset. But for semantic segmentation you need classes as well, so it might not be a good fit. However, just being able to separate objects without knowing their class is fine too; the classification is done separately anyway.
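
One way to flesh out the 'seed' idea: OpenCV's GrabCut can be initialized from a rough region of interest and will refine the boundary. A minimal sketch, where `ed_roi` stands in for whatever rectangle ED's world-model subtraction gives us (all names and values here are hypothetical, not our pipeline):

```python
# Sketch: refine a rough ED-derived region with GrabCut (OpenCV).
# 'ed_roi' is a hypothetical (x, y, w, h) seed rectangle from ED.
import cv2
import numpy as np

img = cv2.imread('rgb_frame.jpg')            # hypothetical RGB frame
ed_roi = (120, 80, 200, 240)                 # hypothetical seed rectangle from ED

mask = np.zeros(img.shape[:2], np.uint8)
bgd_model = np.zeros((1, 65), np.float64)    # internal GrabCut state
fgd_model = np.zeros((1, 65), np.float64)

# Initialize from the seed rectangle and refine for a few iterations.
cv2.grabCut(img, mask, ed_roi, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_RECT)

# Keep pixels marked as (probable) foreground as the refined object mask.
refined = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 255, 0).astype('uint8')
object_only = cv2.bitwise_and(img, img, mask=refined)
```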

alberth commented 6 years ago

I don't know how the current ED is perceived, but if you ever want to get rid of the pre-baked world that Amigo needs (as I understand it), relying on ED world-model subtraction would become a PITA.

LoyVanBeek commented 6 years ago

Cloud-based object recognition services:

LoyVanBeek commented 6 years ago

Retraining DetectNet: https://github.com/NVIDIA/DIGITS/blob/master/examples/object-detection/README.md#model-creation

DetectNet is derived from GoogLeNet, the original Inception architecture, which puts it in the same family as the Inception_v3 we use now. Retraining DetectNet should thus be possible.

reinzor commented 6 years ago

What were the conclusions of today?


LoyVanBeek commented 6 years ago

@reinzor Not many conclusions (well, that rqt segfaults on my laptop); mostly following tutorials and general study of this topic.

People have been tweaking the threshold for putting a classification in the report of the Storing Groceries challenge. A metric useful for tuning this could be the precision-recall curve: high precision corresponds to a low false positive rate, high recall to a low false negative rate, and both depend on the threshold we put on classifications & detections.
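
A minimal sketch of computing such a curve with scikit-learn; `y_true` and `y_scores` are made-up placeholders for per-detection ground truth and classifier confidences, not challenge data:

```python
# Sketch: precision-recall trade-off as a function of the confidence threshold.
import numpy as np
from sklearn.metrics import precision_recall_curve, average_precision_score

y_true = np.array([1, 1, 0, 1, 0, 0, 1, 0])                      # 1 = correct classification
y_scores = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.35, 0.1])  # classifier confidences

precision, recall, thresholds = precision_recall_curve(y_true, y_scores)
print('average precision:', average_precision_score(y_true, y_scores))
for p, r, t in zip(precision, recall, list(thresholds) + [None]):
    print('threshold {} -> precision {:.2f}, recall {:.2f}'.format(t, p, r))
```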

LoyVanBeek commented 6 years ago

Another interesting metric: http://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html#sklearn.metrics.roc_auc_score
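
Sketch with the same kind of placeholder data:

```python
# Sketch: ROC AUC summarizes ranking quality in one number
# (1.0 = perfect ranking, 0.5 = chance). Placeholder data again.
from sklearn.metrics import roc_auc_score

y_true = [1, 1, 0, 1, 0, 0, 1, 0]
y_scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.35, 0.1]
print('ROC AUC:', roc_auc_score(y_true, y_scores))
```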

LoyVanBeek commented 6 years ago

I trained a network on `~/MEGA/data/robotics_testlabs/training_data_Josja/training`.

I tested the network on our test data in `~/MEGA/data/robotics_testlabs/training_data_Josja/test_data`: results.

and then again with the same images cropped to 90% (using `find . -name '*.jpg' -execdir mogrify -crop 90%x+0+0 -gravity Center {} \;`; note that ImageMagick applies settings in order, so `-gravity Center` should come before `-crop` for the crop to actually be centered), to naively get rid of a bit of background: results_cropped_90pct

Naively cropping makes classification much worse. A good segmentation is important.
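
For reference, the intended centered 90% crop expressed in Python with Pillow (a sketch, one reading of the mogrify geometry above; paths are placeholders):

```python
# Sketch: keep the central 90% of the width, full height (the intent of
# 'mogrify -gravity Center -crop 90%x+0+0'). Paths are placeholders.
from PIL import Image

img = Image.open('test_image.jpg')
w, h = img.size
new_w = int(w * 0.9)
left = (w - new_w) // 2
img.crop((left, 0, left + new_w, h)).save('test_image_cropped.jpg')
```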

LoyVanBeek commented 6 years ago

Trained on 6 classes with 60 images each, for 1000 training steps with a batch size of 100.

results
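
For reproducibility, those settings map onto the stock TensorFlow retrain script roughly as follows (assuming tensorflow/examples/image_retraining/retrain.py; the image dir is the training directory mentioned earlier, and the output paths are placeholders):

```
python retrain.py \
  --image_dir=~/MEGA/data/robotics_testlabs/training_data_Josja/training \
  --how_many_training_steps=1000 \
  --train_batch_size=100 \
  --output_graph=/tmp/output_graph.pb \
  --output_labels=/tmp/output_labels.txt
```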

LoyVanBeek commented 6 years ago

More questions:

LoyVanBeek commented 6 years ago

More things to try out:

MatthijsBurgh commented 6 years ago

I trained MobileNet after syncing the tf_retrain script. Evaluation and classification aren't possible yet, because the input tensor "Cast" isn't in the main graph in MobileNet. I don't know why this is or what the solution is, but using "input" as the input tensor doesn't work either, because then the sizes are incorrect. Probably we need to do the JPEG decoding in the script ourselves?
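
If we do the decoding ourselves, a sketch along the lines of TensorFlow's label_image.py example (TF 1.x API; the sizes and means match the MobileNet-192 invocation further below and are assumptions for our graph):

```python
# Sketch: decode + resize + normalize a JPEG in Python, so the result can be
# fed to the MobileNet graph's "input:0" tensor instead of a "Cast" node.
import tensorflow as tf  # TF 1.x API

def read_tensor_from_image_file(file_name, input_height=192, input_width=192,
                                input_mean=128, input_std=128):
    contents = tf.read_file(file_name)
    image = tf.image.decode_jpeg(contents, channels=3)
    floats = tf.cast(image, tf.float32)
    resized = tf.image.resize_bilinear(tf.expand_dims(floats, 0),
                                       [input_height, input_width])
    normalized = tf.divide(tf.subtract(resized, [input_mean]), [input_std])
    with tf.Session() as sess:
        return sess.run(normalized)
```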

Network graphs:

Inception: [image: Inception network graph]

MobileNet: [image: MobileNet network graph]

CBHealth commented 6 years ago

I read a paper about them. I will be back on the 7th; then I can send it and give you an overview of accuracy and speed.


LoyVanBeek commented 6 years ago

Judging by https://hackernoon.com/creating-insanely-fast-image-classifiers-with-mobilenet-in-tensorflow-f030ce0a2991, I'd say that the image does indeed need to be resized:

```
python3 tensorflow/examples/label_image/label_image.py \
  --graph=/tmp/mobilenet_0.50_192.pb \
  --labels=/tmp/output_labels.txt \
  --image=/home/harvitronix/ml/blogs/road-not-road/test-image.jpg \
  --input_layer=input \
  --output_layer=final_result \
  --input_mean=128 \
  --input_std=128 \
  --input_width=192 \
  --input_height=192
```