tue-robotics / tue_robocup

RoboCup challenge implementations
https://github.com/orgs/tue-robotics/projects/2

Research object recognition #395

Closed reinzor closed 5 years ago

reinzor commented 7 years ago

(issue edited by @LoyVanBeek) Our performance at RoboCup in Japan was .... The feeling is that we do OK when testing, but not during actual challenges. Before we start making wild guesses, we need to measure our current performance and be able to compare it.

There are several tracks to work on:

- Other implementations
- Improvements on the current implementation (i.e. inception_v3 retrained on RoboCup data)
- (Tools to) measure object recognition performance
- Measure object recognition performance
- Different sensors

@alberth @JosjaG @LoyVanBeek @blumenkindC3

reinzor commented 7 years ago

Current implementation: https://github.com/tue-robotics/image_recognition

MatthijsBurgh commented 7 years ago

Might be useful: a ROS wrapper for darknet, https://github.com/leggedrobotics/darknet_ros

reinzor commented 7 years ago

APC vision https://github.com/andyzeng/apc-vision-toolbox#deep-learning-fcn-ros-package

reinzor commented 7 years ago

Useful for selecting a 3D sensor: https://roscon.ros.org/2017/presentations/ROSCon%202017%203D%20Vision%20Technology.pdf

LoyVanBeek commented 6 years ago

From @alberth: object_recognition_research.txt

LoyVanBeek commented 6 years ago

Popular methods/systems for classification:

- Best team: Homer@UniKoblenz, using SURF with some additional smart(-sounding) tricks.

LoyVanBeek commented 6 years ago

Weights of pre-trained nets: https://github.com/fchollet/keras/tree/master/keras/applications
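
For reference, a minimal sketch of loading one of those pre-trained networks via keras.applications (the test image path is a placeholder; the weights download on first use):

```python
# Sketch: classify one image with a pre-trained Inception_v3 from keras.applications.
from keras.applications.inception_v3 import InceptionV3, preprocess_input, decode_predictions
from keras.preprocessing import image
import numpy as np

model = InceptionV3(weights='imagenet')      # expects 299x299 RGB input

img = image.load_img('some_object.jpg', target_size=(299, 299))  # placeholder path
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
print(decode_predictions(model.predict(x), top=3))  # top-3 ImageNet classes
```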

LoyVanBeek commented 6 years ago

Another suspicion is that the recognition is spoiled by bad segmentation. So, how can segmentation be improved? Doing image segmentation from scratch on RGB-D images is a waste, because we already do background subtraction in ED with the world model.

But the boundaries of that segmentation are not the exact borders of the objects and could thus be improved. There are several methods to do this:

A name that keeps popping up is gPb (globalized probability of boundary).

We could use the segmentation from ED as a base for so-called 'weak supervision', or as a 'seed' to start the segmentation from (see the sketch below).

For semantic segmentation, the competition to look at is based on the PASCAL VOC2012 dataset. But for semantic segmentation you need classes as well, so it might not be a good fit. However, just being able to separate objects without knowing their class is fine too; the classification is done separately anyway.
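
One way to flesh out the 'seed' idea: OpenCV's GrabCut can be initialized from a rough region of interest and will refine the boundary. A minimal sketch, where `ed_roi` stands in for whatever rectangle ED's world-model subtraction gives us (all names and values here are hypothetical, not our pipeline):

```python
# Sketch: refine a rough ED-derived region with GrabCut (OpenCV).
# 'ed_roi' is a hypothetical (x, y, w, h) seed rectangle from ED.
import cv2
import numpy as np

img = cv2.imread('rgb_frame.jpg')            # hypothetical RGB frame
ed_roi = (120, 80, 200, 240)                 # hypothetical seed rectangle from ED

mask = np.zeros(img.shape[:2], np.uint8)
bgd_model = np.zeros((1, 65), np.float64)    # internal GrabCut state
fgd_model = np.zeros((1, 65), np.float64)

# Initialize from the seed rectangle and refine for a few iterations.
cv2.grabCut(img, mask, ed_roi, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_RECT)

# Keep pixels marked as (probable) foreground as the refined object mask.
refined = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 255, 0).astype('uint8')
object_only = cv2.bitwise_and(img, img, mask=refined)
```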

alberth commented 6 years ago

I don't know how the current ED is perceived, but if you ever want to get rid of the pre-baked world that Amigo needs (as I understand it), relying on ED world-model subtraction would become a PITA.

LoyVanBeek commented 6 years ago

Cloud-based object recognition services:

LoyVanBeek commented 6 years ago

Retraining DetectNet: https://github.com/NVIDIA/DIGITS/blob/master/examples/object-detection/README.md#model-creation

DetectNet is derived from GoogLeNet, the original Inception architecture, which puts it in the same family as the Inception_v3 we use now. Retraining DetectNet should thus be possible.

reinzor commented 6 years ago

What were the conclusions of today?


LoyVanBeek commented 6 years ago

@reinzor Not many conclusions (well, that rqt segfaults on my laptop); mostly following tutorials and general study of this topic.

People have been tweaking the threshold for putting a classification in the report of the Storing Groceries challenge. A metric useful for tuning this could be the precision-recall curve: high precision corresponds to a low false positive rate, high recall to a low false negative rate, and both depend on the threshold we put on classifications & detections.
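
A minimal sketch of computing such a curve with scikit-learn; `y_true` and `y_scores` are made-up placeholders for per-detection ground truth and classifier confidences, not challenge data:

```python
# Sketch: precision-recall trade-off as a function of the confidence threshold.
import numpy as np
from sklearn.metrics import precision_recall_curve, average_precision_score

y_true = np.array([1, 1, 0, 1, 0, 0, 1, 0])                      # 1 = correct classification
y_scores = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.35, 0.1])  # classifier confidences

precision, recall, thresholds = precision_recall_curve(y_true, y_scores)
print('average precision:', average_precision_score(y_true, y_scores))
for p, r, t in zip(precision, recall, list(thresholds) + [None]):
    print('threshold {} -> precision {:.2f}, recall {:.2f}'.format(t, p, r))
```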

LoyVanBeek commented 6 years ago

Another interesting metric: http://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html#sklearn.metrics.roc_auc_score
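
Sketch with the same kind of placeholder data:

```python
# Sketch: ROC AUC summarizes ranking quality in one number
# (1.0 = perfect ranking, 0.5 = chance). Placeholder data again.
from sklearn.metrics import roc_auc_score

y_true = [1, 1, 0, 1, 0, 0, 1, 0]
y_scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.35, 0.1]
print('ROC AUC:', roc_auc_score(y_true, y_scores))
```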

LoyVanBeek commented 6 years ago

I trained a network on `~/MEGA/data/robotics_testlabs/training_data_Josja/training`.

I tested the network on our test data in `~/MEGA/data/robotics_testlabs/training_data_Josja/test_data`: results.

and then again with the same images cropped to 90% (using `find . -name '*.jpg' -execdir mogrify -crop 90%x+0+0 -gravity Center {} \;`; note that ImageMagick applies settings in order, so `-gravity Center` should come before `-crop` for the crop to actually be centered), to naively get rid of a bit of background: results_cropped_90pct

Naively cropping makes classification much worse. A good segmentation is important.
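
For reference, the intended centered 90% crop expressed in Python with Pillow (a sketch, one reading of the mogrify geometry above; paths are placeholders):

```python
# Sketch: keep the central 90% of the width, full height (the intent of
# 'mogrify -gravity Center -crop 90%x+0+0'). Paths are placeholders.
from PIL import Image

img = Image.open('test_image.jpg')
w, h = img.size
new_w = int(w * 0.9)
left = (w - new_w) // 2
img.crop((left, 0, left + new_w, h)).save('test_image_cropped.jpg')
```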

LoyVanBeek commented 6 years ago

Trained on 6 classes with 60 images each, for 1000 training steps with a batch size of 100.

results
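
For reproducibility, those settings map onto the stock TensorFlow retrain script roughly as follows (assuming tensorflow/examples/image_retraining/retrain.py; the image dir is the training directory mentioned earlier, and the output paths are placeholders):

```
python retrain.py \
  --image_dir=~/MEGA/data/robotics_testlabs/training_data_Josja/training \
  --how_many_training_steps=1000 \
  --train_batch_size=100 \
  --output_graph=/tmp/output_graph.pb \
  --output_labels=/tmp/output_labels.txt
```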

LoyVanBeek commented 6 years ago

More questions:

LoyVanBeek commented 6 years ago

More things to try out:

MatthijsBurgh commented 6 years ago

I trained MobileNet after syncing the tf_retrain script. Evaluation and classification aren't possible yet, because the input tensor "Cast" isn't in the main graph in MobileNet. I don't know why this is or what the solution is, but using "input" as the input tensor doesn't work either, because then the sizes are incorrect. Probably we need to do the JPEG decoding in the script ourselves?
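
If we do the decoding ourselves, a sketch along the lines of TensorFlow's label_image.py example (TF 1.x API; the sizes and means match the MobileNet-192 invocation further below and are assumptions for our graph):

```python
# Sketch: decode + resize + normalize a JPEG in Python, so the result can be
# fed to the MobileNet graph's "input:0" tensor instead of a "Cast" node.
import tensorflow as tf  # TF 1.x API

def read_tensor_from_image_file(file_name, input_height=192, input_width=192,
                                input_mean=128, input_std=128):
    contents = tf.read_file(file_name)
    image = tf.image.decode_jpeg(contents, channels=3)
    floats = tf.cast(image, tf.float32)
    resized = tf.image.resize_bilinear(tf.expand_dims(floats, 0),
                                       [input_height, input_width])
    normalized = tf.divide(tf.subtract(resized, [input_mean]), [input_std])
    with tf.Session() as sess:
        return sess.run(normalized)
```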

Network graphs:

Inception: [image: Inception network graph]

MobileNet: [image: MobileNet network graph]

CBHealth commented 6 years ago

I read a paper about them. I will be back on the 7th; then I can send it and give you an overview of accuracy and speed.


LoyVanBeek commented 6 years ago

Judging by https://hackernoon.com/creating-insanely-fast-image-classifiers-with-mobilenet-in-tensorflow-f030ce0a2991, I'd say that the image does indeed need to be resized:

```
python3 tensorflow/examples/label_image/label_image.py \
  --graph=/tmp/mobilenet_0.50_192.pb \
  --labels=/tmp/output_labels.txt \
  --image=/home/harvitronix/ml/blogs/road-not-road/test-image.jpg \
  --input_layer=input \
  --output_layer=final_result \
  --input_mean=128 \
  --input_std=128 \
  --input_width=192 \
  --input_height=192
```