reinzor closed this issue 5 years ago.
Current implementation: https://github.com/tue-robotics/image_recognition
Might be useful, ROS wrapper of darknet: https://github.com/leggedrobotics/darknet_ros
Useful for selecting 3D sensor https://roscon.ros.org/2017/presentations/ROSCon%202017%203D%20Vision%20Technology.pdf
From @alberth: object_recognition_research.txt
Popular methods/systems for classification:
Best team: Homer@UniKoblenz, using SURF with some additional smart (sounding) tricks.
Weights of pre-trained nets: https://github.com/fchollet/keras/tree/master/keras/applications
Another suspicion is that the recognition is spoiled by bad segmentation. So, how can segmentation be improved? Image Segmentation from scratch on RGBD-images directly is a waste, because we already do background subtraction in ED with the world model.
But the boundaries of the segmentation are not the exact borders of the objects and could thus be improved. There are several methods to do this:
A name that keeps popping up is gPb (globalized probability of boundary).
We could use the segmentation from ED as a base for so-called 'weak supervision' or as a 'seed' to start the segmentation from.
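To illustrate the 'seed' idea: a minimal sketch of seeded region growing in pure numpy, where a coarse boolean mask (such as the ED background-subtraction output) is expanded to neighbouring pixels of similar intensity. The function name and thresholds are made up for illustration; a real pipeline would more likely use something like GrabCut or gPb.

```python
import numpy as np
from collections import deque

def grow_region(image, seed_mask, tol=10.0):
    """Expand a coarse boolean seed mask to 4-connected neighbours whose
    intensity is within `tol` of the mean intensity under the seed.

    image: 2D float array (grayscale); seed_mask: 2D bool array.
    """
    seed_mean = image[seed_mask].mean()
    grown = seed_mask.copy()
    queue = deque(zip(*np.nonzero(seed_mask)))
    h, w = image.shape
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if (0 <= ny < h and 0 <= nx < w and not grown[ny, nx]
                    and abs(image[ny, nx] - seed_mean) <= tol):
                grown[ny, nx] = True
                queue.append((ny, nx))
    return grown
```

This would refine the ED segmentation boundary toward the actual object border instead of segmenting the RGB-D image from scratch.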
For semantic segmentation, the competition to look at is based on the PASCAL VOC2012 dataset. However, semantic segmentation needs classes as well, so it might not be a good fit. Still, just being able to separate objects without knowing their class is fine as well; the classification is done separately anyway.
I don't know how current ED is perceived, but if you ever want to get rid of the pre-baked world that Amigo needs (as I understood), relying on ED world model subtraction would become a PITA.
Cloud based object recognition services:
Retraining DetectNet: https://github.com/NVIDIA/DIGITS/blob/master/examples/object-detection/README.md#model-creation
DetectNet is derived from GoogLeNet, which is also based on Inception_v3 which we use now. Retraining DetectNet should thus be possible.
What were the conclusions of today?
-Rein
@reinzor Not many conclusions (well, that rqt segfaults on my laptop) but mostly following tutorials and general study into this topic.
People have been tweaking the threshold for putting a classification in the report of the Storing groceries challenge. A useful metric for tuning this is the precision-recall curve: high precision corresponds to a low false-positive rate, and high recall to a low false-negative rate, and both depend on the threshold we put on classifications & detections.
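A sketch of how such precision-recall points could be computed from classification scores (the function name and data are illustrative, not from our codebase):

```python
import numpy as np

def precision_recall_points(scores, labels, thresholds):
    """For each threshold t, treat 'score >= t' as accepting the
    classification and report (t, precision, recall)."""
    points = []
    for t in thresholds:
        predicted = scores >= t
        tp = np.sum(predicted & labels)    # accepted and correct
        fp = np.sum(predicted & ~labels)   # accepted but wrong (false positive)
        fn = np.sum(~predicted & labels)   # rejected but correct (false negative)
        precision = tp / (tp + fp) if tp + fp > 0 else 1.0
        recall = tp / (tp + fn) if tp + fn > 0 else 0.0
        points.append((t, float(precision), float(recall)))
    return points
```

Sweeping `thresholds` over, say, `np.linspace(0, 1, 50)` and plotting recall against precision would show where the report threshold should sit for the challenge.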
I trained a network on ~/MEGA/data/robotics_testlabs/training_data_Josja/training
I tested the network on our test_data in ~/MEGA/data/robotics_testlabs/training_data_Josja/test_data, and then again with the same images cropped to 90% to naively get rid of a bit of background, using `find . -name '*.jpg' -execdir mogrify -crop 90%x+0+0 -gravity Center {} \;`.
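For experimenting without ImageMagick, the same 90% centre crop can be sketched in Python on a numpy image array (the helper name is ours, not from the codebase):

```python
import numpy as np

def center_crop_width(image, frac=0.9):
    """Keep the central `frac` of the image width at full height,
    mirroring `mogrify -crop 90%x+0+0 -gravity Center`."""
    h, w = image.shape[:2]
    new_w = int(w * frac)
    x0 = (w - new_w) // 2
    return image[:, x0:x0 + new_w]
```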
Naively cropping makes classification much worse. A good segmentation is important.
Trained on 6 classes with 60 images each. 1000 train steps, batch size of 100.
More questions:
More things to try out:
I trained Mobilenet after syncing the tf_retrain script. Evaluation and classification aren't possible yet, because the input tensor "Cast" isn't in Mobilenet's main graph. I don't know why this is or what the solution is. Using "input" as the input tensor doesn't work either, because then the sizes are incorrect. We probably need to do the JPEG decoding in the script ourselves?
Network graphs:
I read a paper about them. I will be back on the 7th; then I can send it and give you an overview of accuracy and speed.
Inception: https://user-images.githubusercontent.com/18014833/34517770-c357f50c-f07c-11e7-97dd-78c2a8eabbe1.png
Mobilenet: https://user-images.githubusercontent.com/18014833/34517776-c9578f1c-f07c-11e7-8ab4-2d8cdf534a70.png
Judging by https://hackernoon.com/creating-insanely-fast-image-classifiers-with-mobilenet-in-tensorflow-f030ce0a2991, I'd say that indeed the image does need to be resized:
python3 tensorflow/examples/label_image/label_image.py \
--graph=/tmp/mobilenet_0.50_192.pb \
--labels=/tmp/output_labels.txt \
--image=/home/harvitronix/ml/blogs/road-not-road/test-image.jpg \
--input_layer=input \
--output_layer=final_result \
--input_mean=128 \
--input_std=128 \
--input_width=192 \
--input_height=192
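The `--input_mean`/`--input_std`/`--input_width` flags above describe the preprocessing we would have to do ourselves before feeding Mobilenet's `input` tensor, since the JPEG decode/resize isn't inside its graph. A rough numpy sketch (nearest-neighbour resize for simplicity; the helper name is ours):

```python
import numpy as np

def preprocess_for_mobilenet(rgb, size=192, mean=128.0, std=128.0):
    """Resize (nearest neighbour) and normalise an HxWx3 uint8 RGB array
    into the (1, size, size, 3) float tensor the flags above describe."""
    h, w, _ = rgb.shape
    ys = np.arange(size) * h // size   # nearest-neighbour row indices
    xs = np.arange(size) * w // size   # nearest-neighbour column indices
    resized = rgb[ys][:, xs].astype(np.float32)
    normalised = (resized - mean) / std
    return normalised[np.newaxis, ...]  # add batch dimension
```

The resulting array could then be fed to the session as the value of the `input` tensor, assuming the graph expects a 1x192x192x3 float input.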
(issue edited by @LoyVanBeek) Our performance at RoboCup in Japan was .... The feeling is that we do OK when testing, but not during actual challenges. Before we start making wild guesses, we need to check our current performance and be able to compare performance.
There are several tracks to work on:
Other implementations
Improvements on current implementation (i.e. inception_v3 retrained on RoboCup data)
(Tools to) measure object recognition performance
Measure object recognition performance
Different sensors
@alberth @JosjaG @LoyVanBeek @blumenkindC3