Closed: Rayman closed this issue 6 years ago.
The code we use breaks with TensorFlow 1.3, but I have some fixes on a local branch at the moment.
Original, without any data augmentation, versus after setting the mirror augmentation to true: training takes a lot more time, which is very inconvenient, but mirroring part of the data does (slightly) improve the accuracy. The increased training time is probably a CPU issue, so by correctly using the GPU this might not be a problem.
With data augmentation, the bottleneck caching is disabled, so we should look into whether we can train on the GPU. How long did training take?
Besides mirroring, there are different augmentations to try out. The retrain script from the TensorFlow examples offers random crop, random scale, random brightness, and left-right flipping.
Let's set each of these to 10%, see whether each one improves performance and, if they do, enable them all together. Later, we can decide to implement the other augmentations as well.
To test various augmentations:
```
mkdir -p /tmp/inception
cd /tmp/inception
wget http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz
tar -zxf inception-2015-12-05.tgz
mkdir -p ~/MEGA/data/robotics_testlabs/training_data_Josja/AugmentationTest/Crop/
mkdir -p ~/MEGA/data/robotics_testlabs/training_data_Josja/AugmentationTest/Scale/
mkdir -p ~/MEGA/data/robotics_testlabs/training_data_Josja/AugmentationTest/Brightness/
mkdir -p ~/MEGA/data/robotics_testlabs/training_data_Josja/AugmentationTest/Flip/
rosrun tensorflow_ros retrain ~/MEGA/data/robotics_testlabs/training_data_Josja/training /tmp/inception ~/MEGA/data/robotics_testlabs/training_data_Josja/AugmentationTest/Crop/ --batch=100 --steps=1000 --random_crop=10
rosrun tensorflow_ros retrain ~/MEGA/data/robotics_testlabs/training_data_Josja/training /tmp/inception ~/MEGA/data/robotics_testlabs/training_data_Josja/AugmentationTest/Scale/ --batch=100 --steps=1000 --random_scale=10
rosrun tensorflow_ros retrain ~/MEGA/data/robotics_testlabs/training_data_Josja/training /tmp/inception ~/MEGA/data/robotics_testlabs/training_data_Josja/AugmentationTest/Brightness/ --batch=100 --steps=1000 --random_brightness=10
rosrun tensorflow_ros retrain ~/MEGA/data/robotics_testlabs/training_data_Josja/training /tmp/inception ~/MEGA/data/robotics_testlabs/training_data_Josja/AugmentationTest/Flip/ --batch=100 --steps=1000 --flip_left_right=10
```
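For intuition, here is a rough sketch of what two of these flags do (a 10% random brightness change and a left-right mirror), written in plain NumPy. This is an illustration only, not the retrain script's actual implementation; the function name and parameters are made up:

```python
import numpy as np

def augment(image, rng, brightness_pct=10, flip_left_right=True):
    """Sketch of two augmentations: random brightness and mirroring."""
    img = image.astype(np.float32)
    # like --random_brightness=10: scale all pixels by a factor in [0.9, 1.1]
    factor = 1.0 + rng.uniform(-brightness_pct, brightness_pct) / 100.0
    img = np.clip(img * factor, 0.0, 255.0)
    # like --flip_left_right: mirror the image horizontally, half of the time
    if flip_left_right and rng.random() < 0.5:
        img = img[:, ::-1, :]
    return img
```

Each training image gets an independently sampled brightness factor and flip decision, which is why the bottleneck values can no longer be cached.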
I ran

```
rosrun tensorflow_ros retrain ~/MEGA/data/robotics_testlabs/training_data_Josja/training /tmp/inception ~/MEGA/data/robotics_testlabs/training_data_Josja/AugmentationTest/Scale/ --batch=100 --steps=1000 --random_scale=10
```

on bob's ZBook with a GPU. This ends in:
```
2017-12-19 22:27:13.330667: Step 990: Train accuracy = 97.0%
2017-12-19 22:27:13.330716: Step 990: Cross entropy = 0.310233
2017-12-19 22:27:13.378443: Step 990: Validation accuracy = 94.0%
2017-12-19 22:28:01.398623: Step 999: Train accuracy = 97.0%
2017-12-19 22:28:01.398673: Step 999: Cross entropy = 0.298153
2017-12-19 22:28:01.447069: Step 999: Validation accuracy = 91.0%
```
The difference between train and validation accuracy is quite large, so the model has probably overfitted.
We reach a plateau around 250-300 training steps; after that, the validation accuracy goes down.
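One way to act on that plateau would be validation-based early stopping. A hypothetical sketch (the retrain script has no such flag; the helper name and `patience` parameter are made up):

```python
def should_stop(val_accuracies, patience=5):
    """Stop once the best validation accuracy is more than `patience`
    evaluations in the past, i.e. there has been no recent improvement."""
    if len(val_accuracies) <= patience:
        return False
    best_idx = val_accuracies.index(max(val_accuracies))
    return best_idx < len(val_accuracies) - patience
```

With the evaluation interval used here (every 10 steps), a patience of 5 would halt training roughly 50 steps past the plateau instead of continuing to step 1000.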
When running

```
rosrun tensorflow_ros retrain ~/MEGA/data/robotics_testlabs/training_data_Josja/training /tmp/inception ~/MEGA/data/robotics_testlabs/training_data_Josja/AugmentationTest/Crop/ --random_crop=10 --batch=100 --steps=1000
```
there is again a similar gap between train and validation accuracy, so maybe overfitting:
```
2017-12-19 22:47:20.903366: Step 380: Train accuracy = 98.0%
2017-12-19 22:47:20.903447: Step 380: Cross entropy = 0.680191
2017-12-19 22:47:20.993919: Step 380: Validation accuracy = 94.0%
2017-12-19 22:50:23.217630: Step 390: Train accuracy = 99.0%
2017-12-19 22:50:23.217772: Step 390: Cross entropy = 0.654659
2017-12-19 22:50:23.330398: Step 390: Validation accuracy = 92.0%
^C
```
(I had to stop because I needed to go home.)
Matthijs:
Trained overnight with the parameters above.
The evaluation on the validation data: Final accuracy: 0.62443438914
When running

```
rosrun tensorflow_ros retrain ~/MEGA/data/robotics_testlabs/training_data_Josja/training /tmp/inception ~/MEGA/data/robotics_testlabs/training_data_Josja/AugmentationTest/Brightness/ --batch=100 --steps=250 --random_brightness=10
```

the output is:
```
2017-12-20 02:24:28.152544: Step 240: Train accuracy = 95.0%
2017-12-20 02:24:28.152608: Step 240: Cross entropy = 0.956711
2017-12-20 02:24:28.214349: Step 240: Validation accuracy = 91.0%
2017-12-20 02:30:48.570406: Step 249: Train accuracy = 95.0%
2017-12-20 02:30:48.570485: Step 249: Cross entropy = 0.980782
2017-12-20 02:30:48.632844: Step 249: Validation accuracy = 86.0%
```
Matthijs:
Evaluation on the validation data: Final accuracy: 0.610859728507
@LoyVanBeek How about the evaluation on the validation data?
Haven't had the time for that yet
When running

```
rosrun tensorflow_ros retrain ~/MEGA/data/robotics_testlabs/training_data_Josja/training /tmp/inception ~/MEGA/data/robotics_testlabs/training_data_Josja/AugmentationTest/Flip/ --batch=100 --steps=1000 --flip_left_right=10
```

the output is:
```
2017-12-20 20:06:20.026709: Step 990: Cross entropy = 0.301570
2017-12-20 20:06:20.089091: Step 990: Validation accuracy = 85.0%
2017-12-20 20:13:18.808711: Step 999: Train accuracy = 100.0%
2017-12-20 20:13:18.808775: Step 999: Cross entropy = 0.294399
2017-12-20 20:13:18.875108: Step 999: Validation accuracy = 88.0%
```
But again, the network is probably overfitted after 1000 steps, certainly with a training accuracy of 100% but a validation accuracy of 'only' 86%.
Can you check on the separate validation set?
-Rein
TRAINING accuracy will always be 100% after enough steps.
@MatthijsBurgh said he would run on the validation set. And yes, the network is likely to be overfitting.
```
rosrun tensorflow_ros retrain ~/MEGA/data/robotics_testlabs/training_data_Josja/training /tmp/inception ~/MEGA/data/robotics_testlabs/training_data_Josja/AugmentationTest/Flip250/ --batch=100 --steps=250 --flip_left_right=10
```
Result
Final accuracy: 0.62443438914
```
rosrun tensorflow_ros retrain ~/MEGA/data/robotics_testlabs/training_data_Josja/training /tmp/inception ~/MEGA/data/robotics_testlabs/training_data_Josja/AugmentationTest/Crop250/ --random_crop=10 --batch=100 --steps=250
```
Result
Final accuracy: 0.619909502262
```
rosrun tensorflow_ros retrain ~/MEGA/data/robotics_testlabs/training_data_Josja/training /tmp/inception ~/MEGA/data/robotics_testlabs/training_data_Josja/AugmentationTest/Scale/ --random_scale=10 --batch=100 --steps=1000
```
Result
Final accuracy: 0.62443438914
```
rosrun tensorflow_ros retrain ~/MEGA/data/robotics_testlabs/training_data_Josja/training /tmp/inception ~/MEGA/data/robotics_testlabs/training_data_Josja/AugmentationTest/Scale250/ --batch=100 --steps=250 --random_scale=10
```
Result
So it still sucks... :sob:
Please don't set your batch size that high. Between 10 and 32 is good, I think. I have said this a few times already.
A lot of testing has been done, but no significant improvements have been shown.
Automatically augment training data with noise, occlusions, warps, shears, etc. There is a way to do this, but how do we activate it? (Also see http://tflearn.org/data_augmentation/)
Enabling this via the RQT train gui would be nice.
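To make the noise/occlusion idea concrete, here is a sketch in plain NumPy. The function name and parameters are made up for illustration; the tflearn page linked above provides ready-made versions of such augmentations:

```python
import numpy as np

def noise_and_occlusion(image, rng, noise_std=10.0, occlusion_frac=0.2):
    """Hypothetical augmentation: Gaussian pixel noise plus a random
    black rectangle covering `occlusion_frac` of each image dimension."""
    img = image.astype(np.float32)
    img += rng.normal(0.0, noise_std, size=img.shape)     # additive noise
    h, w = img.shape[:2]
    oh = max(1, int(h * occlusion_frac))
    ow = max(1, int(w * occlusion_frac))
    y = rng.integers(0, h - oh + 1)
    x = rng.integers(0, w - ow + 1)
    img[y:y + oh, x:x + ow] = 0.0                          # occlusion patch
    return np.clip(img, 0.0, 255.0)
```

Like the retrain flags, this would be applied per image per step, so it also defeats bottleneck caching and only makes sense with GPU training.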