rishizek / tensorflow-deeplab-v3

DeepLabv3 built in TensorFlow
MIT License

Failed to train a good model using default settings #11

Closed: xiaodaxia closed this issue 6 years ago

xiaodaxia commented 6 years ago

Below is my evaluation log for training with the default settings. The log shows that mean_iou never gets higher than 65%.

{'loss': 27.411198, 'mean_iou': 0.032545157, 'px_accuracy': 0.54012704, 'global_step': 147}
{'loss': 25.638956, 'mean_iou': 0.04469605, 'px_accuracy': 0.6891069, 'global_step': 294}
{'loss': 24.254019, 'mean_iou': 0.065113254, 'px_accuracy': 0.73246646, 'global_step': 441}
{'loss': 23.481403, 'mean_iou': 0.10304936, 'px_accuracy': 0.72400516, 'global_step': 588}
{'loss': 23.085245, 'mean_iou': 0.13920967, 'px_accuracy': 0.7406871, 'global_step': 735}
{'loss': 22.747364, 'mean_iou': 0.18104842, 'px_accuracy': 0.7659982, 'global_step': 882}
{'loss': 22.565765, 'mean_iou': 0.21812087, 'px_accuracy': 0.7716243, 'global_step': 1029}
{'loss': 22.295029, 'mean_iou': 0.29061812, 'px_accuracy': 0.80634236, 'global_step': 1176}
{'loss': 22.195526, 'mean_iou': 0.33454996, 'px_accuracy': 0.8205649, 'global_step': 1323}
{'loss': 21.816957, 'mean_iou': 0.4606417, 'px_accuracy': 0.86555535, 'global_step': 1470}
{'loss': 21.75902, 'mean_iou': 0.4434907, 'px_accuracy': 0.8586316, 'global_step': 1617}
{'loss': 21.597395, 'mean_iou': 0.48515448, 'px_accuracy': 0.87751526, 'global_step': 1764}
{'loss': 21.48568, 'mean_iou': 0.51813114, 'px_accuracy': 0.8842893, 'global_step': 1911}
{'loss': 21.39514, 'mean_iou': 0.5191962, 'px_accuracy': 0.88622063, 'global_step': 2058}
{'loss': 21.277267, 'mean_iou': 0.56164825, 'px_accuracy': 0.8970871, 'global_step': 2205}
{'loss': 21.209543, 'mean_iou': 0.54094917, 'px_accuracy': 0.8927846, 'global_step': 2352}
{'loss': 21.126732, 'mean_iou': 0.5475543, 'px_accuracy': 0.8912236, 'global_step': 2499}
{'loss': 21.038265, 'mean_iou': 0.55428994, 'px_accuracy': 0.8948886, 'global_step': 2646}
{'loss': 20.937605, 'mean_iou': 0.5745999, 'px_accuracy': 0.9008515, 'global_step': 2793}
{'loss': 20.888403, 'mean_iou': 0.5405788, 'px_accuracy': 0.89332426, 'global_step': 2940}
{'loss': 20.74884, 'mean_iou': 0.6236876, 'px_accuracy': 0.91060424, 'global_step': 3087}
{'loss': 20.661219, 'mean_iou': 0.63997686, 'px_accuracy': 0.9151486, 'global_step': 3234}
{'loss': 20.602966, 'mean_iou': 0.61230946, 'px_accuracy': 0.9092174, 'global_step': 3381}
{'loss': 20.517002, 'mean_iou': 0.6187148, 'px_accuracy': 0.9131437, 'global_step': 3528}
{'loss': 20.435068, 'mean_iou': 0.6393549, 'px_accuracy': 0.9146484, 'global_step': 3675}
{'loss': 20.381002, 'mean_iou': 0.6136152, 'px_accuracy': 0.90923643, 'global_step': 3822}
{'loss': 20.31082, 'mean_iou': 0.6096521, 'px_accuracy': 0.90993786, 'global_step': 3969}
{'loss': 20.211248, 'mean_iou': 0.6426259, 'px_accuracy': 0.9163362, 'global_step': 4116}
{'loss': 20.14878, 'mean_iou': 0.62975436, 'px_accuracy': 0.91417825, 'global_step': 4263}
{'loss': 20.061584, 'mean_iou': 0.646002, 'px_accuracy': 0.91878986, 'global_step': 4410}
{'loss': 20.05111, 'mean_iou': 0.59056413, 'px_accuracy': 0.90350807, 'global_step': 4557}
{'loss': 19.936838, 'mean_iou': 0.6460258, 'px_accuracy': 0.9149093, 'global_step': 4704}
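For anyone who wants to check the plateau without reading the numbers by eye, here is a minimal sketch that parses log entries of this shape and reports the evaluation with the highest mean_iou. The helper names parse_eval_log and best_checkpoint are hypothetical and not part of the repository, and the sample below contains only the last three entries of the log.

```python
import ast
import re


def parse_eval_log(text):
    """Return every {...} metrics dict found in a pasted evaluation log.

    Hypothetical helper: it assumes each entry is a flat Python dict
    literal, as in the log above, whether or not entries are separated
    by newlines.
    """
    return [ast.literal_eval(match) for match in re.findall(r"\{[^{}]*\}", text)]


def best_checkpoint(records, key="mean_iou"):
    """Return the record with the highest value for the given metric."""
    return max(records, key=lambda record: record[key])


# Last three entries of the log, copied verbatim.
sample = (
    "{'loss': 20.061584, 'mean_iou': 0.646002, 'px_accuracy': 0.91878986, 'global_step': 4410} "
    "{'loss': 20.05111, 'mean_iou': 0.59056413, 'px_accuracy': 0.90350807, 'global_step': 4557} "
    "{'loss': 19.936838, 'mean_iou': 0.6460258, 'px_accuracy': 0.9149093, 'global_step': 4704}"
)

records = parse_eval_log(sample)
best = best_checkpoint(records)
print(best["global_step"], best["mean_iou"])
```

Running the same helper over the full log gives a quick way to see where mean_iou stops improving.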

My environment is an M40 (24 GB memory) with TensorFlow 1.4. I do not know whether TensorFlow 1.4 is the problem, but the only change needed to get the latest code working was a keyword-argument name change in tf.reduce_mean, and I think that difference can be neglected. Also note that the log above is for 32 epochs. I do not know whether the epoch count affects the final performance: I first ran 26 epochs and did not get the expected result, so I then tried a longer training run.
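The post does not say which argument name changed. One plausible reading, stated here as an assumption, is the rename of keep_dims to keepdims in the TensorFlow reduction ops after 1.4. A minimal sketch of a version-tolerant wrapper follows; reduce_with_keepdims is a hypothetical helper, and the two stand-in functions only imitate the old and new signatures so the example runs without TensorFlow installed.

```python
def reduce_with_keepdims(reduce_fn, *args, keepdims=False, **kwargs):
    """Call a reduction op with whichever spelling of the flag it accepts.

    Hypothetical helper. It tries the newer `keepdims` keyword first and
    falls back to the older `keep_dims` spelling on TypeError. Note that
    this also retries on unrelated TypeErrors, so it is only a sketch.
    """
    try:
        return reduce_fn(*args, keepdims=keepdims, **kwargs)
    except TypeError:
        return reduce_fn(*args, keep_dims=keepdims, **kwargs)


# Stand-ins that mimic the two signatures (assumption: old API uses
# `keep_dims`, newer API uses `keepdims`).
def old_style_reduce(values, axis=None, keep_dims=False):
    return ("old", axis, keep_dims)


def new_style_reduce(values, axis=None, keepdims=False):
    return ("new", axis, keepdims)


old_result = reduce_with_keepdims(old_style_reduce, [1.0, 2.0], axis=[0], keepdims=True)
new_result = reduce_with_keepdims(new_style_reduce, [1.0, 2.0], axis=[0], keepdims=True)
print(old_result, new_result)

# With TensorFlow installed, the call would look like:
#   reduce_with_keepdims(tf.reduce_mean, x, axis=[1, 2], keepdims=True)
```

If this is indeed the change in question, it only affects how the argument is spelled, not the values the op computes, which is consistent with treating the difference as negligible.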

bwuzhang commented 6 years ago

@xiaodaxia Has this issue been resolved?