wusaifei / garbage_classify

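This repository adds tutorials on classification, detection, and face-swapping techniques, assorted hyperparameter-tuning tips and tricks, detailed visual walkthroughs of convolutional architectures, and annotated attention-mechanism code. The goal of this garbage classification challenge is to build a deep-learning image classification model that accurately identifies the category of garbage in an image. The competition follows Shenzhen's garbage sorting standard, with four categories: recyclables, kitchen waste, hazardous waste, and other waste. The project includes a complete classification network, data augmentation, SVM, and other strategies for improving classification, and more classification techniques will be added later.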
653 stars 175 forks

Multi-GPU parallel training problem #11

Open dodgaga opened 4 years ago

dodgaga commented 4 years ago

Here is the loss from multi-GPU parallel training:

image

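In the first epoch the loss and the corresponding acc look normal, but something goes wrong in the second epoch. I suspect a problem when the parameters are merged?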

dodgaga commented 4 years ago

Text log: batch size 16, parallel training on 4 GPUs

1055/1061 [============================>.] - ETA: 5s - loss: 3.5930 - acc: 0.0740
1056/1061 [============================>.] - ETA: 4s - loss: 3.5926 - acc: 0.0743
1057/1061 [============================>.] - ETA: 3s - loss: 3.5921 - acc: 0.0743
1058/1061 [============================>.] - ETA: 2s - loss: 3.5919 - acc: 0.0744
1059/1061 [============================>.] - ETA: 1s - loss: 3.5917 - acc: 0.0745
1060/1061 [============================>.] - ETA: 0s - loss: 3.5915 - acc: 0.0746
1061/1061 [==============================] - 919s 867ms/step - loss: 3.5911 - acc: 0.0748 - val_loss: 1.1921e-07 - val_acc: 0.0207
save weights file ./model_snapshots_multi/weights_000_0.0207.h5
Epoch 2/60

1/1061 [..............................] - ETA: 12:46 - loss: 3.2449 - acc: 0.1875
2/1061 [..............................] - ETA: 12:55 - loss: 3.3741 - acc: 0.1250
3/1061 [..............................] - ETA: 13:06 - loss: 3.2832 - acc: 0.1667
4/1061 [..............................] - ETA: 13:00 - loss: 3.2520 - acc: 0.1719
5/1061 [..............................] - ETA: 13:00 - loss: 3.2117 - acc: 0.1875
6/1061 [..............................] - ETA: 12:26 - loss: 3.3068 - acc: 0.1562
7/1061 [..............................] - ETA: 12:30 - loss: 2.8344 - acc: 0.1339
8/1061 [..............................] - ETA: 12:33 - loss: 2.4801 - acc: 0.1172
9/1061 [..............................] - ETA: 12:35 - loss: 2.2046 - acc: 0.1181
10/1061 [..............................] - ETA: 12:33 - loss: 1.9841 - acc: 0.1062
11/1061 [..............................] - ETA: 12:34 - loss: 1.8037 - acc: 0.0966
12/1061 [..............................] - ETA: 12:36 - loss: 1.6534 - acc: 0.0938
13/1061 [..............................] - ETA: 12:38 - loss: 1.5262 - acc: 0.0865
14/1061 [..............................] - ETA: 12:39 - loss: 1.4172 - acc: 0.0804
15/1061 [..............................] - ETA: 12:38 - loss: 1.3227 - acc: 0.0750
16/1061 [..............................] - ETA: 12:38 - loss: 1.2401 - acc: 0.0703
17/1061 [..............................] - ETA: 12:38 - loss: 1.1671 - acc: 0.0662
18/1061 [..............................] - ETA: 12:38 - loss: 1.1023 - acc: 0.0660
19/1061 [..............................] - ETA: 12:38 - loss: 1.0443 - acc: 0.0625
20/1061 [..............................] - ETA: 12:38 - loss: 0.9921 - acc: 0.0625
21/1061 [..............................] - ETA: 12:38 - loss: 0.9448 - acc: 0.0595
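
The Epoch 2 numbers above are easier to read once they are converted back to per-batch values. A minimal sketch, assuming the progress bar prints the running mean of the loss over the epoch so far (the usual Keras behavior, though the thread does not confirm it for this training script). The helper name `per_batch` is made up for illustration.

```python
# Running means copied from the Epoch 2 progress bar (steps 1..21).
running_loss = [3.2449, 3.3741, 3.2832, 3.2520, 3.2117, 3.3068, 2.8344,
                2.4801, 2.2046, 1.9841, 1.8037, 1.6534, 1.5262, 1.4172,
                1.3227, 1.2401, 1.1671, 1.1023, 1.0443, 0.9921, 0.9448]


def per_batch(running_means):
    """Invert a cumulative mean: b_k = k * m_k - (k - 1) * m_{k-1}.

    Hypothetical helper; assumes every step carries equal weight.
    """
    values = []
    prev_sum = 0.0
    for k, mean in enumerate(running_means, start=1):
        total = k * mean          # sum of the first k per-batch values
        values.append(total - prev_sum)
        prev_sum = total
    return values


batch_loss = per_batch(running_loss)
for step, loss in enumerate(batch_loss, start=1):
    print(f"step {step:2d}: per-batch loss ~ {loss:7.4f}")
```

Under that assumption, the recovered per-batch loss is roughly 3.0 to 3.8 for steps 1 to 6 and within rounding error of zero from step 7 onward. That would mean the loss did not decrease gradually but collapsed to about zero at step 7, while the displayed average only decays as 1/step.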

wusaifei commented 4 years ago

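@dodgaga Hi, are you perhaps training without pretrained weights? With pretrained weights, the accuracy is already high in the first epoch. I recommend using pretrained weights.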
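
A quick sanity check on the first-epoch loss in the log can be sketched as follows. It assumes the loss is a standard cross-entropy in nats, which the thread does not state. For such a loss, a uniform guess over K classes scores ln K, so `exp(loss)` gives a rough "effective number of classes" the model is still choosing among.

```python
import math

# Loss from the Epoch 1 summary line in the log above.
first_epoch_loss = 3.5911

# Assumption: cross-entropy in nats, so a uniform guess over K classes gives ln(K).
effective_classes = math.exp(first_epoch_loss)
print(f"exp({first_epoch_loss}) = {effective_classes:.1f}")
```

If the label set is of that order of magnitude, the model is close to uniform guessing after the first epoch, which is what one might expect when training from scratch rather than from pretrained weights.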