mindspore-lab / mindcv

A toolbox of vision models and algorithms based on MindSpore
https://mindspore-lab.github.io/mindcv/
Apache License 2.0

[squeezenet] [Ascend910] [GRAPH] Unable to reproduce precision #715

Closed 787918582 closed 1 month ago

787918582 commented 1 year ago


Describe the bug (Mandatory) Validation accuracy is abnormal for squeezenet_1_0 and squeezenet_1_1 when validating during training.

To Reproduce (Mandatory) Steps to reproduce the behavior:

  1. mpirun --allow-run-as-root -n 8 python train.py --config configs/squeezenet/squeezenet_1.0_ascend.yaml --distribute True --data_dir /ImageNet_Origin/

Expected behavior (Mandatory) The target accuracy can be reproduced.

Screenshots / Logs (Mandatory)

[2023-07-11 13:53:06] mindcv.utils.callbacks INFO - Epoch: [195/200], batch: [5004/5004], loss: 6.907755, lr: 0.000154, time: 97.860354s
[2023-07-11 13:53:11] mindcv.utils.callbacks INFO - Validation Top_1_Accuracy: 0.1000%, Top_5_Accuracy: 0.5000%, time: 4.961876s
[2023-07-11 13:53:11] mindcv.utils.callbacks INFO - Saving model to ./ckpt/squeezenet1_0-195_5004.ckpt
[2023-07-11 13:53:11] mindcv.utils.callbacks INFO - Total time since last epoch: 102.965617(train: 97.866245, val: 4.961876)s, ETA: 514.828086s
[2023-07-11 13:53:11] mindcv.utils.callbacks INFO - --------------------------------------------------------------------------------
[2023-07-11 13:54:49] mindcv.utils.callbacks INFO - Epoch: [196/200], batch: [5004/5004], loss: 6.907755, lr: 0.000099, time: 98.291733s
[2023-07-11 13:54:54] mindcv.utils.callbacks INFO - Validation Top_1_Accuracy: 0.1000%, Top_5_Accuracy: 0.5000%, time: 4.957407s
[2023-07-11 13:54:54] mindcv.utils.callbacks INFO - Saving model to ./ckpt/squeezenet1_0-196_5004.ckpt
[2023-07-11 13:54:54] mindcv.utils.callbacks INFO - Total time since last epoch: 103.394972(train: 98.297709, val: 4.957407)s, ETA: 413.579886s
[2023-07-11 13:54:54] mindcv.utils.callbacks INFO - --------------------------------------------------------------------------------
[2023-07-11 13:56:34] mindcv.utils.callbacks INFO - Epoch: [197/200], batch: [5004/5004], loss: 6.907755, lr: 0.000056, time: 99.201918s
[2023-07-11 13:56:38] mindcv.utils.callbacks INFO - Validation Top_1_Accuracy: 0.1000%, Top_5_Accuracy: 0.5000%, time: 4.923371s
[2023-07-11 13:56:39] mindcv.utils.callbacks INFO - Saving model to ./ckpt/squeezenet1_0-197_5004.ckpt
[2023-07-11 13:56:39] mindcv.utils.callbacks INFO - Total time since last epoch: 104.277067(train: 99.208602, val: 4.923371)s, ETA: 312.831202s
[2023-07-11 13:56:39] mindcv.utils.callbacks INFO - --------------------------------------------------------------------------------
[2023-07-11 13:58:17] mindcv.utils.callbacks INFO - Epoch: [198/200], batch: [5004/5004], loss: 6.907755, lr: 0.000025, time: 98.811944s
[2023-07-11 13:58:22] mindcv.utils.callbacks INFO - Validation Top_1_Accuracy: 0.1000%, Top_5_Accuracy: 0.5000%, time: 4.935712s
[2023-07-11 13:58:22] mindcv.utils.callbacks INFO - Saving model to ./ckpt/squeezenet1_0-198_5004.ckpt
[2023-07-11 13:58:23] mindcv.utils.callbacks INFO - Total time since last epoch: 103.892637(train: 98.817379, val: 4.935712)s, ETA: 207.785274s
[2023-07-11 13:58:23] mindcv.utils.callbacks INFO - --------------------------------------------------------------------------------
[2023-07-11 14:00:01] mindcv.utils.callbacks INFO - Epoch: [199/200], batch: [5004/5004], loss: 6.907755, lr: 0.000006, time: 98.633017s
[2023-07-11 14:00:06] mindcv.utils.callbacks INFO - Validation Top_1_Accuracy: 0.1000%, Top_5_Accuracy: 0.5000%, time: 4.964054s
[2023-07-11 14:00:06] mindcv.utils.callbacks INFO - Saving model to ./ckpt/squeezenet1_0-199_5004.ckpt
[2023-07-11 14:00:06] mindcv.utils.callbacks INFO - Total time since last epoch: 103.742033(train: 98.637943, val: 4.964054)s, ETA: 103.742033s
[2023-07-11 14:00:06] mindcv.utils.callbacks INFO - --------------------------------------------------------------------------------
[2023-07-11 14:01:45] mindcv.utils.callbacks INFO - Epoch: [200/200], batch: [5004/5004], loss: 6.907755, lr: 0.000000, time: 99.083169s
[2023-07-11 14:01:50] mindcv.utils.callbacks INFO - Validation Top_1_Accuracy: 0.1000%, Top_5_Accuracy: 0.5000%, time: 4.841324s
[2023-07-11 14:01:50] mindcv.utils.callbacks INFO - Saving model to ./ckpt/squeezenet1_0-200_5004.ckpt
[2023-07-11 14:01:50] mindcv.utils.callbacks INFO - Total time since last epoch: 104.066254(train: 99.087659, val: 4.841324)s, ETA: 0.000000s
[2023-07-11 14:01:50] mindcv.utils.callbacks INFO - --------------------------------------------------------------------------------
[2023-07-11 14:01:50] mindcv.utils.callbacks INFO - Finish training!
[2023-07-11 14:01:50] mindcv.utils.callbacks INFO - The best validation Top_1_Accuracy is: 0.1000% at epoch 1.
[2023-07-11 14:01:51] mindcv.utils.callbacks INFO - ================================================================================
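A note on reading these numbers: the constant loss of 6.907755 and the 0.1000% / 0.5000% accuracies appear to match what a classifier that outputs a uniform distribution would produce. The sketch below checks that arithmetic. It assumes the dataset is ImageNet-1k with 1000 classes (the issue only gives the path /ImageNet_Origin/) and that the loss is a plain cross-entropy without label smoothing; the helper name `chance_accuracy_percent` is made up for illustration and is not part of mindcv.

```python
import math

# Assumption: ImageNet-1k, i.e. 1000 classes. The issue does not state this explicitly.
NUM_CLASSES = 1000


def chance_accuracy_percent(num_classes: int, k: int = 1) -> float:
    """Top-k accuracy (in percent) expected from uniformly random predictions."""
    return 100.0 * k / num_classes


# Cross-entropy of a uniform prediction over N classes is -ln(1/N) = ln(N).
uniform_loss = math.log(NUM_CLASSES)

print(f"uniform-prediction loss: {uniform_loss:.6f}")
print(f"chance Top-1: {chance_accuracy_percent(NUM_CLASSES, 1):.4f}%")
print(f"chance Top-5: {chance_accuracy_percent(NUM_CLASSES, 5):.4f}%")
```

If these values match the log, the network is probably not learning at all (for example, collapsed or non-updating outputs) rather than converging to a slightly lower accuracy than expected.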
