microsoft / CNTK

Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit
https://docs.microsoft.com/cognitive-toolkit/

Sudden jump in error in 56th epoch? #3654

Open JohnCraigPublic opened 5 years ago

JohnCraigPublic commented 5 years ago

Everything was going well for about 24 hours, up until the 56th epoch (of a planned 175), when the error suddenly jumped to around 98% and stayed there. What could cause this?

Epoch[56 of 175]-Minibatch[7101-7200, 54.96%]: ce = 0.15268921 3200; errs = 3.625% 3200; time = 17.4793s; samplesPerSecond = 183.1
Epoch[56 of 175]-Minibatch[7201-7300, 55.73%]: ce = 0.09950439 3200; errs = 2.188% 3200; time = 17.5247s; samplesPerSecond = 182.6
Epoch[56 of 175]-Minibatch[7301-7400, 56.49%]: ce = 0.12983032 3200; errs = 2.844% 3200; time = 17.4691s; samplesPerSecond = 183.2
Epoch[56 of 175]-Minibatch[7401-7500, 57.25%]: ce = 0.15393066 3200; errs = 3.188% 3200; time = 17.4969s; samplesPerSecond = 182.9
Epoch[56 of 175]-Minibatch[7501-7600, 58.02%]: ce = 0.16717285 3200; errs = 3.719% 3200; time = 17.4988s; samplesPerSecond = 182.9
Epoch[56 of 175]-Minibatch[7601-7700, 58.78%]: ce = 0.17043701 3200; errs = 3.844% 3200; time = 17.5127s; samplesPerSecond = 182.7
Epoch[56 of 175]-Minibatch[7701-7800, 59.54%]: ce = 0.18733154 3200; errs = 3.969% 3200; time = 17.5019s; samplesPerSecond = 182.8
Epoch[56 of 175]-Minibatch[7801-7900, 60.31%]: ce = 0.19000000 3200; errs = 4.563% 3200; time = 17.4705s; samplesPerSecond = 183.2
Epoch[56 of 175]-Minibatch[7901-8000, 61.07%]: ce = 0.15319580 3200; errs = 3.500% 3200; time = 17.5455s; samplesPerSecond = 182.4
Epoch[56 of 175]-Minibatch[8001-8100, 61.83%]: ce = 0.18163208 3200; errs = 4.094% 3200; time = 17.4907s; samplesPerSecond = 183.0
Epoch[56 of 175]-Minibatch[8101-8200, 62.60%]: ce = 4.63712280 3200; errs = 63.375% 3200; time = 17.5693s; samplesPerSecond = 182.1
Epoch[56 of 175]-Minibatch[8201-8300, 63.36%]: ce = 7.01296875 3200; errs = 98.844% 3200; time = 17.4567s; samplesPerSecond = 183.3
Epoch[56 of 175]-Minibatch[8301-8400, 64.12%]: ce = 6.95506104 3200; errs = 99.031% 3200; time = 17.4979s; samplesPerSecond = 182.9
Epoch[56 of 175]-Minibatch[8401-8500, 64.89%]: ce = 6.90673340 3200; errs = 99.031% 3200; time = 17.4428s; samplesPerSecond = 183.5
Epoch[56 of 175]-Minibatch[8501-8600, 65.65%]: ce = 6.84071045 3200; errs = 98.844% 3200; time = 17.4925s; samplesPerSecond = 182.9
Epoch[56 of 175]-Minibatch[8601-8700, 66.41%]: ce = 6.87031250 3200; errs = 99.031% 3200; time = 17.4691s; samplesPerSecond = 183.2
Epoch[56 of 175]-Minibatch[8701-8800, 67.18%]: ce = 6.85569824 3200; errs = 98.938% 3200; time = 17.4753s; samplesPerSecond = 183.1

delzac commented 5 years ago

You will need to lower your learning rate as you train; consider lowering it in steps. Lastly, please try to keep non-CNTK-specific questions to other forums.
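To illustrate what lowering the learning rate in steps could look like, here is a minimal plain-Python sketch of a step-decay schedule. It is not CNTK API and not the poster's configuration: the function name `step_decay_lr` and the values for `base_lr`, `drop_factor`, and `epochs_per_drop` are hypothetical and would need tuning for the actual model.

```python
def step_decay_lr(epoch, base_lr=0.01, drop_factor=0.1, epochs_per_drop=50):
    """Return the learning rate for a 1-based epoch under step decay.

    All default values are illustrative assumptions, not taken from the thread.
    """
    if epoch < 1:
        raise ValueError("epoch is 1-based and must be >= 1")
    # Number of completed drop intervals before this epoch.
    num_drops = (epoch - 1) // epochs_per_drop
    return base_lr * (drop_factor ** num_drops)


# One learning rate per epoch for the 175 epochs planned in the question.
schedule = [step_decay_lr(epoch) for epoch in range(1, 176)]
```

The resulting per-epoch list would then have to be passed to whatever learning-rate setting your training configuration exposes. With these assumed defaults the rate stays at `base_lr` through epoch 50 and is multiplied by `drop_factor` every 50 epochs after that.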