minerva-ml / minerva-training-materials

Learn advanced data science on real-life, curated problems
https://neptune.ml/minerva
MIT License

[whales, task3] task submission ends after 249 epochs with "Sorry, your validation split is messed up. Fix it please." #80

Closed by rafajak 6 years ago

rafajak commented 6 years ago

Submitting the task (via 'neptune run') was unsuccessful: the run ended after 249 epochs with the error "Sorry, your validation split is messed up. Fix it please." The validation split isn't part of any task within the Whales problem, so perhaps this error message is leaking from the Fashion-MNIST problem.

(Side note: the log below suggests that the epoch number printed with 'current lr' is shifted by one: epoch 250 vs. 249, in the 5th and 6th rows, respectively.)

156834.540647 Connection lost. Retrying...
156834.540935 2018-04-06 08-00-15 minerva >>> epoch 249 batch 110 ...
156834.557396 Connection lost. Retrying...
156834.557587 2018-04-06 08-00-17 minerva >>> epoch 249 average batch time: 0:00:04.0
156834.557771 2018-04-06 08-00-17 minerva >>> epoch 250 current lr: 0.0003252930814335209
156834.557955 2018-04-06 08-00-17 minerva >>> epoch 249 loss: 0.03416
156834.558143 2018-04-06 08-00-17 minerva >>> epoch 249 accuracy: 0.99975
156834.840497 Connection lost. Retrying...
156834.840747 Connection restored!
156895.617163 2018-04-06 08-01-23 minerva >>> epoch 249 validation loss: 0.98257
156895.617506 2018-04-06 08-01-23 minerva >>> epoch 249 validation accuracy: 0.78689
156949.918186 2018-04-06 08-02-18 minerva >>> training finished...
157301.446679 2018-04-06 08-08-09 minerva >>> step classifier_network saving transformer...
157301.793026 2018-04-06 08-08-09 minerva >>> step classifier_network saving outputs...
157301.793355 2018-04-06 08-08-09 minerva >>> step classifier_calibrator adapting inputs
157302.072814 2018-04-06 08-08-10 minerva >>> step classifier_calibrator saving transformer...
157302.073103 2018-04-06 08-08-10 minerva >>> step classifier_calibrator saving outputs...
157302.073299 2018-04-06 08-08-10 minerva >>> step classifier_encoder adapting inputs
157302.073493 2018-04-06 08-08-10 minerva >>> step classifier_encoder loading...
157302.073687 2018-04-06 08-08-10 minerva >>> step classifier_encoder transforming...
157302.256781 2018-04-06 08-08-10 minerva >>> step classifier_output adapting inputs
157302.257108 2018-04-06 08-08-10 minerva >>> step classifier_output loading...
157302.257317 2018-04-06 08-08-10 minerva >>> step classifier_output transforming...
157302.744636 2018-04-06 08-08-10 minerva >>> step classifier_encoder adapting inputs
157302.744928 2018-04-06 08-08-10 minerva >>> step classifier_encoder loading...
157302.745115 2018-04-06 08-08-10 minerva >>> step classifier_encoder transforming...
157302.745298 2018-04-06 08-08-10 minerva >>> step classifier_loader adapting inputs
157302.745478 2018-04-06 08-08-10 minerva >>> step classifier_loader loading...
157302.745656 2018-04-06 08-08-10 minerva >>> step classifier_loader transforming...
157302.745832 2018-04-06 08-08-10 minerva >>> step classifier_network unpacking inputs
157302.746007 2018-04-06 08-08-10 minerva >>> step classifier_network loading...
157302.746183 2018-04-06 08-08-10 minerva >>> step classifier_network transforming...
157351.151419 2018-04-06 08-08-59 minerva >>> step classifier_calibrator adapting inputs
157351.151625 2018-04-06 08-08-59 minerva >>> step classifier_calibrator loading...
157351.15182 2018-04-06 08-08-59 minerva >>> step classifier_calibrator transforming...
157351.330583 2018-04-06 08-08-59 minerva >>> step classifier_encoder adapting inputs
157351.330964 2018-04-06 08-08-59 minerva >>> step classifier_encoder loading...
157351.33123 2018-04-06 08-08-59 minerva >>> step classifier_encoder transforming...
157351.331464 2018-04-06 08-08-59 minerva >>> step classifier_output adapting inputs
157351.331668 2018-04-06 08-08-59 minerva >>> step classifier_output loading...
157351.331846 2018-04-06 08-08-59 minerva >>> step classifier_output transforming...
157351.33203 2018-04-06 08-08-59 minerva >>> step classifier_encoder adapting inputs
157351.332214 2018-04-06 08-08-59 minerva >>> step classifier_encoder loading...
157351.332397 2018-04-06 08-08-59 minerva >>> step classifier_encoder transforming...
157351.332591 2018-04-06 08-08-59 minerva >>> step classifier_loader adapting inputs
157351.332774 2018-04-06 08-08-59 minerva >>> step classifier_loader loading...
157351.332958 2018-04-06 08-08-59 minerva >>> step classifier_loader transforming...
157351.333137 2018-04-06 08-08-59 minerva >>> step classifier_network unpacking inputs
157351.333316 2018-04-06 08-08-59 minerva >>> step classifier_network loading...
157351.53075 2018-04-06 08-08-59 minerva >>> step classifier_network transforming...
157411.719687 2018-04-06 08-09-59 minerva >>> step classifier_calibrator adapting inputs
157411.719895 2018-04-06 08-09-59 minerva >>> step classifier_calibrator loading...
157411.720097 2018-04-06 08-09-59 minerva >>> step classifier_calibrator transforming...
157411.720338 2018-04-06 08-09-59 minerva >>> step classifier_encoder adapting inputs
157411.720538 2018-04-06 08-09-59 minerva >>> step classifier_encoder loading...
157411.720747 2018-04-06 08-09-59 minerva >>> step classifier_encoder transforming...
157411.720945 2018-04-06 08-09-59 minerva >>> step classifier_output adapting inputs
157411.721135 2018-04-06 08-09-59 minerva >>> step classifier_output loading...
157411.721333 2018-04-06 08-09-59 minerva >>> step classifier_output transforming...
157411.928652  
157411.92899 Validation score is 1.9927
157411.929189 Test score is 2.2540
157411.929416 Sorry, your validation split is messed up. Fix it please.

kamil-kaczmarek commented 6 years ago

Hey @rafajak

Yes, this error message comes from a different problem. The message should be specific to the task at hand. Can you design a proper message for this task, @rafajak? Let me know if you need some help ;-)

Regarding the side note: yes, the numbering is incorrect and (probably) costly to fix. It depends on the order of callbacks, which we have specified in models.py. However, changing the order will require some testing, so I'm adding it to my ToDo list ;-)
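
To illustrate how callback order alone can shift a printed epoch number, here is a minimal sketch. It assumes a hypothetical design in which all callbacks read one shared epoch counter; the class names (`TrainingState`, `LossMonitor`, `LearningRateLogger`, `EpochAdvancer`) are invented for this example and are not taken from models.py, so the actual cause in minerva may differ.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TrainingState:
    """Hypothetical shared state that every callback reads or updates."""
    epoch: int = 249
    messages: List[str] = field(default_factory=list)


class LossMonitor:
    def on_epoch_end(self, state: TrainingState) -> None:
        state.messages.append(f"epoch {state.epoch} loss")


class LearningRateLogger:
    def on_epoch_end(self, state: TrainingState) -> None:
        state.messages.append(f"epoch {state.epoch} current lr")


class EpochAdvancer:
    """Moves the shared counter to the next epoch."""
    def on_epoch_end(self, state: TrainingState) -> None:
        state.epoch += 1


def run_epoch_end(callbacks, epoch: int = 249) -> List[str]:
    # Callbacks fire strictly in list order, so a callback placed after
    # EpochAdvancer already sees the next epoch number.
    state = TrainingState(epoch=epoch)
    for callback in callbacks:
        callback.on_epoch_end(state)
    return state.messages


# The logger runs after the counter was advanced: its label is off by one.
shifted = run_epoch_end([LossMonitor(), EpochAdvancer(), LearningRateLogger()])
# The logger runs before the counter is advanced: labels agree.
aligned = run_epoch_end([LossMonitor(), LearningRateLogger(), EpochAdvancer()])

print(shifted)
print(aligned)
```

In this sketch the fix is a one-line reordering, but every other callback that relies on the counter would need to be re-checked, which matches the point above that changing the order requires testing.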

rafajak commented 6 years ago

@kamil-kaczmarek done - I've changed the error message to one indicating overfitting. It's included in PR #82. :)
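
For readers who want to see the idea of task-specific feedback in code, the following is a rough sketch only. The function name `submission_feedback`, the `tolerance` threshold, the gap-based check, and the wording of the Whales message are all assumptions made for illustration; the actual check and message are the ones in PR #82.

```python
from typing import Optional

# Messages keyed by problem name. The Fashion-MNIST text is the one quoted
# in this issue; the Whales text is a hypothetical placeholder.
FEEDBACK_MESSAGES = {
    "fashion_mnist": "Sorry, your validation split is messed up. Fix it please.",
    "whales": "Test score is much worse than validation score; "
              "the model may be overfitting.",
}
DEFAULT_MESSAGE = "Validation and test scores differ more than expected."


def submission_feedback(problem: str,
                        validation_score: float,
                        test_score: float,
                        tolerance: float = 0.1) -> Optional[str]:
    """Return a problem-specific message when the scores diverge.

    Assumption: a large gap between validation and test score is what
    triggers the message. Returns None when the gap is within tolerance.
    """
    gap = abs(test_score - validation_score)
    if gap <= tolerance:
        return None
    return FEEDBACK_MESSAGES.get(problem, DEFAULT_MESSAGE)


# Scores taken from the log in this issue.
print(submission_feedback("whales", 1.9927, 2.2540))
```

Keying the messages by problem name keeps a message written for one problem from showing up in another, which is the leak reported here.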

kamil-kaczmarek commented 6 years ago

@rafajak changes from PR #82 are merged