microsoft / CNTK

Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit
https://docs.microsoft.com/cognitive-toolkit/

Learner checkpoint does not serialize and deserialize AdditionalLearningOptions #2328

Open tangyuq opened 7 years ago

tangyuq commented 7 years ago

This issue is created to keep track of a work item.

ke1337 commented 7 years ago

Checkpoints are mainly for continuing training with the same script, so the learner is not supposed to change, including its additional options. If users do need to continue training with a different learner, they should load the model instead of using a checkpoint. CNTK should serialize these options in the checkpoint so that it can verify nothing has changed, but the missing serialization should not cause any difference in computation.
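The verification idea above could look roughly like the following. This is a minimal sketch in plain Python, not CNTK code: `LearnerOptions`, `save_checkpoint`, and `restore_checkpoint` are hypothetical names, and the two fields are only illustrative stand-ins for the kind of settings AdditionalLearningOptions holds.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class LearnerOptions:
    """Hypothetical stand-in for AdditionalLearningOptions (not the CNTK type)."""
    l2_regularization_weight: float
    gradient_clipping_threshold: float


def save_checkpoint(state: dict, options: LearnerOptions) -> str:
    # Store the options next to the learner state so they can be checked on restore.
    return json.dumps({"state": state, "options": asdict(options)})


def restore_checkpoint(blob: str, current: LearnerOptions) -> dict:
    payload = json.loads(blob)
    saved = LearnerOptions(**payload["options"])
    # Refuse to resume if the script now configures the learner differently.
    if saved != current:
        raise ValueError(f"learner options changed: saved={saved}, current={current}")
    return payload["state"]
```

With this design the checkpoint does not override the options the script sets; it only detects that they differ from the ones in effect when the checkpoint was written.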

cha-zhang commented 7 years ago

@tangyuq what are the additional learning options you have in mind?

tangyuq commented 6 years ago

AdditionalLearningOptions is not serialized; see https://github.com/Microsoft/CNTK/blob/master/Source/CNTKv2LibraryDll/API/CNTKLibrary.h#L4689.

This means that when a checkpoint is restored, the l2 regularization and gradient clipping settings might not be restored properly. Correct behavior depends on two assumptions: that the source code is the same, and that the code run in between does not modify these options. This might be somewhat dangerous as more complicated learners with more customized parameters are being allowed.
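A toy illustration of the failure mode, as a sketch only: `ToyLearner` is a hypothetical class written for this example, not the CNTK learner, and the fixed learning rate of 0.1 is an arbitrary assumption. Its checkpoint stores the training state but omits the options, so a resumed learner silently uses whatever the current script passes in.

```python
from dataclasses import dataclass


@dataclass
class ToyLearner:
    """Hypothetical learner whose checkpoint omits its options."""
    l2_weight: float
    clip_threshold: float
    step_count: int = 0

    def get_checkpoint(self) -> dict:
        # Only the training state is saved; l2_weight and clip_threshold are not.
        return {"step_count": self.step_count}

    def restore_from_checkpoint(self, checkpoint: dict) -> None:
        self.step_count = checkpoint["step_count"]

    def update(self, weight: float, grad: float) -> float:
        # Clip the gradient, add the l2 penalty term, take one SGD step.
        clipped = max(-self.clip_threshold, min(self.clip_threshold, grad))
        self.step_count += 1
        return weight - 0.1 * (clipped + self.l2_weight * weight)


original = ToyLearner(l2_weight=0.01, clip_threshold=1.0)
checkpoint = original.get_checkpoint()

# The script was edited between runs and now passes a different l2 weight.
resumed = ToyLearner(l2_weight=0.0, clip_threshold=1.0)
resumed.restore_from_checkpoint(checkpoint)

# Restoring raised no error, yet the two learners now compute different updates.
same_update = original.update(1.0, 0.5) == resumed.update(1.0, 0.5)
```

Here `same_update` ends up `False` even though the restore succeeded, which is the silent mismatch described above.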

ke1337 commented 6 years ago

If we are talking about the source code being changed in a way that might affect training results, then just saving AdditionalLearningOptions would not be enough. Of course, we cannot go to the other extreme and hash the source code into the checkpoint, as that would be much less flexible. So I think it should be up to the user whether to load a checkpoint in a script other than the one it was created from.