tensorlab / tensorfx

TensorFlow framework for training and serving machine learning models
Apache License 2.0

Support for deferred saved model creation #22

Open nikhilk opened 7 years ago

nikhilk commented 7 years ago

The training job currently writes a checkpoint every N seconds and produces a saved model only at the very end. The idea is that any checkpoint could be converted to a saved model later.

To make that possible, we should save the prediction graph at the start of training, so that converting a checkpoint to a saved model doesn't depend on running any code to re-create the same graph with the same args.
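To make the proposed split concrete, here is a library-agnostic toy sketch using only the Python standard library. It is not tensorfx or TensorFlow code: the JSON files stand in for the serialized prediction graph and the checkpoint files, and every name in it (`start_training`, `write_checkpoint`, `export_saved_model`, `prediction_graph.json`, the single `linear` op) is hypothetical. The point it illustrates is that the export step reads only files written earlier and never calls model-building code.

```python
import json
import tempfile
from pathlib import Path


def start_training(job_dir: Path) -> None:
    """Write the prediction-graph description once, before any training step."""
    # Hypothetical graph spec: one linear op over two named variables.
    spec = {"input": "x", "output": "y", "op": "linear", "variables": ["w", "b"]}
    (job_dir / "prediction_graph.json").write_text(json.dumps(spec))


def write_checkpoint(job_dir: Path, step: int, values: dict) -> Path:
    """Periodic checkpoint: variable values only, no graph."""
    path = job_dir / f"checkpoint-{step}.json"
    path.write_text(json.dumps(values))
    return path


def export_saved_model(job_dir: Path, checkpoint: Path, export_dir: Path) -> None:
    """Combine the stored graph with any checkpoint, without rebuilding the model."""
    spec = json.loads((job_dir / "prediction_graph.json").read_text())
    values = json.loads(checkpoint.read_text())
    missing = [name for name in spec["variables"] if name not in values]
    if missing:
        raise ValueError(f"checkpoint lacks variables: {missing}")
    export_dir.mkdir(parents=True, exist_ok=True)
    bundle = {"graph": spec, "variables": values}
    (export_dir / "model.json").write_text(json.dumps(bundle))


def predict(export_dir: Path, x: float) -> float:
    """Load the exported bundle and evaluate it (toy: only the 'linear' op)."""
    model = json.loads((export_dir / "model.json").read_text())
    if model["graph"]["op"] != "linear":
        raise ValueError("this sketch only understands the 'linear' op")
    v = model["variables"]
    return v["w"] * x + v["b"]


job_dir = Path(tempfile.mkdtemp())
start_training(job_dir)                                  # graph saved up front
write_checkpoint(job_dir, 100, {"w": 1.0, "b": 0.0})
later = write_checkpoint(job_dir, 200, {"w": 2.0, "b": 0.5})
export_saved_model(job_dir, later, job_dir / "export")   # deferred conversion
result = predict(job_dir / "export", 3.0)
```

In this sketch the training loop never produces an export. A separate step picks whichever checkpoint it wants, which is the property the proposal asks for.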

Secondly, once we have that, should the training process produce a saved model at all? Is producing a saved model better treated as part of the deployment step of the workflow?