pkmital / CADL

ARCHIVED: Contains historical course materials/Homework materials for the FREE MOOC course on "Creative Applications of Deep Learning w/ Tensorflow" #CADL
Apache License 2.0

Checkpoint restoration for VAEGAN needs to account for global step #82

Open indraastra opened 7 years ago

indraastra commented 7 years ago

The VAEGAN training code saves checkpoints using the value of the global training step, so the files on disk get names like 'vaegan.ckpt-800.index'. Any code that looks for an existing checkpoint also needs to account for this naming scheme, but the current existence check only tests the bare checkpoint name, so it does not find the step-suffixed files:

    if os.path.exists(ckpt_name + '.index') or os.path.exists(ckpt_name):

I would suggest changing the check to something like this:

    # Look up the most recent checkpoint recorded in the checkpoint directory.
    latest_checkpoint = tf.train.latest_checkpoint(os.path.dirname(ckpt_name))
    if latest_checkpoint:
        saver.restore(sess, latest_checkpoint)
        print("Model restored from checkpoint {}.".format(latest_checkpoint))
    else:
        print("Model checkpoint not found.")

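(This won't quite work if checkpoints from multiple models are saved to the same directory: tf.train.latest_checkpoint relies on the file named 'checkpoint' in that directory, which is shared by everything saved there, so it can point at whichever model saved most recently.)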
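
If sharing a directory between models is a concern, one possible workaround is to skip the 'checkpoint' file and search for step-suffixed index files directly. The following is only a rough sketch, not what the repository does: the helper name `find_latest_checkpoint` is made up for illustration, and it assumes the index files are always named '<ckpt_name>-<step>.index' as in the example above.

```python
import glob
import os
import re


def find_latest_checkpoint(ckpt_name):
    """Return the checkpoint prefix with the highest global step, or None.

    Hypothetical helper: assumes index files are named
    '<ckpt_name>-<step>.index', e.g. 'vaegan.ckpt-800.index'.
    """
    # Match only this model's files, and capture the step as an integer
    # so that step 1200 sorts after step 800.
    pattern = re.compile(
        re.escape(os.path.basename(ckpt_name)) + r"-(\d+)\.index$")
    best_step, best_prefix = -1, None
    for path in glob.glob(glob.escape(ckpt_name) + "-*.index"):
        match = pattern.match(os.path.basename(path))
        if match and int(match.group(1)) > best_step:
            best_step = int(match.group(1))
            # Strip '.index' to get the prefix that a saver expects.
            best_prefix = path[:-len(".index")]
    return best_prefix
```

The returned prefix (for example 'vaegan.ckpt-800') could then be passed to saver.restore(sess, ...) in place of the result of tf.train.latest_checkpoint, with the same "not found" branch when it is None.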