kocmitom closed this issue 6 years ago
It seems that right now the Adam variables get stored, as well as the global step.
You can check this by running a test, then continuing it, and printing out the variable before and after, using tf.Print.
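A minimal sketch of what that looks like (TF 1.x API, as used by Neural Monkey; the tensor here is illustrative, not one of Neural Monkey's own):

```python
import tensorflow as tf

# Wrap the tensor of interest in tf.Print; evaluating the wrapped tensor
# prints the listed values to stderr as a side effect.
global_step = tf.train.get_or_create_global_step()
printed_step = tf.Print(global_step, [global_step],
                        message="global_step value: ")

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(printed_step)  # triggers the print
```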
Alternatively, you can print out the variables that are being stored during the initialization of the saver, here: https://github.com/ufal/neuralmonkey/blob/master/neuralmonkey/tf_manager.py#L90
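For instance (a rough sketch, not the actual tf_manager.py code): a Saver created without a var_list stores all global variables, so printing those is enough to see what goes into the checkpoint.

```python
import tensorflow as tf

# Some graph state so there is something to save.
weight = tf.get_variable("weight", shape=[3, 3])
global_step = tf.train.get_or_create_global_step()

# Saver() with no var_list defaults to all global variables,
# i.e. it also covers optimizer slots and the global step.
saver = tf.train.Saver()

for var in tf.global_variables():
    print(var.name, var.get_shape().as_list())
```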
So I must be doing something wrong? I only added one line to the [main] section:
initial_variables="baseline_model/variables.data.index"
This warns me that some variables are not in the checkpoint, and based on the log the global step is not set. I also tried providing the other variables.data.* files.
Drop the ".index" suffix
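A TF 1.x checkpoint is written as several files that share one prefix, and restoring (and hence the initial_variables setting) expects that prefix, not an individual file. A small sketch of the standard checkpoint behaviour, with placeholder paths:

```python
import os
import tensorflow as tf

os.makedirs("baseline_model", exist_ok=True)

v = tf.get_variable("v", shape=[2])
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # save() returns the checkpoint prefix, e.g. "baseline_model/variables.data";
    # on disk it creates files such as
    #   baseline_model/variables.data.index
    #   baseline_model/variables.data.data-00000-of-00001
    prefix = saver.save(sess, "baseline_model/variables.data")
    # Restoring takes the prefix, not one of the individual files.
    saver.restore(sess, prefix)
```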
What do you mean by "based on the log"? The global step is not logged; you must use tf.Print to get its value. The list of variables and their shapes in the log is a list of trainable variables, not global variables. Global variables do get stored.
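To illustrate the difference (a generic TF 1.x sketch, not Neural Monkey code): the Adam slot variables and the global step appear among the global variables, which a default Saver stores, but not among the trainable variables that the log lists.

```python
import tensorflow as tf

x = tf.get_variable("x", shape=[4])
loss = tf.reduce_sum(tf.square(x))
global_step = tf.train.get_or_create_global_step()

# Creating the Adam train op adds slot variables ("x/Adam", "x/Adam_1")
# and the beta power accumulators to the set of global variables.
train_op = tf.train.AdamOptimizer(0.001).minimize(loss, global_step=global_step)

print("trainable:", [v.name for v in tf.trainable_variables()])
print("global:   ", [v.name for v in tf.global_variables()])
```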
Perfect, thank you. Now it doesn't warn about missing variables. By "the log" I meant the TensorBoard output and the fact that performance dropped, either due to a high learning rate or to not having the correct variables.
Now it works
What works? Do the global step and Adam variables get loaded?
The global step looks like it got loaded, and I suppose the Adam variables did too. The only (cosmetic) problem is that TensorBoard does not start at the correct step but starts from zero. Couldn't it be changed so that whenever the global step is loaded, the step variable in learning_utils is also advanced? But that is only for better visualization.
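Something along these lines would do it (a generic TF 1.x sketch, not the actual learning_utils code): pass the restored global step to the summary writer instead of a counter that starts from zero.

```python
import tensorflow as tf

global_step = tf.train.get_or_create_global_step()
loss = tf.constant(1.0)
summary_op = tf.summary.scalar("loss", loss)
writer = tf.summary.FileWriter("tb_logdir")

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Stand-in for the value that would be restored from the checkpoint.
    sess.run(tf.assign(global_step, 5000))
    start = sess.run(global_step)
    for i in range(10):
        summary = sess.run(summary_op)
        # Offsetting by the restored step makes the TensorBoard curves
        # continue where the previous run ended instead of at zero.
        writer.add_summary(summary, global_step=start + i)
    writer.flush()
```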
I have checked it and I can confirm that the global step, as well as all of the Adam variables, is stored in the checkpoint.
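(For anyone who wants to verify this themselves, a quick sketch using the TF 1.x checkpoint reader; the path is a placeholder.)

```python
import tensorflow as tf

# Point the reader at the checkpoint prefix (no ".index" suffix).
reader = tf.train.NewCheckpointReader("baseline_model/variables.data")
for name, shape in sorted(reader.get_variable_to_shape_map().items()):
    print(name, shape)
# Adam slots show up under names like ".../Adam" and ".../Adam_1",
# and the global step as "global_step".
```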
Hi, I need to continue a training run for the purpose of adaptation. What is the easiest way to do so, considering that I need all training parameters to be kept (especially the global step and the Adam variables)?