tensorflow / nmt

TensorFlow Neural Machine Translation Tutorial
Apache License 2.0

Problem on model saver logic when avg_ckpts is True. #342

Open lovecambi opened 6 years ago

lovecambi commented 6 years ago

When avg_ckpts is True, there are two folders storing the best model w.r.t. "best_bleu" and "avg_best_bleu" respectively.

The problem is that the checkpoints file in each folder gets updated based on the file from the other folder. For example, this is the checkpoints file in the "best_bleu" folder:

model_checkpoint_path: "translate.ckpt-7000"
all_model_checkpoint_paths: "translate.ckpt-3000"
all_model_checkpoint_paths: "translate.ckpt-4000"
all_model_checkpoint_paths: "../avg_best_bleu/translate.ckpt-5000"
all_model_checkpoint_paths: "../avg_best_bleu/translate.ckpt-6258"
all_model_checkpoint_paths: "translate.ckpt-7000"

After the weight averaging, the checkpoints file in the "avg_best_bleu" folder is updated based on the file above, not on its own file. Thus, the resulting file is:

model_checkpoint_path: "translate.ckpt-7301"
all_model_checkpoint_paths: "../best_bleu/translate.ckpt-4000"
all_model_checkpoint_paths: "translate.ckpt-5000"
all_model_checkpoint_paths: "translate.ckpt-6258"
all_model_checkpoint_paths: "../best_bleu/translate.ckpt-7000"
all_model_checkpoint_paths: "translate.ckpt-7301"

However, the decision to save a new checkpoint is based on a comparison against that folder's own previous best metric. Under this logic, there is no guarantee that the 5 best models recorded in a folder's checkpoints file are that folder's own best models. With more than one evaluation metric, e.g. BLEU and ROUGE, the mixing gets even worse.
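The cross-contamination can be reproduced with a hypothetical, minimal simulation of the saver's bookkeeping (not the actual tf.train.Saver code): a single saver object keeps one rolling list of its most recent checkpoints and writes that whole list into whichever directory it last saved to, so paths from both directories end up interleaved:

```python
import os

# Hypothetical minimal model of a shared saver: ONE rolling list of
# retained checkpoints, reused across every directory it saves into.
class SharedSaver:
    def __init__(self, max_to_keep=5):
        self.max_to_keep = max_to_keep
        self._paths = []  # shared across all directories this saver touches

    def save(self, directory, step):
        self._paths.append(os.path.join(directory, "translate.ckpt-%d" % step))
        self._paths = self._paths[-self.max_to_keep:]
        # The "checkpoint" state file written into `directory` lists ALL
        # retained paths, including ones that live in the other directory.
        return list(self._paths)

saver = SharedSaver(max_to_keep=5)
saver.save("best_bleu", 3000)
saver.save("best_bleu", 4000)
saver.save("avg_best_bleu", 5000)
saver.save("avg_best_bleu", 6258)
state_in_best_bleu = saver.save("best_bleu", 7000)

# Entries from avg_best_bleu have leaked into best_bleu's checkpoint
# state, so best_bleu no longer records five of its OWN best models.
print(state_in_best_bleu)
```

This mirrors the listings above: the state file in "best_bleu" ends up referencing two checkpoints under "../avg_best_bleu".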

One workaround, when we have multiple metrics, is to define a separate saver for each metric.
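A hypothetical sketch of that workaround, again as a pure-Python simulation rather than real TensorFlow code: one independent retained-checkpoints list per metric, mirroring one tf.train.Saver(max_to_keep=5) per metric, so no directory's checkpoint state file can ever reference another directory's files:

```python
import os

# One independent retained-checkpoints list per metric/directory,
# standing in for one dedicated saver per metric.
MAX_TO_KEEP = 5
retained = {"best_bleu": [], "avg_best_bleu": [], "best_rouge": []}

def save(metric, step):
    """Record a checkpoint under this metric's own directory only."""
    paths = retained[metric]
    paths.append(os.path.join(metric, "translate.ckpt-%d" % step))
    del paths[:-MAX_TO_KEEP]  # keep only the most recent MAX_TO_KEEP
    return list(paths)  # what this metric's checkpoint state file would list

for step in (3000, 4000, 7000):
    state = save("best_bleu", step)
save("avg_best_bleu", 5000)

# best_bleu's state now lists only files under best_bleu/, no matter
# how often the other metrics' savers run in between.
assert all(p.startswith("best_bleu") for p in state)
```

Because each saver maintains its own state, the "best 5 per metric" guarantee holds, and adding further metrics (e.g. ROUGE) just means adding another saver.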