When I follow the Training section of the README, I get the error below. After training for 1000 epochs, the script tries to load the 1000-epoch checkpoint to run evaluation and fails.
Not found: Key transformer/parallel_0_3/transformer/transformer/body/decoder/layer_0/self_attention/multihead_attention/k/kernel not found in checkpoint
[[node save/RestoreV2_1 (defined at /.pyenv/versions/3.7.9/envs/tensor2tensor/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py:629) ]]
tensorflow.python.framework.errors_impl.NotFoundError: Key _CHECKPOINTABLE_OBJECT_GRAPH not found in checkpoint
However, when I run the Training command again, it surprisingly succeeds in loading the checkpoint, resumes training from epoch 1000, and saves the 2000-epoch weights. But when it then loads the 2000-epoch checkpoint and tries to run evaluation, it fails again.
Inference (Sampling from the model) fails outright.
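One way to narrow this down might be to list the variable names actually stored in the checkpoint and compare them with the key named in the error. The sketch below is only a diagnostic under stated assumptions: `tf.train.list_variables` is TensorFlow's own call for reading names from a checkpoint, while `common_suffix_len`, `closest_checkpoint_keys`, and `checkpoint_keys` are hypothetical helper names, and the example key list is made up for illustration rather than taken from a real checkpoint.

```python
def common_suffix_len(a, b):
    """Count how many trailing '/'-separated scope components two names share."""
    count = 0
    for x, y in zip(reversed(a.split("/")), reversed(b.split("/"))):
        if x != y:
            break
        count += 1
    return count


def closest_checkpoint_keys(missing_key, keys, top=3):
    """Return the keys that share the longest trailing scope path with missing_key."""
    scored = [(common_suffix_len(missing_key, k), k) for k in keys]
    scored = [s for s in scored if s[0] > 0]      # drop keys with nothing in common
    scored.sort(key=lambda s: (-s[0], s[1]))      # longest shared suffix first
    return [k for _, k in scored[:top]]


def checkpoint_keys(ckpt_dir_or_file):
    """Read variable names from a checkpoint (requires TensorFlow)."""
    import tensorflow as tf  # imported lazily so the helpers above run without it
    return [name for name, _shape in tf.train.list_variables(ckpt_dir_or_file)]


if __name__ == "__main__":
    # Key copied from the error message above.
    missing = ("transformer/parallel_0_3/transformer/transformer/body/decoder/"
               "layer_0/self_attention/multihead_attention/k/kernel")
    # Hypothetical key names, for illustration only. With a real run, use
    # checkpoint_keys("<your training output directory>") instead.
    example_keys = [
        "transformer/body/decoder/layer_0/self_attention/multihead_attention/k/kernel",
        "transformer/body/decoder/layer_0/self_attention/multihead_attention/q/kernel",
        "global_step",
    ]
    print(closest_checkpoint_keys(missing, example_keys))
```

If a near match shows up under a different scope prefix, that would suggest a variable-naming mismatch between the graph built for evaluation and the one that wrote the checkpoint, rather than a missing or corrupt file. If nothing close appears at all, the checkpoint itself is the more likely suspect.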
I hit the same problem with TensorFlow 2.4.0. Loading a checkpoint downloaded from Magenta, as in the Colab, fails. When I run the training commands, the checkpoint saved at 1000 epochs cannot be loaded either.
Ubuntu 18.04, Python 3.7.9, TensorFlow 2.3.1
I am following https://github.com/magenta/magenta/blob/master/magenta/models/score2perf/README.md. The problem occurs in the Training and Sampling from the model sections.
The training command I use is the one given in the README's Training section.
Could anyone help me? Thanks in advance.