nicholasbreckwoldt opened this issue 4 years ago
@nicholasbreckwoldt: We just released `adanet==0.9.0`, which includes better TPU and TF 2 support. Please try installing it, and let us know if it resolves your issue.
@cweill Thanks for the update! I am running into a new issue with the upgrade to TF 2.2 and `adanet==0.9.0`, which has so far prevented me from establishing whether the above evaluation issue has been resolved. I've added a description of this new issue (#157).
I'm running into an issue when using the AdaNet `TPUEstimator`. Say, for example, the estimator is configured with `max_iteration_steps=500`, and we want to evaluate the model's performance during training after every 100 training steps (i.e. `steps_per_evaluation=100`) for 2 complete AdaNet iterations. To achieve this, `estimator.train(train_input, max_steps=max_steps)` followed by `estimator.evaluate(eval_input)` is run in a loop, incrementing `max_steps` by `steps_per_evaluation` at the end of each pass, until `max_steps=1000` is reached (i.e. corresponding to 2 complete AdaNet iterations).
When running in local mode (i.e. `use_tpu=False`), training proceeds as expected: it runs for 2 complete AdaNet iterations (steps 0 to 500 for the first iteration and steps 500 to 1000 for the second, with an evaluation every 100 steps). However, when running on Cloud TPU (i.e. `use_tpu=True`), training reaches `max_steps=1000` without ever progressing to a second iteration.

On the other hand, a single call of `estimator.train(train_input, max_steps=1000)` on Cloud TPU, without the interleaved `estimator.evaluate` calls,
results in 2 complete AdaNet iterations as expected. This makes me think the issue lies with the evaluation call. What could the issue be? If this is a `TPUEstimator`-related issue, am I then constrained to the standard `Estimator` if I want this kind of train-evaluate loop configuration?