tensorflow / tensor2tensor

Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
Apache License 2.0

--train_steps flag not working #1501

Open Eugen2525 opened 5 years ago

Eugen2525 commented 5 years ago

Description

I am running the tensor2tensor framework with my own model, and the --train_steps flag does not seem to work. Training continues past the 1000 steps I set and stops somewhere above 200000 (sorry, I could not track exactly when the model stopped, but it goes beyond the set number of steps by a large margin).

    t2t-trainer \
      --t2t_usr_dir=$USR_DIR \
      --data_dir=$DATA_DIR \
      --problem=$PROBLEM \
      --model=$MODEL \
      --hparams_set=$HPARAMS \
      --output_dir=$TRAIN_DIR
      --train_steps=1000 \
      --eval_steps=100

Is there any hint as to why this might not be working?
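One thing worth checking in the command above: the `--output_dir=$TRAIN_DIR` line has no trailing backslash. In a POSIX shell that ends the command there, so `--train_steps` and `--eval_steps` would never reach `t2t-trainer`, which would then presumably fall back to its defaults. The sketch below illustrates the effect with `count_args`, a hypothetical stand-in that only reports how many arguments it received; it is not part of tensor2tensor, and `/tmp/train_dir` is a placeholder path.

```shell
# Hypothetical stand-in for t2t-trainer: prints how many arguments it received.
count_args() { echo "$#"; }

# Every line except the last ends with a backslash, so all three flags
# belong to a single command.
count_args \
  --output_dir=/tmp/train_dir \
  --train_steps=1000 \
  --eval_steps=100
# prints 3

# Here the --output_dir line has no trailing backslash, so the command ends
# there and only one flag is passed. The remaining lines would be run as a
# separate command (commented out here, because the shell would reject it
# with a "command not found" error).
count_args \
  --output_dir=/tmp/train_dir
#  --train_steps=1000 \
#  --eval_steps=100
# prints 1
```

If this is the cause, a "command not found" message for `--train_steps=1000` should also appear in the terminal once the trainer exits.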

Also, what is the flag to train with early stopping? ...

Environment information

OS: Linux 16
$ pip freeze | grep tensor
mesh-tensorflow==0.0.5
tensor2tensor==1.13.0
tensorboard==1.13.1
tensorflow-datasets==1.0.1
tensorflow-estimator==1.13.0
tensorflow-gpu==1.13.1
tensorflow-metadata==0.13.0
tensorflow-probability==0.6.0
tensorflow-tensorboard==1.5.1

$ python -V
Python 2.7.15 :: Anaconda, Inc.

ashu5644 commented 5 years ago

I am having a similar problem: I am unable to change the train_steps, eval_steps, and eval_after_n_training_steps values. Any help?