tensorflow / tensor2tensor

Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
Apache License 2.0

Outdated cloud_tpu.py script #1032

Closed tlatkowski closed 6 years ago

tlatkowski commented 6 years ago

Description

I've tried to reproduce the training procedure for the speech transformer model (https://github.com/tensorflow/tensor2tensor/blob/master/docs/tutorials/asr_with_transformer.md) but encountered some difficulties.

It looks like the cloud_tpu.py script is outdated relative to the current gcloud API; for example, executing the "gcloud beta compute tpus list" command fails.

I had to make some workarounds and fixes locally to make it work. What do you think about updating the cloud_tpu script to be in line with the current gcloud API? ...

Environment information

OS: Ubuntu

$ gcloud version
Google Cloud SDK 214.0.0
alpha 2018.08.03
app-engine-go
app-engine-java 1.9.64
app-engine-php " "
app-engine-python 1.9.74
app-engine-python-extras 1.9.74
beta 2018.07.16
bq 2.0.34
cbt
cloud-datastore-emulator 2.0.2
container-builder-local
core 2018.08.24
datalab 20180820
docker-credential-gcr
gcd-emulator v1beta3-1.0.0
gsutil 4.33
kubectl 2018.08.17
pubsub-emulator 2018.02.02

$ pip freeze | grep tensor

-e git+https://github.com/tensorflow/tensor2tensor.git@ea9934874031a66dc5bd74def5d87fa377131ac8#egg=tensor2tensor
tensorboard==1.9.0
tensorflow==1.9.0

$ python -V
Python 2.7.13

For bugs: reproduction and error logs

# Steps to reproduce:

Training section from https://github.com/tensorflow/tensor2tensor/blob/master/docs/tutorials/asr_with_transformer.md

t2t-trainer \
  --model=transformer \
  --hparams_set=transformer_librispeech_tpu \
  --hparams=max_length=125550,max_input_seq_length=1550,max_target_seq_length=350,batch_size=16 \
  --problem=librispeech_train_full_test_clean \
  --train_steps=210000 \
  --eval_steps=3 \
  --local_eval_frequency=100 \
  --data_dir=$DATA_DIR \
  --output_dir=$OUT_DIR \
  --cloud_tpu \
  --cloud_delete_on_done
# Error logs:
subprocess.CalledProcessError: Command '['gcloud', 'beta', 'compute', 'tpus', 'list']' returned non-zero exit status 1
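
The CalledProcessError above reports only the exit status, not what gcloud itself printed. Below is a minimal sketch (not part of cloud_tpu.py) of capturing a command's stderr so the underlying message becomes visible. The helper name run_and_capture is hypothetical, and the demo uses a stand-in Python subprocess that exits with status 1 instead of gcloud, so it runs without the Cloud SDK installed:

```python
import subprocess
import sys

# The command from the traceback above; running it requires the gcloud CLI on PATH.
GCLOUD_CMD = ["gcloud", "beta", "compute", "tpus", "list"]


def run_and_capture(cmd):
    """Run cmd (a list of strings) and return (exit_code, stdout, stderr) as text.

    Unlike subprocess.check_call / check_output, this does not raise on a
    non-zero exit status, so the caller can inspect stderr directly.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text on Python 2 and 3
    )
    out, err = proc.communicate()
    return proc.returncode, out, err


# Stand-in for a failing CLI call: writes to stderr and exits with status 1.
demo_cmd = [sys.executable, "-c",
            "import sys; sys.stderr.write('boom'); sys.exit(1)"]
code, out, err = run_and_capture(demo_cmd)
if code != 0:
    print("command failed with exit status %d" % code)
    print("stderr: %s" % err)
```

Calling run_and_capture(GCLOUD_CMD) on the affected machine should show the message gcloud emits for the failing invocation, which is the detail needed to tell whether the command group was renamed, moved, or requires a newer component.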

tlatkowski commented 6 years ago

I'm closing this issue as the cloud_tpu script was removed. The current version of TPU support follows the official TPU tutorial.