Closed by vidavakil 5 years ago
More specifically, the job hangs after printing the following:
Restoring parameters from /.../model.ckpt-3000
INFO:tensorflow:Running local_init_op.
... tf_logging.py:115] Running local_init_op.
INFO:tensorflow:Done running local_init_op.
... tf_logging.py:115] Done running local_init_op.
I was able to generate from the model after all. It took much longer than without relative_encoding, and I also had to stop the training job to release resources; otherwise both training and decoding appeared to hang for hours.
Hello,
I am trying to train a custom Transformer model that has a decoder only (with a custom bottom['targets']), for sequence generation. I was able to train and generate from the model when I had not specified any other special params. However, the generated sequences frequently had a failure mode where certain tokens repeated too often.
I then added the following two params and am training a new model:
hparams.self_attention_type = "dot_product_relative_v2"
hparams.max_relative_position = 256
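For readers unfamiliar with max_relative_position, here is a minimal sketch of the general clipping idea behind relative position attention: the offset between two positions is clipped to a fixed window before it is used to look up a learned embedding. This is an illustration only, not tensor2tensor's implementation of dot_product_relative_v2; the function name relative_position_ids is hypothetical, and a tiny window of 2 stands in for the 256 used above.

```python
def relative_position_ids(length, max_relative_position):
    """Return a length x length matrix of clipped relative offsets.

    Entry [i][j] is the offset (j - i), clipped to
    [-max_relative_position, max_relative_position] and then shifted
    by max_relative_position so it can serve as a non-negative index.
    Hypothetical helper for illustration, not part of tensor2tensor.
    """
    ids = []
    for i in range(length):
        row = []
        for j in range(length):
            offset = j - i
            clipped = max(-max_relative_position,
                          min(max_relative_position, offset))
            row.append(clipped + max_relative_position)
        ids.append(row)
    return ids


# Small example: 5 positions, offsets clipped to [-2, 2].
for row in relative_position_ids(length=5, max_relative_position=2):
    print(row)
```

With a window of 2, every offset beyond two positions in either direction maps to the same index, so the number of distinct indices is 2 * max_relative_position + 1 regardless of sequence length.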
However, now when I run t2t_decoder, it hangs and does not generate any output. It also does not respond reliably to ^C, so I have to use kill -9. I run the decoder in interactive mode and simply press Return at the '>' prompt.
t2t_decoder \
  --data_dir="${DATA_DIR}" \
  --decode_hparams="${DECODE_HPARAMS}" \
  --decode_interactive \
  --hparams="sampling_method=random" \
  --hparams_set=${HPARAMS_SET} \
  --model=${MODEL} \
  --problem=${PROBLEM} \
  --output_dir=${TRAIN_DIR}
where:
DECODE_HPARAMS="alpha=0,beam_size=1,extra_length=2048"
MODEL=transformer
OS: macOS, High Sierra
$ pip freeze | grep tensor
Error [Errno 20] Not a directory: '/Users/vida_vakil/miniconda3/lib/python3.6/site-packages/magenta-1.0.2-py3.6.egg' while executing command git rev-parse
Exception: ....
NotADirectoryError: [Errno 20] Not a directory: '/Users/vida_vakil/miniconda3/lib/python3.6/site-packages/magenta-1.0.2-py3.6.egg'
The model I am using is based on Score2Perf (https://github.com/tensorflow/magenta/tree/master/magenta/models/score2perf). I installed it following the instructions on that page and at https://github.com/tensorflow/magenta. The pip freeze error above appears to be related to the magenta .egg file in site-packages.
$ python -V
Python 3.6.6 :: Anaconda, Inc.
tensorflow 1.12.0
tensor2tensor 1.13.0
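Since pip freeze fails in this environment, one possible way to collect the same version information is to read the installed package metadata directly from Python. The sketch below is an assumption about what might work here, not a verified fix for the egg error: installed_versions is a hypothetical helper, importlib.metadata needs Python 3.8 or newer, and on Python 3.6 the code falls back to pkg_resources from setuptools.

```python
def installed_versions(substring):
    """Return sorted (name, version) pairs for installed distributions
    whose name contains `substring` (case-insensitive).

    Hypothetical helper for illustration; it reads package metadata
    only and never invokes git or pip.
    """
    try:
        from importlib import metadata  # standard library, Python 3.8+
        pairs = [(d.metadata.get("Name"), d.version)
                 for d in metadata.distributions()]
    except ImportError:
        import pkg_resources  # setuptools fallback for older Pythons
        pairs = [(d.project_name, d.version)
                 for d in pkg_resources.working_set]
    needle = substring.lower()
    # A set removes duplicates from repeated sys.path entries.
    matches = {(name, str(version)) for name, version in pairs
               if name and needle in name.lower()}
    return sorted(matches)


# Rough equivalent of `pip freeze | grep tensor`.
for name, version in installed_versions("tensor"):
    print(name, version)
```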
Thanks in advance