tensorflow / nmt

TensorFlow Neural Machine Translation Tutorial
Apache License 2.0

UnicodeEncodeError with environment tensorflow/tensorflow:latest-gpu-py3 #424

Closed: ttpro1995 closed this issue 5 years ago

ttpro1995 commented 5 years ago

I ran it in the Docker image tensorflow/tensorflow:latest-gpu-py3 with the following command:

python3 -m nmt.nmt_mlflow \
--run_name=vastai_attention_scaled_luong_1 \
--attention=scaled_luong \
--src=vi --tgt=en \
--vocab_prefix=/data/nlp/iwslt15/vocab \
--train_prefix=/data/nlp/iwslt15/train \
--dev_prefix=/data/nlp/iwslt15/tst2012 \
--test_prefix=/data/nlp/iwslt15/tst2013 \
--out_dir=/home/tt/model/nmt_model/vastai_attention_scaled_luong_1 \
--num_train_steps=12000 \
--steps_per_stats=100 \
--num_layers=2 \
--num_units=512 \
--dropout=0.2 \
--encoder_type=bi \
--metrics=bleu

It fails with this error:

Traceback (most recent call last):
  File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/nmt/nmt/nmt_mlflow.py", line 721, in <module>
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "/root/nmt/nmt/nmt_mlflow.py", line 712, in main
    run_main(FLAGS, default_hparams, train_fn, inference_fn)
  File "/root/nmt/nmt/nmt_mlflow.py", line 696, in run_main
    train_fn(hparams, target_session=target_session)
  File "/root/nmt/nmt/train.py", line 512, in train
    sample_tgt_data, avg_ckpts)
  File "/root/nmt/nmt/train.py", line 340, in run_full_eval
    sample_src_data, sample_tgt_data)
  File "/root/nmt/nmt/train.py", line 55, in run_sample_decode
    infer_model.batch_size_placeholder, summary_writer)
  File "/root/nmt/nmt/train.py", line 698, in _sample_decode
    utils.print_out("    src: %s" % src_data[decode_id])
  File "/root/nmt/nmt/utils/misc_utils.py", line 69, in print_out
    print(out_s, end="", file=sys.stdout)
UnicodeEncodeError: 'ascii' codec can't encode character '\xfa' in position 11: ordinal not in range(128)
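The last frame is a plain print() to sys.stdout, so the failure does not depend on TensorFlow at all: any string containing 'ú' ('\xfa') breaks once it has to be encoded as ASCII. A minimal sketch that reproduces the same class of error; the sample line and the helper encode_or_error are hypothetical, chosen only to mimic the "    src: ..." output of _sample_decode.

```python
def encode_or_error(text, codec):
    """Encode text with the given codec; return the bytes, or the error message."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError as err:
        return str(err)


# Hypothetical line shaped like the sample-decode output (not real IWSLT data).
line = "    src: Ch\xfa"

# ASCII cannot represent U+00FA, so this returns the codec's error message.
print(encode_or_error(line, "ascii"))
# UTF-8 can, so this returns the encoded bytes.
print(encode_or_error(line, "utf-8"))
```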

bobvo23 commented 5 years ago

I encountered the same problem. The code base runs smoothly with the Docker image tensorflow/tensorflow:latest-gpu (without the -py3 suffix).

ttpro1995 commented 5 years ago

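I found a solution for the Docker image tensorflow/tensorflow:latest-gpu-py3.

Setting the environment variable PYTHONIOENCODING=utf-8 fixes the problem without any code changes. The traceback shows that Python 3's sys.stdout is using the 'ascii' codec, presumably because no UTF-8 locale is configured in the container, so printing a Vietnamese source sentence containing 'ú' ('\xfa') fails. PYTHONIOENCODING overrides the encoding Python uses for stdin, stdout and stderr.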
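To check that the variable really controls the stdout encoding, independently of the host locale, one can launch a child interpreter that prints 'ú' under two different PYTHONIOENCODING values. This is a minimal sketch; the helper name run_child is illustrative only, and it uses stdout=PIPE/stderr=PIPE rather than capture_output so that it also runs on the Python 3.5 shipped in that image.

```python
import os
import subprocess
import sys

# Child program that prints a non-ASCII character, like the Vietnamese
# source sentences printed during sample decoding.
CHILD = "print('\\xfa')"


def run_child(io_encoding):
    """Run CHILD in a fresh interpreter with PYTHONIOENCODING=io_encoding."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    return subprocess.run(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        env=env,
    )


if __name__ == "__main__":
    ascii_run = run_child("ascii")   # expected to crash with UnicodeEncodeError
    utf8_run = run_child("utf-8")    # expected to print the UTF-8 bytes of 'ú'
    print("ascii exit code:", ascii_run.returncode)
    print("utf-8 exit code:", utf8_run.returncode, utf8_run.stdout)
```

Because the child inherits the rest of the parent's environment, the only difference between the two runs is PYTHONIOENCODING, which is the same knob the command line below sets.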

Full command line:

PYTHONIOENCODING=utf-8 python3 -m nmt.nmt \
--run_name=vastai_attention_scaled_luong_1 \
--attention=scaled_luong \
--src=vi --tgt=en \
--vocab_prefix=/data/nlp/iwslt15/vocab \
--train_prefix=/data/nlp/iwslt15/train \
--dev_prefix=/data/nlp/iwslt15/tst2012 \
--test_prefix=/data/nlp/iwslt15/tst2013 \
--out_dir=/home/tt/model/nmt_model/vastai_attention_scaled_luong_1 \
--num_train_steps=12000 \
--steps_per_stats=100 \
--num_layers=2 \
--num_units=512 \
--dropout=0.2 \
--encoder_type=bi \
--metrics=bleu
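If the environment cannot be changed (for example when the job is launched by a scheduler you do not control), a code-side alternative is to stop relying on the locale inside the printing helper. The sketch below is a hypothetical variant of nmt's utils.print_out (the name print_out_utf8 and its parameters are assumptions, not the repository's API): it encodes the text as UTF-8 itself and writes the raw bytes to the stream's underlying buffer.

```python
import sys


def print_out_utf8(s, new_line=True, stream=None):
    """Hypothetical print_out variant that always emits UTF-8 bytes,
    regardless of the encoding the text stream was opened with."""
    stream = sys.stdout if stream is None else stream
    out_s = s + ("\n" if new_line else "")
    buffer = getattr(stream, "buffer", None)
    if buffer is not None:
        stream.flush()                     # keep order with earlier text writes
        buffer.write(out_s.encode("utf-8"))
        buffer.flush()
    else:
        # Streams without a byte buffer (e.g. io.StringIO) take text directly.
        stream.write(out_s)


if __name__ == "__main__":
    print_out_utf8("    src: Ch\xfa")
```

The trade-off is that the output is always UTF-8, even on a terminal configured for another encoding; setting PYTHONIOENCODING keeps the code untouched and is the simpler fix here.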