google / seq2seq

A general-purpose encoder-decoder framework for Tensorflow
https://google.github.io/seq2seq/
Apache License 2.0

Unknown Error from nmt toy example(bleu/value) #266

Open yanghoonkim opened 7 years ago

yanghoonkim commented 7 years ago

I ran the following command, which is exactly the example from the NMT tutorial page:

ad26kr@ubuntu:~/utils/seq2seq$ python -m bin.train --config_paths=" ./example_configs/nmt_small.yml, ./example_configs/train_seq2seq.yml, ./example_configs/text_metrics_bpe.yml" --model_params " vocab_source: $VOCAB_SOURCE vocab_target: $VOCAB_TARGET" --input_pipeline_train " class: ParallelTextInputPipeline params: source_files:

However, I got an error (please scroll down to the bottom of the log), and I can't find any solution to this problem.
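For readers skimming the log: the failure at the bottom is `OSError: [Errno 8] Exec format error`, raised during evaluation. As a rough sketch only, and assuming (not confirmed by this log) that the error comes from Python launching an external helper script, the snippet below shows one common way to get this errno: executing a text file that has no `#!` interpreter line. The temporary script and the helper names `run` and `demo` are hypothetical and not part of seq2seq.

```python
import errno
import os
import stat
import subprocess
import sys
import tempfile


def run(argv):
    """Run argv and return (stdout_text, errno_or_None)."""
    try:
        out = subprocess.check_output(argv, stderr=subprocess.STDOUT)
        return out.decode("utf-8").strip(), None
    except OSError as exc:
        # Raised when the OS refuses to execute the file at all.
        return None, exc.errno


def demo():
    # Hypothetical stand-in for an external helper script: a text file
    # marked executable but with no "#!" interpreter line.
    fd, path = tempfile.mkstemp(suffix=".py")
    with os.fdopen(fd, "w") as handle:
        handle.write('print("ok")\n')
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    try:
        direct = run([path])  # ask the OS to exec the file itself
        via_interpreter = run([sys.executable, path])  # name the interpreter
    finally:
        os.remove(path)
    return direct, via_interpreter


if __name__ == "__main__":
    direct, via_interpreter = demo()
    # On Linux the direct call typically fails with errno.ENOEXEC (8).
    print("direct:", direct, "ENOEXEC is", errno.ENOEXEC)
    print("via interpreter:", via_interpreter)
```

If the cause here is similar, checking that the helper script starts with a valid `#!` line, or invoking it through its interpreter explicitly, would be the first things to try.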



INFO:tensorflow:Loading config from /home/ad26kr/utils/seq2seq/example_configs/nmt_small.yml
INFO:tensorflow:Loading config from /home/ad26kr/utils/seq2seq/example_configs/train_seq2seq.yml
INFO:tensorflow:Loading config from /home/ad26kr/utils/seq2seq/example_configs/text_metrics_bpe.yml
INFO:tensorflow:Final Config:
buckets: 10,20,30,40
default_params:
- {separator: ' '}
- {postproc_fn: seq2seq.data.postproc.strip_bpe}
hooks:
- {class: PrintModelAnalysisHook}
- {class: MetadataCaptureHook}
- {class: SyncReplicasOptimizerHook}
- class: TrainSampleHook
  params: {every_n_steps: 1000}
metrics:
- {class: LogPerplexityMetricSpec}
- class: BleuMetricSpec
  params: {postproc_fn: seq2seq.data.postproc.strip_bpe, separator: ' '}
- class: RougeMetricSpec
  params: {postproc_fn: seq2seq.data.postproc.strip_bpe, rouge_type: rouge_1/f_score,
    separator: ' '}
- class: RougeMetricSpec
  params: {postproc_fn: seq2seq.data.postproc.strip_bpe, rouge_type: rouge_1/r_score,
    separator: ' '}
- class: RougeMetricSpec
  params: {postproc_fn: seq2seq.data.postproc.strip_bpe, rouge_type: rouge_1/p_score,
    separator: ' '}
- class: RougeMetricSpec
  params: {postproc_fn: seq2seq.data.postproc.strip_bpe, rouge_type: rouge_2/f_score,
    separator: ' '}
- class: RougeMetricSpec
  params: {postproc_fn: seq2seq.data.postproc.strip_bpe, rouge_type: rouge_2/r_score,
    separator: ' '}
- class: RougeMetricSpec
  params: {postproc_fn: seq2seq.data.postproc.strip_bpe, rouge_type: rouge_2/p_score,
    separator: ' '}
- class: RougeMetricSpec
  params: {postproc_fn: seq2seq.data.postproc.strip_bpe, rouge_type: rouge_l/f_score,
    separator: ' '}
model: AttentionSeq2Seq
model_params:
  attention.class: seq2seq.decoders.attention.AttentionLayerDot
  attention.params: {num_units: 128}
  bridge.class: seq2seq.models.bridges.ZeroBridge
  decoder.class: seq2seq.decoders.AttentionDecoder
  decoder.params:
    rnn_cell:
      cell_class: GRUCell
      cell_params: {num_units: 128}
      dropout_input_keep_prob: 0.8
      dropout_output_keep_prob: 1.0
      num_layers: 1
  embedding.dim: 128
  encoder.class: seq2seq.encoders.BidirectionalRNNEncoder
  encoder.params:
    rnn_cell:
      cell_class: GRUCell
      cell_params: {num_units: 128}
      dropout_input_keep_prob: 0.8
      dropout_output_keep_prob: 1.0
      num_layers: 1
  optimizer.learning_rate: 0.0001
  optimizer.name: Adam
  optimizer.params: {epsilon: 8.0e-07}
  source.max_seq_len: 50
  source.reverse: false
  target.max_seq_len: 50

WARNING:tensorflow:Ignoring config flag: default_params
INFO:tensorflow:Setting save_checkpoints_secs to 600
INFO:tensorflow:Creating ParallelTextInputPipeline in mode=train
INFO:tensorflow:
ParallelTextInputPipeline:
  !!python/unicode 'num_epochs': null
  !!python/unicode 'shuffle': true
  !!python/unicode 'source_delimiter': !!python/unicode ' '
  !!python/unicode 'source_files': [/home/ad26kr/nmt_data/toy_reverse/train/sources.txt]
  !!python/unicode 'target_delimiter': !!python/unicode ' '
  !!python/unicode 'target_files': [/home/ad26kr/nmt_data/toy_reverse/train/targets.txt]

INFO:tensorflow:Creating ParallelTextInputPipeline in mode=eval
INFO:tensorflow:
ParallelTextInputPipeline:
  !!python/unicode 'num_epochs': 1
  !!python/unicode 'shuffle': false
  !!python/unicode 'source_delimiter': !!python/unicode ' '
  !!python/unicode 'source_files': [/home/ad26kr/nmt_data/toy_reverse/dev/sources.txt]
  !!python/unicode 'target_delimiter': !!python/unicode ' '
  !!python/unicode 'target_files': [/home/ad26kr/nmt_data/toy_reverse/dev/targets.txt]

INFO:tensorflow:Using config: {'_save_checkpoints_secs': 600, '_num_ps_replicas': 0, '_keep_checkpoint_max': 5, '_task_type': None, '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f0295485a10>, '_model_dir': '/tmp/nmt_tutorial', '_save_checkpoints_steps': None, '_keep_checkpoint_every_n_hours': 4, '_session_config': None, '_tf_random_seed': None, '_environment': 'local', '_num_worker_replicas': 0, '_task_id': 0, '_save_summary_steps': 100, '_tf_config': gpu_options {
  per_process_gpu_memory_fraction: 1.0
}
, '_evaluation_master': '', '_master': ''}
INFO:tensorflow:Creating PrintModelAnalysisHook in mode=train
INFO:tensorflow:
PrintModelAnalysisHook: {}

INFO:tensorflow:Creating MetadataCaptureHook in mode=train
INFO:tensorflow:
MetadataCaptureHook: {!!python/unicode 'step': 10}

INFO:tensorflow:Creating SyncReplicasOptimizerHook in mode=train
INFO:tensorflow:
SyncReplicasOptimizerHook: {}

INFO:tensorflow:Creating TrainSampleHook in mode=train
INFO:tensorflow:
TrainSampleHook: {!!python/unicode 'every_n_secs': null, !!python/unicode 'every_n_steps': 1000,
  !!python/unicode 'source_delimiter': !!python/unicode ' ', !!python/unicode 'target_delimiter': !!python/unicode ' '}

INFO:tensorflow:Creating LogPerplexityMetricSpec in mode=eval
INFO:tensorflow:
LogPerplexityMetricSpec: {}

INFO:tensorflow:Creating BleuMetricSpec in mode=eval
INFO:tensorflow:
BleuMetricSpec: {!!python/unicode 'eos_token': !!python/unicode 'SEQUENCE_END', !!python/unicode 'postproc_fn': !!python/unicode 'seq2seq.data.postproc.strip_bpe',
  !!python/unicode 'separator': !!python/unicode ' ', !!python/unicode 'sos_token': !!python/unicode 'SEQUENCE_START'}

INFO:tensorflow:Creating RougeMetricSpec in mode=eval
INFO:tensorflow:
RougeMetricSpec: {!!python/unicode 'eos_token': !!python/unicode 'SEQUENCE_END', !!python/unicode 'postproc_fn': !!python/unicode 'seq2seq.data.postproc.strip_bpe',
  !!python/unicode 'rouge_type': !!python/unicode 'rouge_1/f_score', !!python/unicode 'separator': !!python/unicode ' ',
  !!python/unicode 'sos_token': !!python/unicode 'SEQUENCE_START'}

INFO:tensorflow:Creating RougeMetricSpec in mode=eval
INFO:tensorflow:
RougeMetricSpec: {!!python/unicode 'eos_token': !!python/unicode 'SEQUENCE_END', !!python/unicode 'postproc_fn': !!python/unicode 'seq2seq.data.postproc.strip_bpe',
  !!python/unicode 'rouge_type': !!python/unicode 'rouge_1/r_score', !!python/unicode 'separator': !!python/unicode ' ',
  !!python/unicode 'sos_token': !!python/unicode 'SEQUENCE_START'}

INFO:tensorflow:Creating RougeMetricSpec in mode=eval
INFO:tensorflow:
RougeMetricSpec: {!!python/unicode 'eos_token': !!python/unicode 'SEQUENCE_END', !!python/unicode 'postproc_fn': !!python/unicode 'seq2seq.data.postproc.strip_bpe',
  !!python/unicode 'rouge_type': !!python/unicode 'rouge_1/p_score', !!python/unicode 'separator': !!python/unicode ' ',
  !!python/unicode 'sos_token': !!python/unicode 'SEQUENCE_START'}

INFO:tensorflow:Creating RougeMetricSpec in mode=eval
INFO:tensorflow:
RougeMetricSpec: {!!python/unicode 'eos_token': !!python/unicode 'SEQUENCE_END', !!python/unicode 'postproc_fn': !!python/unicode 'seq2seq.data.postproc.strip_bpe',
  !!python/unicode 'rouge_type': !!python/unicode 'rouge_2/f_score', !!python/unicode 'separator': !!python/unicode ' ',
  !!python/unicode 'sos_token': !!python/unicode 'SEQUENCE_START'}

INFO:tensorflow:Creating RougeMetricSpec in mode=eval
INFO:tensorflow:
RougeMetricSpec: {!!python/unicode 'eos_token': !!python/unicode 'SEQUENCE_END', !!python/unicode 'postproc_fn': !!python/unicode 'seq2seq.data.postproc.strip_bpe',
  !!python/unicode 'rouge_type': !!python/unicode 'rouge_2/r_score', !!python/unicode 'separator': !!python/unicode ' ',
  !!python/unicode 'sos_token': !!python/unicode 'SEQUENCE_START'}

INFO:tensorflow:Creating RougeMetricSpec in mode=eval
INFO:tensorflow:
RougeMetricSpec: {!!python/unicode 'eos_token': !!python/unicode 'SEQUENCE_END', !!python/unicode 'postproc_fn': !!python/unicode 'seq2seq.data.postproc.strip_bpe',
  !!python/unicode 'rouge_type': !!python/unicode 'rouge_2/p_score', !!python/unicode 'separator': !!python/unicode ' ',
  !!python/unicode 'sos_token': !!python/unicode 'SEQUENCE_START'}

INFO:tensorflow:Creating RougeMetricSpec in mode=eval
INFO:tensorflow:
RougeMetricSpec: {!!python/unicode 'eos_token': !!python/unicode 'SEQUENCE_END', !!python/unicode 'postproc_fn': !!python/unicode 'seq2seq.data.postproc.strip_bpe',
  !!python/unicode 'rouge_type': !!python/unicode 'rouge_l/f_score', !!python/unicode 'separator': !!python/unicode ' ',
  !!python/unicode 'sos_token': !!python/unicode 'SEQUENCE_START'}

INFO:tensorflow:Training model for 1000 steps
INFO:tensorflow:Creating AttentionSeq2Seq in mode=train
INFO:tensorflow:
AttentionSeq2Seq:
  !!python/unicode 'attention.class': !!python/unicode 'seq2seq.decoders.attention.AttentionLayerDot'
  !!python/unicode 'attention.params': {num_units: 128}
  !!python/unicode 'bridge.class': !!python/unicode 'seq2seq.models.bridges.ZeroBridge'
  !!python/unicode 'bridge.params': {}
  !!python/unicode 'decoder.class': !!python/unicode 'seq2seq.decoders.AttentionDecoder'
  !!python/unicode 'decoder.params':
    rnn_cell:
      cell_class: GRUCell
      cell_params: {num_units: 128}
      dropout_input_keep_prob: 0.8
      dropout_output_keep_prob: 1.0
      num_layers: 1
  !!python/unicode 'embedding.dim': 128
  !!python/unicode 'embedding.init_scale': 0.04
  !!python/unicode 'embedding.share': false
  !!python/unicode 'encoder.class': !!python/unicode 'seq2seq.encoders.BidirectionalRNNEncoder'
  !!python/unicode 'encoder.params':
    rnn_cell:
      cell_class: GRUCell
      cell_params: {num_units: 128}
      dropout_input_keep_prob: 0.8
      dropout_output_keep_prob: 1.0
      num_layers: 1
  !!python/unicode 'inference.beam_search.beam_width': 0
  !!python/unicode 'inference.beam_search.choose_successors_fn': !!python/unicode 'choose_top_k'
  !!python/unicode 'inference.beam_search.length_penalty_weight': 0.0
  !!python/unicode 'optimizer.clip_embed_gradients': 0.1
  !!python/unicode 'optimizer.clip_gradients': 5.0
  !!python/unicode 'optimizer.learning_rate': 0.0001
  !!python/unicode 'optimizer.lr_decay_rate': 0.99
  !!python/unicode 'optimizer.lr_decay_steps': 100
  !!python/unicode 'optimizer.lr_decay_type': !!python/unicode ''
  !!python/unicode 'optimizer.lr_min_learning_rate': 1.0e-12
  !!python/unicode 'optimizer.lr_staircase': false
  !!python/unicode 'optimizer.lr_start_decay_at': 0
  !!python/unicode 'optimizer.lr_stop_decay_at': 2147483647
  !!python/unicode 'optimizer.name': !!python/unicode 'Adam'
  !!python/unicode 'optimizer.params': {epsilon: 8.0e-07}
  !!python/unicode 'optimizer.sync_replicas': 0
  !!python/unicode 'optimizer.sync_replicas_to_aggregate': 0
  !!python/unicode 'source.max_seq_len': 50
  !!python/unicode 'source.reverse': false
  !!python/unicode 'target.max_seq_len': 50
  !!python/unicode 'vocab_source': !!python/unicode '/home/ad26kr/nmt_data/toy_reverse/train/vocab.sources.txt'
  !!python/unicode 'vocab_target': !!python/unicode '/home/ad26kr/nmt_data/toy_reverse/train/vocab.targets.txt'

INFO:tensorflow:Creating vocabulary lookup table of size 23
INFO:tensorflow:Creating vocabulary lookup table of size 23
INFO:tensorflow:Creating BidirectionalRNNEncoder in mode=train
INFO:tensorflow:
BidirectionalRNNEncoder:
  init_scale: 0.04
  rnn_cell:
    cell_class: GRUCell
    cell_params: {num_units: 128}
    dropout_input_keep_prob: 0.8
    dropout_output_keep_prob: 1.0
    num_layers: 1
    residual_combiner: add
    residual_connections: false
    residual_dense: false

INFO:tensorflow:Creating AttentionLayerDot in mode=train
INFO:tensorflow:
AttentionLayerDot: {!!python/unicode 'num_units': 128}

INFO:tensorflow:Creating AttentionDecoder in mode=train
INFO:tensorflow:
AttentionDecoder:
  !!python/unicode 'init_scale': 0.04
  !!python/unicode 'max_decode_length': 100
  !!python/unicode 'rnn_cell':
    cell_class: GRUCell
    cell_params: {num_units: 128}
    dropout_input_keep_prob: 0.8
    dropout_output_keep_prob: 1.0
    num_layers: 1
    residual_combiner: add
    residual_connections: false
    residual_dense: false

INFO:tensorflow:Creating ZeroBridge in mode=train
INFO:tensorflow:
ZeroBridge: {}

INFO:tensorflow:Create CheckpointSaverHook.
4 ops no flops stats due to incomplete shapes. Consider passing run_meta to use run_time shapes.
Parsing GraphDef...
Parsing OpLog...
Preparing Views...
-dump_to_file option is deprecated. Please use -output file:outfile=<filename>
-output stdout is overwritten with -output file:outfile=/tmp/nmt_tutorial/model_analysis.txt
INFO:tensorflow:_TFProfRoot (--/501.91k params)
  model/att_seq2seq/Variable (1, 1/1 params)
  model/att_seq2seq/decode/attention/att_keys/biases (128, 128/128 params)
  model/att_seq2seq/decode/attention/att_keys/weights (256x128, 32.77k/32.77k params)
  model/att_seq2seq/decode/attention/att_query/biases (128, 128/128 params)
  model/att_seq2seq/decode/attention/att_query/weights (128x128, 16.38k/16.38k params)
  model/att_seq2seq/decode/attention_decoder/decoder/attention_mix/biases (128, 128/128 params)
  model/att_seq2seq/decode/attention_decoder/decoder/attention_mix/weights (384x128, 49.15k/49.15k params)
  model/att_seq2seq/decode/attention_decoder/decoder/gru_cell/candidate/bias (128, 128/128 params)
  model/att_seq2seq/decode/attention_decoder/decoder/gru_cell/candidate/kernel (512x128, 65.54k/65.54k params)
  model/att_seq2seq/decode/attention_decoder/decoder/gru_cell/gates/bias (256, 256/256 params)
  model/att_seq2seq/decode/attention_decoder/decoder/gru_cell/gates/kernel (512x256, 131.07k/131.07k params)
  model/att_seq2seq/decode/attention_decoder/decoder/logits/biases (23, 23/23 params)
  model/att_seq2seq/decode/attention_decoder/decoder/logits/weights (128x23, 2.94k/2.94k params)
  model/att_seq2seq/decode/target_embedding/W (23x128, 2.94k/2.94k params)
  model/att_seq2seq/encode/bidi_rnn_encoder/bidirectional_rnn/bw/gru_cell/candidate/bias (128, 128/128 params)
  model/att_seq2seq/encode/bidi_rnn_encoder/bidirectional_rnn/bw/gru_cell/candidate/kernel (256x128, 32.77k/32.77k params)
  model/att_seq2seq/encode/bidi_rnn_encoder/bidirectional_rnn/bw/gru_cell/gates/bias (256, 256/256 params)
  model/att_seq2seq/encode/bidi_rnn_encoder/bidirectional_rnn/bw/gru_cell/gates/kernel (256x256, 65.54k/65.54k params)
  model/att_seq2seq/encode/bidi_rnn_encoder/bidirectional_rnn/fw/gru_cell/candidate/bias (128, 128/128 params)
  model/att_seq2seq/encode/bidi_rnn_encoder/bidirectional_rnn/fw/gru_cell/candidate/kernel (256x128, 32.77k/32.77k params)
  model/att_seq2seq/encode/bidi_rnn_encoder/bidirectional_rnn/fw/gru_cell/gates/bias (256, 256/256 params)
  model/att_seq2seq/encode/bidi_rnn_encoder/bidirectional_rnn/fw/gru_cell/gates/kernel (256x256, 65.54k/65.54k params)
  model/att_seq2seq/encode/source_embedding/W (23x128, 2.94k/2.94k params)

2017-06-30 13:18:48.096347: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-30 13:18:48.096405: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-30 13:18:48.096415: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-06-30 13:18:50.347209: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties:
name: GeForce GTX TITAN Black
major: 3 minor: 5 memoryClockRate (GHz) 0.98
pciBusID 0000:02:00.0
Total memory: 5.94GiB
Free memory: 5.87GiB
2017-06-30 13:18:50.515313: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x4abc080 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that.
2017-06-30 13:18:50.516369: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 1 with properties:
name: GeForce GTX TITAN Black
major: 3 minor: 5 memoryClockRate (GHz) 0.98
pciBusID 0000:03:00.0
Total memory: 5.94GiB
Free memory: 5.87GiB
2017-06-30 13:18:50.674872: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x4fffb30 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that.
2017-06-30 13:18:50.675463: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-06-30 13:18:50.676050: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 2 with properties:
name: GeForce GTX TITAN Black
major: 3 minor: 5 memoryClockRate (GHz) 0.98
pciBusID 0000:83:00.0
Total memory: 5.94GiB
Free memory: 5.87GiB
2017-06-30 13:18:50.838984: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x4ef4b00 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that.
2017-06-30 13:18:50.839969: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-06-30 13:18:50.840870: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 3 with properties:
name: GeForce GTX TITAN Black
major: 3 minor: 5 memoryClockRate (GHz) 0.98
pciBusID 0000:84:00.0
Total memory: 5.94GiB
Free memory: 5.87GiB
2017-06-30 13:18:50.841983: I tensorflow/core/common_runtime/gpu/gpu_device.cc:832] Peer access not supported between device ordinals 0 and 2
2017-06-30 13:18:50.842040: I tensorflow/core/common_runtime/gpu/gpu_device.cc:832] Peer access not supported between device ordinals 0 and 3
2017-06-30 13:18:50.842120: I tensorflow/core/common_runtime/gpu/gpu_device.cc:832] Peer access not supported between device ordinals 1 and 2
2017-06-30 13:18:50.842153: I tensorflow/core/common_runtime/gpu/gpu_device.cc:832] Peer access not supported between device ordinals 1 and 3
2017-06-30 13:18:50.842184: I tensorflow/core/common_runtime/gpu/gpu_device.cc:832] Peer access not supported between device ordinals 2 and 0
2017-06-30 13:18:50.842215: I tensorflow/core/common_runtime/gpu/gpu_device.cc:832] Peer access not supported between device ordinals 2 and 1
2017-06-30 13:18:50.842863: I tensorflow/core/common_runtime/gpu/gpu_device.cc:832] Peer access not supported between device ordinals 3 and 0
2017-06-30 13:18:50.842903: I tensorflow/core/common_runtime/gpu/gpu_device.cc:832] Peer access not supported between device ordinals 3 and 1
2017-06-30 13:18:50.843071: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0 1 2 3
2017-06-30 13:18:50.843095: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   Y Y N N
2017-06-30 13:18:50.843110: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 1:   Y Y N N
2017-06-30 13:18:50.843124: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 2:   N N Y Y
2017-06-30 13:18:50.843139: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 3:   N N Y Y
2017-06-30 13:18:50.843165: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN Black, pci bus id: 0000:02:00.0)
2017-06-30 13:18:50.843185: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX TITAN Black, pci bus id: 0000:03:00.0)
2017-06-30 13:18:50.843203: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:2) -> (device: 2, name: GeForce GTX TITAN Black, pci bus id: 0000:83:00.0)
2017-06-30 13:18:50.843220: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:3) -> (device: 3, name: GeForce GTX TITAN Black, pci bus id: 0000:84:00.0)
2017-06-30 13:18:54.128761: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 4840 get requests, put_count=4792 evicted_count=1000 eviction_rate=0.208681 and unsatisfied allocation rate=0.23719
2017-06-30 13:18:54.128824: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 100 to 110
INFO:tensorflow:Saving checkpoints for 1 into /tmp/nmt_tutorial/model.ckpt.
INFO:tensorflow:loss = 3.13519, step = 1
INFO:tensorflow:Prediction followed by Target @ Step 1
====================================================================================================
1 笑 笑 1 15 笑 笑 笑 笑 笑 笑
16 11 18 18 12 笑 16 0 3 11 SEQUENCE_END

7 8 UNK UNK 17 17 17 8 7 7 7 7 7 7 7 7 7
6 9 8 9 4 2 6 13 笑 18 2 5 17 16 5 9 SEQUENCE_END

2 2 2 2 2 2 2 2 1 1 1 11 11 9 SEQUENCE_END 2 SEQUENCE_END SEQUENCE_END SEQUENCE_END
2 17 0 13 6 0 1 15 5 7 6 6 3 10 9 7 16 15 SEQUENCE_END

16 SEQUENCE_END UNK 16 UNK UNK 10 10 10 5 11 11 16 16 9
2 6 0 16 5 8 8 4 0 15 笑 7 17 14 SEQUENCE_END

1 1 SEQUENCE_END 8 SEQUENCE_END UNK UNK 1 9 14 14 10 10 9 9 9
14 10 1 12 16 2 8 8 6 4 7 5 15 9 13 SEQUENCE_END

1 1 1 1 1 12 7 笑 18 笑 2
18 0 11 18 5 0 9 10 12 13 SEQUENCE_END

1 SEQUENCE_END SEQUENCE_END SEQUENCE_END SEQUENCE_END SEQUENCE_END SEQUENCE_END SEQUENCE_END SEQUENCE_END SEQUENCE_END SEQUENCE_END SEQUENCE_END SEQUENCE_END SEQUENCE_END SEQUENCE_END SEQUENCE_END SEQUENCE_END 6
14 4 14 16 8 7 14 5 16 3 10 17 7 12 7 12 4 SEQUENCE_END

14 7 7 10 UNK 10 10 10 UNK 9 10 10 笑
13 8 10 6 17 13 笑 9 13 10 16 11 SEQUENCE_END

13 13 9 17 SEQUENCE_START SEQUENCE_START 1 9 9 16 4 9
15 15 12 7 11 15 2 13 1 9 14 SEQUENCE_END

13 2 SEQUENCE_END 2 2 2 SEQUENCE_END UNK 2 2 2 UNK 2
13 10 2 7 17 12 6 13 2 9 16 0 SEQUENCE_END

8 16 SEQUENCE_START SEQUENCE_START SEQUENCE_START 16 16 16 16 16 16 16 16 16 16 SEQUENCE_END SEQUENCE_END SEQUENCE_END
5 8 7 8 笑 5 笑 1 8 11 7 7 10 15 16 16 4 SEQUENCE_END

10 10 10 15 2 15 11 2 SEQUENCE_END SEQUENCE_END SEQUENCE_END SEQUENCE_END SEQUENCE_END SEQUENCE_END SEQUENCE_END SEQUENCE_END
7 4 13 2 笑 15 3 7 14 6 15 11 16 10 7 SEQUENCE_END

1 1 1 11 11 11 11 1 1 12 12 笑 笑 笑 笑 笑
8 0 16 6 15 12 18 0 7 0 12 3 9 16 5 SEQUENCE_END

1 1 1 1 1 8 SEQUENCE_END 1 1 10 SEQUENCE_START 13 1 1 3 SEQUENCE_START SEQUENCE_START
17 16 18 10 10 12 6 1 5 5 0 14 17 3 7 6 SEQUENCE_END

13 16 13 1 13 13 13 9 16 16 11 11 9 16 16 2 2
1 17 1 6 2 6 3 7 5 15 15 13 1 8 16 17 SEQUENCE_END

15 16 16 16 16 16 16 16 16 4 4 16
7 13 笑 笑 5 0 11 0 9 17 0 SEQUENCE_END

13 13 13 SEQUENCE_END UNK UNK 8 9 9 9 10 10 10 10 8 8 SEQUENCE_END
4 13 10 14 6 1 15 0 8 10 10 5 17 2 4 12 SEQUENCE_END

1 1 2 16 16 16 16 16 16 16 16 16 16 16 11 11 11 11 SEQUENCE_END
0 13 1 2 0 0 7 9 0 5 0 13 11 15 10 4 7 14 SEQUENCE_END

15 15 10 7 1 3 3 笑 12 9 9
0 8 0 14 14 17 17 2 6 18 SEQUENCE_END

4 SEQUENCE_END UNK UNK UNK UNK 17 UNK UNK 7 7 7 7 7 16 16
14 5 6 9 3 9 13 3 4 2 3 13 15 5 1 SEQUENCE_END

2 2 14 13 2 2 9 16 2 2 2
8 16 5 13 2 15 1 9 17 1 SEQUENCE_END

13 2 16 16 11 16 16 16 9 9 9 2 12 12 9
0 1 8 15 0 18 17 3 15 2 9 1 17 6 SEQUENCE_END

0 UNK UNK 笑 笑 UNK UNK UNK UNK UNK UNK 0 UNK UNK
13 7 17 12 6 13 8 13 8 3 2 16 9 SEQUENCE_END

1 1 1 1 SEQUENCE_END SEQUENCE_END UNK SEQUENCE_END SEQUENCE_END SEQUENCE_END SEQUENCE_END 1 1 10 SEQUENCE_END 1 1
0 8 16 3 16 6 10 10 16 13 5 14 5 16 13 9 SEQUENCE_END

16 16 4 12 16 2 2 12 2 SEQUENCE_START SEQUENCE_START 17 11 11 11 9 9 9 16
1 4 5 13 3 2 18 6 11 16 15 7 6 18 3 8 9 1 SEQUENCE_END

13 13 UNK 8 UNK 8 8 2 2 2
1 6 10 3 2 7 13 9 10 SEQUENCE_END

13 1 1 1 1 1 13 13 13 1 13 1 1 SEQUENCE_END
1 11 13 2 5 16 15 13 11 16 5 14 10 SEQUENCE_END

1 5 笑 笑 5 5 1 5 1 0 0 9
3 10 10 18 18 12 6 13 2 2 15 SEQUENCE_END

8 16 16 16 16 16 8 6 6 笑 SEQUENCE_START 6 UNK 笑 6 16 16 笑 16
1 11 11 17 8 4 12 12 6 12 14 9 3 8 1 3 笑 13 SEQUENCE_END

13 2 2 16 笑 11 12 12 12 笑 2 12 12 12 2 2 2 2
9 7 11 笑 15 2 17 17 16 0 2 4 1 3 笑 9 3 SEQUENCE_END

10 15 10 SEQUENCE_END 15 10 SEQUENCE_END 笑 10 10 10
18 10 16 18 10 10 17 8 10 12 SEQUENCE_END

1 1 1 SEQUENCE_END SEQUENCE_END 1 13 13 SEQUENCE_END 10 SEQUENCE_END SEQUENCE_END 9 SEQUENCE_END SEQUENCE_END SEQUENCE_END 6 1
14 0 17 6 0 4 16 10 4 12 4 2 12 8 4 13 7 SEQUENCE_END

====================================================================================================

INFO:tensorflow:Performing full trace on next step.
2017-06-30 13:18:56.192768: I tensorflow/stream_executor/dso_loader.cc:139] successfully opened CUDA library libcupti.so.8.0 locally
INFO:tensorflow:Captured full trace at step 11
INFO:tensorflow:Saved run_metadata to /tmp/nmt_tutorial/run_meta
INFO:tensorflow:Saved timeline to /tmp/nmt_tutorial/timeline.json
INFO:tensorflow:Saved op log to /tmp/nmt_tutorial
2017-06-30 13:19:11.485391: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 6104 get requests, put_count=5871 evicted_count=1000 eviction_rate=0.170329 and unsatisfied allocation rate=0.205767
2017-06-30 13:19:11.485452: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 256 to 281
2017-06-30 13:19:14.685254: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 10994 get requests, put_count=10962 evicted_count=1000 eviction_rate=0.0912242 and unsatisfied allocation rate=0.0992359
2017-06-30 13:19:14.685320: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 655 to 720
INFO:tensorflow:global_step/sec: 3.07244
INFO:tensorflow:loss = 3.09452, step = 101 (32.548 sec)
INFO:tensorflow:global_step/sec: 5.65549
INFO:tensorflow:loss = 2.9725, step = 201 (17.681 sec)
INFO:tensorflow:global_step/sec: 5.72307
INFO:tensorflow:loss = 2.90533, step = 301 (17.473 sec)
INFO:tensorflow:global_step/sec: 5.68733
INFO:tensorflow:loss = 2.85999, step = 401 (17.583 sec)
INFO:tensorflow:global_step/sec: 5.68119
INFO:tensorflow:loss = 2.83257, step = 501 (17.602 sec)
INFO:tensorflow:global_step/sec: 5.63825
INFO:tensorflow:loss = 2.93255, step = 601 (17.736 sec)
INFO:tensorflow:global_step/sec: 5.67299
INFO:tensorflow:loss = 2.78754, step = 701 (17.628 sec)
INFO:tensorflow:global_step/sec: 5.68169
INFO:tensorflow:loss = 2.88854, step = 801 (17.600 sec)
INFO:tensorflow:global_step/sec: 5.69611
INFO:tensorflow:loss = 2.79666, step = 901 (17.556 sec)
INFO:tensorflow:Saving checkpoints for 1000 into /tmp/nmt_tutorial/model.ckpt.
INFO:tensorflow:Loss for final step: 2.77879.
INFO:tensorflow:Evaluating model now.
INFO:tensorflow:Creating AttentionSeq2Seq in mode=eval
INFO:tensorflow:
AttentionSeq2Seq:
  !!python/unicode 'attention.class': !!python/unicode 'seq2seq.decoders.attention.AttentionLayerDot'
  !!python/unicode 'attention.params': {num_units: 128}
  !!python/unicode 'bridge.class': !!python/unicode 'seq2seq.models.bridges.ZeroBridge'
  !!python/unicode 'bridge.params': {}
  !!python/unicode 'decoder.class': !!python/unicode 'seq2seq.decoders.AttentionDecoder'
  !!python/unicode 'decoder.params':
    rnn_cell:
      cell_class: GRUCell
      cell_params: {num_units: 128}
      dropout_input_keep_prob: 0.8
      dropout_output_keep_prob: 1.0
      num_layers: 1
  !!python/unicode 'embedding.dim': 128
  !!python/unicode 'embedding.init_scale': 0.04
  !!python/unicode 'embedding.share': false
  !!python/unicode 'encoder.class': !!python/unicode 'seq2seq.encoders.BidirectionalRNNEncoder'
  !!python/unicode 'encoder.params':
    rnn_cell:
      cell_class: GRUCell
      cell_params: {num_units: 128}
      dropout_input_keep_prob: 0.8
      dropout_output_keep_prob: 1.0
      num_layers: 1
  !!python/unicode 'inference.beam_search.beam_width': 0
  !!python/unicode 'inference.beam_search.choose_successors_fn': !!python/unicode 'choose_top_k'
  !!python/unicode 'inference.beam_search.length_penalty_weight': 0.0
  !!python/unicode 'optimizer.clip_embed_gradients': 0.1
  !!python/unicode 'optimizer.clip_gradients': 5.0
  !!python/unicode 'optimizer.learning_rate': 0.0001
  !!python/unicode 'optimizer.lr_decay_rate': 0.99
  !!python/unicode 'optimizer.lr_decay_steps': 100
  !!python/unicode 'optimizer.lr_decay_type': !!python/unicode ''
  !!python/unicode 'optimizer.lr_min_learning_rate': 1.0e-12
  !!python/unicode 'optimizer.lr_staircase': false
  !!python/unicode 'optimizer.lr_start_decay_at': 0
  !!python/unicode 'optimizer.lr_stop_decay_at': 2147483647
  !!python/unicode 'optimizer.name': !!python/unicode 'Adam'
  !!python/unicode 'optimizer.params': {epsilon: 8.0e-07}
  !!python/unicode 'optimizer.sync_replicas': 0
  !!python/unicode 'optimizer.sync_replicas_to_aggregate': 0
  !!python/unicode 'source.max_seq_len': 50
  !!python/unicode 'source.reverse': false
  !!python/unicode 'target.max_seq_len': 50
  !!python/unicode 'vocab_source': !!python/unicode '/home/ad26kr/nmt_data/toy_reverse/train/vocab.sources.txt'
  !!python/unicode 'vocab_target': !!python/unicode '/home/ad26kr/nmt_data/toy_reverse/train/vocab.targets.txt'

INFO:tensorflow:Creating vocabulary lookup table of size 23
INFO:tensorflow:Creating vocabulary lookup table of size 23
INFO:tensorflow:Creating BidirectionalRNNEncoder in mode=eval
INFO:tensorflow:
BidirectionalRNNEncoder:
  init_scale: 0.04
  rnn_cell:
    cell_class: GRUCell
    cell_params: {num_units: 128}
    dropout_input_keep_prob: 0.8
    dropout_output_keep_prob: 1.0
    num_layers: 1
    residual_combiner: add
    residual_connections: false
    residual_dense: false

INFO:tensorflow:Creating AttentionLayerDot in mode=eval
INFO:tensorflow:
AttentionLayerDot: {!!python/unicode 'num_units': 128}

INFO:tensorflow:Creating AttentionDecoder in mode=eval
INFO:tensorflow:
AttentionDecoder:
  !!python/unicode 'init_scale': 0.04
  !!python/unicode 'max_decode_length': 100
  !!python/unicode 'rnn_cell':
    cell_class: GRUCell
    cell_params: {num_units: 128}
    dropout_input_keep_prob: 0.8
    dropout_output_keep_prob: 1.0
    num_layers: 1
    residual_combiner: add
    residual_connections: false
    residual_dense: false

INFO:tensorflow:Creating ZeroBridge in mode=eval
INFO:tensorflow:
ZeroBridge: {}

INFO:tensorflow:Starting evaluation at 2017-06-30-04:22:07
2017-06-30 13:22:07.781207: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN Black, pci bus id: 0000:02:00.0)
2017-06-30 13:22:07.781257: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX TITAN Black, pci bus id: 0000:03:00.0)
2017-06-30 13:22:07.781269: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:2) -> (device: 2, name: GeForce GTX TITAN Black, pci bus id: 0000:83:00.0)
2017-06-30 13:22:07.781279: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:3) -> (device: 3, name: GeForce GTX TITAN Black, pci bus id: 0000:84:00.0)
INFO:tensorflow:Restoring parameters from /tmp/nmt_tutorial/model.ckpt-1000
2017-06-30 13:22:08.054282: W tensorflow/core/framework/op_kernel.cc:1158] Out of range: Reached limit of 1
         [[Node: dev_input_fn/parallel_read_1/filenames/limit_epochs/CountUpTo = CountUpTo[T=DT_INT64, _class=["loc:@dev_input_fn/parallel_read_1/filenames/limit_epochs/epochs"], limit=1, _device="/job:localhost/replica:0/task:0/cpu:0"](dev_input_fn/parallel_read_1/filenames/limit_epochs/epochs)]]
2017-06-30 13:22:08.314634: W tensorflow/core/framework/op_kernel.cc:1158] Unknown: exceptions.OSError: [Errno 8] Exec format error
Traceback (most recent call last):
  File "/home/ad26kr/miniconda2/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/home/ad26kr/miniconda2/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/ad26kr/utils/seq2seq/bin/train.py", line 277, in <module>
    tf.app.run()
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/home/ad26kr/utils/seq2seq/bin/train.py", line 272, in main
    schedule=FLAGS.schedule)
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/learn_runner.py", line 210, in run
    return _execute_schedule(experiment, schedule)
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/learn_runner.py", line 47, in _execute_schedule
    return task()
  File "seq2seq/contrib/experiment.py", line 112, in continuous_train_and_eval
    hooks=self._eval_hooks)
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 289, in new_func
    return func(*args, **kwargs)
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 543, in evaluate
    log_progress=log_progress)
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 855, in _evaluate_model
    config=self._session_config)
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/training/evaluation.py", line 182, in _evaluate_once
    session.run(eval_ops, feed_dict)
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 505, in run
    run_metadata=run_metadata)
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 842, in run
    run_metadata=run_metadata)
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 798, in run
    return self._sess.run(*args, **kwargs)
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 952, in run
    run_metadata=run_metadata)
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 798, in run
    return self._sess.run(*args, **kwargs)
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 789, in run
    run_metadata_ptr)
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 997, in _run
    feed_dict_string, options, run_metadata)
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1132, in _do_run
    target_list, options, run_metadata)
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1152, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnknownError: exceptions.OSError: [Errno 8] Exec format error
         [[Node: bleu/value = PyFunc[Tin=[DT_STRING, DT_STRING], Tout=[DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](bleu/Identity, bleu/Identity_1)]]
         [[Node: bleu/value/_351 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_390_bleu/value", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

Caused by op u'bleu/value', defined at:
  File "/home/ad26kr/miniconda2/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/home/ad26kr/miniconda2/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/ad26kr/utils/seq2seq/bin/train.py", line 277, in <module>
    tf.app.run()
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/home/ad26kr/utils/seq2seq/bin/train.py", line 272, in main
    schedule=FLAGS.schedule)
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/learn_runner.py", line 210, in run
    return _execute_schedule(experiment, schedule)
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/learn_runner.py", line 47, in _execute_schedule
    return task()
  File "seq2seq/contrib/experiment.py", line 112, in continuous_train_and_eval
    hooks=self._eval_hooks)
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 289, in new_func
    return func(*args, **kwargs)
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 543, in evaluate
    log_progress=log_progress)
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 829, in _evaluate_model
    model_fn_results = self._get_eval_ops(features, labels, metrics)
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1196, in _get_eval_ops
    metrics, features, labels, model_fn_ops.predictions))
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 269, in _make_metrics_ops
    result[name] = metric.create_metric_ops(features, labels, predictions)
  File "seq2seq/metrics/metric_specs.py", line 124, in create_metric_ops
    name="value")
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/ops/script_ops.py", line 198, in py_func
    input=inp, token=token, Tout=Tout, name=name)
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/ops/gen_script_ops.py", line 38, in _py_func
    name=name)
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
    self._traceback = _extract_stack()

UnknownError (see above for traceback): exceptions.OSError: [Errno 8] Exec format error
         [[Node: bleu/value = PyFunc[Tin=[DT_STRING, DT_STRING], Tout=[DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](bleu/Identity, bleu/Identity_1)]]
         [[Node: bleu/value/_351 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_390_bleu/value", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
yanghoonkim commented 7 years ago

The error above occurred with TensorFlow 1.2. After I reinstalled TensorFlow 1.0, I got a similar error:

INFO:tensorflow:Creating ZeroBridge in mode=eval
INFO:tensorflow:
ZeroBridge: {}

INFO:tensorflow:Starting evaluation at 2017-06-30-06:18:48
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN Black, pci bus id: 0000:02:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX TITAN Black, pci bus id: 0000:03:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:2) -> (device: 2, name: GeForce GTX TITAN Black, pci bus id: 0000:83:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:3) -> (device: 3, name: GeForce GTX TITAN Black, pci bus id: 0000:84:00.0)
W tensorflow/core/framework/op_kernel.cc:993] Out of range: Reached limit of 1
         [[Node: dev_input_fn/parallel_read/filenames/limit_epochs/CountUpTo = CountUpTo[T=DT_INT64, _class=["loc:@dev_input_fn/parallel_read/filenames/limit_epochs/epochs"], limit=1, _device="/job:localhost/replica:0/task:0/cpu:0"](dev_input_fn/parallel_read/filenames/limit_epochs/epochs)]]
W tensorflow/core/framework/op_kernel.cc:993] Out of range: Reached limit of 1
         [[Node: dev_input_fn/parallel_read_1/filenames/limit_epochs/CountUpTo = CountUpTo[T=DT_INT64, _class=["loc:@dev_input_fn/parallel_read_1/filenames/limit_epochs/epochs"], limit=1, _device="/job:localhost/replica:0/task:0/cpu:0"](dev_input_fn/parallel_read_1/filenames/limit_epochs/epochs)]]
Traceback (most recent call last):
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/ops/script_ops.py", line 82, in __call__
    ret = func(*args)
  File "seq2seq/metrics/metric_specs.py", line 156, in _py_func
    return self.metric_fn(sliced_hypotheses, sliced_references) #pylint: disable=E1102
  File "seq2seq/metrics/metric_specs.py", line 181, in metric_fn
    return bleu.moses_multi_bleu(hypotheses, references, lowercase=False)
  File "seq2seq/metrics/bleu.py", line 79, in moses_multi_bleu
    bleu_cmd, stdin=read_pred, stderr=subprocess.STDOUT)
  File "/home/ad26kr/miniconda2/lib/python2.7/subprocess.py", line 212, in check_output
    process = Popen(stdout=PIPE, *popenargs, **kwargs)
  File "/home/ad26kr/miniconda2/lib/python2.7/subprocess.py", line 390, in __init__
    errread, errwrite)
  File "/home/ad26kr/miniconda2/lib/python2.7/subprocess.py", line 1024, in _execute_child
    raise child_exception
OSError: [Errno 8] Exec format error
W tensorflow/core/framework/op_kernel.cc:993] Internal: Failed to run py callback pyfunc_0: see error log.
Traceback (most recent call last):
  File "/home/ad26kr/miniconda2/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/home/ad26kr/miniconda2/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/ad26kr/utils/seq2seq/bin/train.py", line 277, in <module>
    tf.app.run()
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 44, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/home/ad26kr/utils/seq2seq/bin/train.py", line 272, in main
    schedule=FLAGS.schedule)
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/learn_runner.py", line 106, in run
    return task()
  File "seq2seq/contrib/experiment.py", line 112, in continuous_train_and_eval
    hooks=self._eval_hooks)
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 280, in new_func
    return func(*args, **kwargs)
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 514, in evaluate
    log_progress=log_progress)
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 836, in _evaluate_model
    hooks=hooks)
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/contrib/training/python/training/evaluation.py", line 430, in evaluate_once
    session.run(eval_ops, feed_dict)
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 462, in run
    run_metadata=run_metadata)
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 786, in run
    run_metadata=run_metadata)
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 744, in run
    return self._sess.run(*args, **kwargs)
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 891, in run
    run_metadata=run_metadata)
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 744, in run
    return self._sess.run(*args, **kwargs)
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 767, in run
    run_metadata_ptr)
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 965, in _run
    feed_dict_string, options, run_metadata)
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1015, in _do_run
    target_list, options, run_metadata)
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1035, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Failed to run py callback pyfunc_0: see error log.
         [[Node: bleu/value = PyFunc[Tin=[DT_STRING, DT_STRING], Tout=[DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](bleu/Identity, bleu/Identity_1)]]
         [[Node: bleu/value/_349 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_714_bleu/value", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

Caused by op u'bleu/value', defined at:
  File "/home/ad26kr/miniconda2/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/home/ad26kr/miniconda2/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/ad26kr/utils/seq2seq/bin/train.py", line 277, in <module>
    tf.app.run()
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 44, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/home/ad26kr/utils/seq2seq/bin/train.py", line 272, in main
    schedule=FLAGS.schedule)
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/learn_runner.py", line 106, in run
    return task()
  File "seq2seq/contrib/experiment.py", line 112, in continuous_train_and_eval
    hooks=self._eval_hooks)
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 280, in new_func
    return func(*args, **kwargs)
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 514, in evaluate
    log_progress=log_progress)
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 810, in _evaluate_model
    eval_ops = self._get_eval_ops(features, labels, metrics)
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1195, in _get_eval_ops
    metrics, features, labels, model_fn_ops.predictions))
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 258, in _make_metrics_ops
    result[name] = metric.create_metric_ops(features, labels, predictions)
  File "seq2seq/metrics/metric_specs.py", line 124, in create_metric_ops
    name="value")
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/ops/script_ops.py", line 189, in py_func
    input=inp, token=token, Tout=Tout, name=name)
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/ops/gen_script_ops.py", line 40, in _py_func
    name=name)
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
    op_def=op_def)
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2327, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1226, in __init__
    self._traceback = _extract_stack()

InternalError (see above for traceback): Failed to run py callback pyfunc_0: see error log.
         [[Node: bleu/value = PyFunc[Tin=[DT_STRING, DT_STRING], Tout=[DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](bleu/Identity, bleu/Identity_1)]]
         [[Node: bleu/value/_349 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_714_bleu/value", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
yanghoonkim commented 7 years ago

This looks similar to #77 (the first 1000 training steps run fine, but then the evaluation fails), but it is not the same problem.

Lavine24 commented 7 years ago

Have you solved the problem? I am hitting the same error now.

yanghoonkim commented 7 years ago

@Lavine24 It looks like this repository is not being updated these days. I heard that an earlier version of tf-seq2seq works well, but I don't know which one. You may refer to the TensorFlow NMT tutorial here: https://github.com/tensorflow/nmt
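
For anyone debugging this: in the TensorFlow 1.0 traceback above, the `OSError` is raised inside `subprocess.check_output`, called from `moses_multi_bleu` in `seq2seq/metrics/bleu.py`, so the failure happens while launching the external BLEU command rather than inside TensorFlow. `[Errno 8] Exec format error` is what the OS returns when asked to execute a file it does not recognize as a program, for example a script without a `#!` line or a truncated or corrupted download. Whether that is the cause here is not confirmed in this thread. Below is a minimal diagnostic sketch, assuming you can locate the script file being launched; `exec_format` is a hypothetical helper, not part of seq2seq.

```python
import os
import tempfile


def exec_format(path):
    """Guess how the OS would treat `path` on exec, from its first bytes.

    Returns "script" for a `#!` line, "elf" for a Linux binary, and None
    when neither is present (the case that fails with Errno 8).
    Hypothetical helper for illustration only.
    """
    with open(path, "rb") as handle:
        head = handle.read(4)
    if head[:2] == b"#!":
        return "script"
    if head == b"\x7fELF":
        return "elf"
    return None


def write_temp(contents):
    """Write bytes to a temporary file and return its path."""
    handle = tempfile.NamedTemporaryFile("wb", suffix=".perl", delete=False)
    handle.write(contents)
    handle.close()
    return handle.name


# A well-formed script versus a file with no interpreter line,
# e.g. an error page saved in place of the real script.
good = write_temp(b"#!/usr/bin/env perl\nprint \"ok\\n\";\n")
bad = write_temp(b"<html>Not Found</html>\n")

for path in (good, bad):
    print(path, exec_format(path), os.access(path, os.X_OK))

os.remove(good)
os.remove(bad)
```

If the check returns `None` for the script that the BLEU metric runs, replacing that file with an intact copy, or invoking it through its interpreter explicitly, would be the first thing to try.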