tensorflow / tensor2tensor

Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
Apache License 2.0

Decode issue at inference time on PTB problem #86

Closed: pltrdy closed this issue 6 years ago

pltrdy commented 7 years ago

Hi,

While working on the PTB benchmark for language modeling (see https://github.com/tensorflow/tensor2tensor/pull/59), I wrote a small script for this use case (see the linked gist).

I trained the model, but when decoding I get the following error:

INFO:tensorflow:This model_fn took 0.785 sec.
Traceback (most recent call last):
  File "/home/pltrdy/anaconda3/bin/t2t-trainer", line 4, in <module>
    __import__('pkg_resources').run_script('tensor2tensor==1.0.8', 't2t-trainer')
  File "/home/pltrdy/anaconda3/lib/python3.6/site-packages/setuptools-27.2.0-py3.6.egg/pkg_resources/__init__.py", line 744, in run_script
  File "/home/pltrdy/anaconda3/lib/python3.6/site-packages/setuptools-27.2.0-py3.6.egg/pkg_resources/__init__.py", line 1506, in run_script
  File "/home/pltrdy/anaconda3/lib/python3.6/site-packages/tensor2tensor-1.0.8-py3.6.egg/EGG-INFO/scripts/t2t-trainer", line 83, in <module>
  File "/home/pltrdy/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/home/pltrdy/anaconda3/lib/python3.6/site-packages/tensor2tensor-1.0.8-py3.6.egg/EGG-INFO/scripts/t2t-trainer", line 79, in main
  File "/home/pltrdy/anaconda3/lib/python3.6/site-packages/tensor2tensor-1.0.8-py3.6.egg/tensor2tensor/utils/trainer_utils.py", line 240, in run
  File "/home/pltrdy/anaconda3/lib/python3.6/site-packages/tensor2tensor-1.0.8-py3.6.egg/tensor2tensor/utils/trainer_utils.py", line 544, in run_locally
  File "/home/pltrdy/anaconda3/lib/python3.6/site-packages/tensor2tensor-1.0.8-py3.6.egg/tensor2tensor/utils/trainer_utils.py", line 648, in decode_from_file
  File "/home/pltrdy/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 289, in new_func
    return func(*args, **kwargs)
  File "/home/pltrdy/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 590, in predict
    as_iterable=as_iterable)
  File "/home/pltrdy/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 884, in _infer_model
    infer_ops = self._get_predict_ops(features)
  File "/home/pltrdy/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1218, in _get_predict_ops
    return self._call_model_fn(features, labels, model_fn_lib.ModeKeys.INFER)
  File "/home/pltrdy/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1133, in _call_model_fn
    model_fn_results = self._model_fn(features, labels, **kwargs)
  File "/home/pltrdy/anaconda3/lib/python3.6/site-packages/tensor2tensor-1.0.8-py3.6.egg/tensor2tensor/utils/trainer_utils.py", line 424, in model_fn
  File "/home/pltrdy/anaconda3/lib/python3.6/site-packages/tensor2tensor-1.0.8-py3.6.egg/tensor2tensor/utils/trainer_utils.py", line 751, in _cond_on_index
  File "/home/pltrdy/anaconda3/lib/python3.6/site-packages/tensor2tensor-1.0.8-py3.6.egg/tensor2tensor/utils/trainer_utils.py", line 397, in nth_model
  File "/home/pltrdy/anaconda3/lib/python3.6/site-packages/tensor2tensor-1.0.8-py3.6.egg/tensor2tensor/utils/t2t_model.py", line 160, in infer
  File "/home/pltrdy/anaconda3/lib/python3.6/site-packages/tensor2tensor-1.0.8-py3.6.egg/tensor2tensor/utils/t2t_model.py", line 299, in _greedy_infer
  File "/home/pltrdy/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/functional_ops.py", line 122, in foldl
    swap_memory=swap_memory)
  File "/home/pltrdy/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2770, in while_loop
    result = context.BuildLoop(cond, body, loop_vars, shape_invariants)
  File "/home/pltrdy/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2599, in BuildLoop
    pred, body, original_loop_vars, loop_vars, shape_invariants)
  File "/home/pltrdy/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2580, in _BuildLoop
    _EnforceShapeInvariant(m_var, n_var)
  File "/home/pltrdy/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 575, in _EnforceShapeInvariant
    % (merge_var.name, m_shape, n_shape))
ValueError: The shape for foldl/while/Merge_1:0 is not an invariant for the loop. It enters the loop with shape (32, 75, 1, 1), but has shape (32, 76, 1, 1) after one iteration. Provide shape invariants using either the `shape_invariants` argument of tf.while_loop or set_shape() on the loop variables.

I'm not sure how to fix it, since I'm not really familiar with how trainer_utils.py and t2t_model.py work.
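For readers unfamiliar with this error, the check it reports can be imitated in plain Python. This is only a toy sketch, not TensorFlow code: `is_compatible` is a hypothetical helper, and the shapes are copied from the traceback above. It shows why a tensor that grows along axis 1 fails the default check, and why an invariant that leaves that axis unspecified would accept it.

```python
# Toy imitation of a loop shape-invariant check. NOT TensorFlow code;
# `is_compatible` is a hypothetical helper written for illustration only.

def is_compatible(shape, invariant):
    """True if `shape` matches `invariant`; None in the invariant means 'any size'."""
    if len(shape) != len(invariant):
        return False
    return all(inv is None or dim == inv for dim, inv in zip(shape, invariant))

enter_shape = (32, 75, 1, 1)  # shape on entering the loop (from the traceback)
after_shape = (32, 76, 1, 1)  # shape after one iteration (from the traceback)

# Without explicit invariants, the entry shape is assumed to hold on every
# iteration, so a growing axis is rejected.
strict_ok = is_compatible(after_shape, enter_shape)

# Leaving the growing axis unspecified accepts both shapes.
relaxed = (32, None, 1, 1)
relaxed_ok = is_compatible(enter_shape, relaxed) and is_compatible(after_shape, relaxed)

print(strict_ok, relaxed_ok)
```

In TensorFlow itself the relaxed form would be passed through the `shape_invariants` argument of tf.while_loop, as the error message suggests. Whether that is the right fix inside t2t_model.py is not established in this thread.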

lukaszkaiser commented 7 years ago

This looks similar to #80 which was (I hope) resolved in the new release 1.0.11. Could you give the new release a try and report back? Thanks!

pltrdy commented 7 years ago

I pulled the latest version; it still doesn't work, same error.

lukaszkaiser commented 7 years ago

I just re-generated the set and re-trained a small model and I'm not seeing this:

t2t_datagen --data_dir ~/t2t_data/ --tmp_dir ~/t2t_data/tmp/ --problem=lmptb_10k
rm -rf /tmp/tensor2tensor/* && t2t_trainer --data_dir ~/t2t_data/ --problems=lmptb_10k --model=attention_lm --hparams_set=attention_lm_base --hparams='batch_size=2048,hidden_size=128,filter_size=512' --train_steps=5000 --eval_steps=10

I get outputs like this:

Inference results OUTPUT: baseball that game of the long haul is the <unk> sport of the mean and the mean <unk> law caught up with the san francisco giants in the world series last weekend are generally deliberately <unk> out to travel prices for associations <EOS> in the otc market <EOS> <EOS> and the <unk> at st. paul carlos o'connell key first country 's recent slide <EOS> once <unk> <unk> <EOS> and arm <EOS> <EOS> in the middle unwanted takeover <unk> table <EOS> <unk> and in the night <EOS> and assassination <EOS> <unk> family 's <unk> party 's best last month <EOS> called floor traders say <EOS> <unk> said <EOS> even try to make the apparent <unk>
I0706 17:30:24.761709 32406 trainer_utils.py:569] Inference results OUTPUT: bulls say the market is an incredible bargain priced at only about N times estimated N earnings for stocks in the standard & poor 's N index <pad> <pad> <pad> <pad> <pad> <EOS> <EOS> more than N according to dow jones to the fraction <EOS> manuel noriega <EOS> <EOS> <EOS> <EOS> and tends to japanese investment and his own <EOS> <EOS> for a new york office <EOS> for mr. lawson <EOS> <EOS> inc <EOS> entirely as a lost its receipts affair <EOS> once he says <EOS> <EOS> <EOS> <EOS> <EOS> <EOS> to him to a security statistics <EOS> for uncertainties investment president who will probably has fallen <EOS> chief of events for his audience

Could you try the above and tell me if you see an error? What do you run to get it?

pltrdy commented 7 years ago

Hmm, I just did a fresh clone, ran setup, then ran your commands, and got an error:

InvalidArgumentError (see above for traceback): indices[75,22,0] = 10000 is not in [0, 10000)
     [[Node: symbol_modality_10000_128/parallel_0/symbol_modality_10000_128/target_emb/Gather = Gather[Tindices=DT_INT32, Tparams=DT_FLOAT, validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](symbol_modality_10000_128/parallel_0/symbol_modality_10000_128/target_emb/ConvertGradientToTensor_cc661786, symbol_modality_10000_128/parallel_0/symbol_modality_10000_128/target_emb/Squeeze)]]
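The message says a token id of 10000 was looked up in an embedding table whose valid ids are 0 through 9999. A minimal sketch of that range check in plain Python follows; the helper name and the toy batch are hypothetical, and only the vocabulary size and the offending id come from the error above.

```python
VOCAB_SIZE = 10000  # valid ids are 0 .. 9999, per the error message

def out_of_range_ids(batch, vocab_size):
    """Return (row, col, token_id) for every id outside [0, vocab_size)."""
    return [
        (r, c, tok)
        for r, row in enumerate(batch)
        for c, tok in enumerate(row)
        if not 0 <= tok < vocab_size
    ]

# Hypothetical toy batch; the id 10000 mimics the one reported in the error.
batch = [[12, 9999, 3], [7, 10000, 42]]
bad = out_of_range_ids(batch, VOCAB_SIZE)
print(bad)
```

Running a check like this over the generated data would show whether the out-of-range id comes from the data files or is introduced later. The thread does not establish which.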

For reference, I'm running this (note that I changed "_" to "-"):

t2t-datagen --data_dir ~/t2t_data/ --tmp_dir ~/t2t_data/tmp/ --problem=lmptb_10k
t2t-trainer --data_dir ~/t2t_data/ \
            --problems=lmptb_10k \
            --model=attention_lm \
            --hparams_set=attention_lm_base \
            --hparams='batch_size=2048,hidden_size=128,filter_size=512' \
            --train_steps=5000 \
            --eval_steps=10

lukaszkaiser commented 7 years ago

This is very strange. How did you generate the data? I just re-ran it and it looks fine on my side. What are your Python and TF versions?

pltrdy commented 7 years ago

Well, I tried again: I pulled the latest commit (963730e32fe06f24e7534e550504a087d7b5591e), then ran python setup.py install.

I'm using Python 3.6.1 and tensorflow 1.1.0 and I'm now getting:

Traceback (most recent call last):
  File "/home/pltrdy/.conda/envs/tensorflow/bin/t2t-trainer", line 4, in <module>
    __import__('pkg_resources').run_script('tensor2tensor==1.0.14', 't2t-trainer')
  File "/home/pltrdy/.conda/envs/tensorflow/lib/python3.6/site-packages/setuptools-27.2.0-py3.6.egg/pkg_resources/__init__.py", line 744, in run_script
  File "/home/pltrdy/.conda/envs/tensorflow/lib/python3.6/site-packages/setuptools-27.2.0-py3.6.egg/pkg_resources/__init__.py", line 1506, in run_script
  File "/home/pltrdy/.conda/envs/tensorflow/lib/python3.6/site-packages/tensor2tensor-1.0.14-py3.6.egg/EGG-INFO/scripts/t2t-trainer", line 67, in <module>
  File "/home/pltrdy/.conda/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/home/pltrdy/.conda/envs/tensorflow/lib/python3.6/site-packages/tensor2tensor-1.0.14-py3.6.egg/EGG-INFO/scripts/t2t-trainer", line 63, in main
  File "/home/pltrdy/.conda/envs/tensorflow/lib/python3.6/site-packages/tensor2tensor-1.0.14-py3.6.egg/tensor2tensor/utils/trainer_utils.py", line 266, in run
  File "/home/pltrdy/.conda/envs/tensorflow/lib/python3.6/site-packages/tensor2tensor-1.0.14-py3.6.egg/tensor2tensor/utils/trainer_utils.py", line 145, in experiment_fn
  File "/home/pltrdy/.conda/envs/tensorflow/lib/python3.6/site-packages/tensor2tensor-1.0.14-py3.6.egg/tensor2tensor/utils/trainer_utils.py", line 157, in create_experiment
  File "/home/pltrdy/.conda/envs/tensorflow/lib/python3.6/site-packages/tensor2tensor-1.0.14-py3.6.egg/tensor2tensor/utils/trainer_utils.py", line 195, in create_experiment_components
TypeError: __init__() got an unexpected keyword argument 'session_config'

This happens during training.

martinpopel commented 7 years ago

Try tensorflow 1.2.1
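A mismatch like this can be caught before launching training by comparing the installed version against the suggested one. The sketch below uses hypothetical helper names and takes version strings as plain arguments, so it runs without TensorFlow. The minimum is taken from the suggestion above, not from an official requirement.

```python
def parse_version(text):
    """Turn '1.2.1' into (1, 2, 1); only handles plain dotted integers."""
    return tuple(int(part) for part in text.split("."))

def meets_minimum(installed, minimum):
    """True if `installed` is at least `minimum`, compared component-wise."""
    return parse_version(installed) >= parse_version(minimum)

SUGGESTED_MINIMUM = "1.2.1"  # from the suggestion above, an assumption

print(meets_minimum("1.1.0", SUGGESTED_MINIMUM))  # the version reported earlier
print(meets_minimum("1.3.0", SUGGESTED_MINIMUM))
```

In practice the installed string would come from tf.__version__. Pre-release suffixes such as "rc1" are not handled by this parser.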

pltrdy commented 7 years ago

Yes, my bad. Still, when I train and then decode, I get the same kind of error:

ValueError: The shape for foldl/while/Merge_1:0 is not an invariant for the loop. It enters the loop with shape (32, 78, 1, 1), but has shape (32, 79, 1, 1) after one iteration. Provide shape invariants using either the `shape_invariants` argument of tf.while_loop or set_shape() on the loop variables.

using this kind of training/decoding: https://gist.github.com/pltrdy/8d8ce9f4dbcf1793f992a7bab358b44d


Note that to get as far as this error, one has to apply the following patch:

diff --git a/tensor2tensor/data_generators/problem_hparams.py b/tensor2tensor/data_generators/problem_hparams.py
index 70b9dad..4164eb4 100644
--- a/tensor2tensor/data_generators/problem_hparams.py
+++ b/tensor2tensor/data_generators/problem_hparams.py
@@ -371,6 +371,7 @@ def lmptb_10k(model_hparams):
   vocabulary = text_encoder.TokenTextEncoder(
       os.path.join(model_hparams.data_dir, "lmptb_10k.vocab"))
   p.vocabulary = {
+      "inputs": vocabulary,
       "targets": vocabulary,
   }
   p.input_space_id = 3

otherwise you get:

Traceback (most recent call last):
  File "/home/pltrdy/.conda/envs/tensorflow/bin/t2t-trainer", line 4, in <module>
    __import__('pkg_resources').run_script('tensor2tensor==1.0.14', 't2t-trainer')
  File "/home/pltrdy/.conda/envs/tensorflow/lib/python3.6/site-packages/pkg_resources/__init__.py", line 741, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/pltrdy/.conda/envs/tensorflow/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1509, in run_script
    exec(script_code, namespace, namespace)
  File "/home/pltrdy/.conda/envs/tensorflow/lib/python3.6/site-packages/tensor2tensor-1.0.14-py3.6.egg/EGG-INFO/scripts/t2t-trainer", line 67, in <module>
  File "/home/pltrdy/.conda/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/home/pltrdy/.conda/envs/tensorflow/lib/python3.6/site-packages/tensor2tensor-1.0.14-py3.6.egg/EGG-INFO/scripts/t2t-trainer", line 63, in main
  File "/home/pltrdy/.conda/envs/tensorflow/lib/python3.6/site-packages/tensor2tensor-1.0.14-py3.6.egg/tensor2tensor/utils/trainer_utils.py", line 266, in run
  File "/home/pltrdy/.conda/envs/tensorflow/lib/python3.6/site-packages/tensor2tensor-1.0.14-py3.6.egg/tensor2tensor/utils/trainer_utils.py", line 575, in run_locally
  File "/home/pltrdy/.conda/envs/tensorflow/lib/python3.6/site-packages/tensor2tensor-1.0.14-py3.6.egg/tensor2tensor/utils/trainer_utils.py", line 647, in decode_from_file
KeyError: 'inputs'
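The effect of the patch can be sketched with a plain dictionary. The names below are hypothetical stand-ins (a string replaces the real encoder object), and the lookup function only mimics a decoder that expects an "inputs" entry; it is not the code in decode_from_file.

```python
# Stand-in for the problem's vocabulary mapping; a string replaces the real
# encoder object. All names here are hypothetical.
encoder = "token_text_encoder"

unpatched = {"targets": encoder}
patched = {"inputs": encoder, "targets": encoder}

def lookup_inputs_vocab(vocabulary):
    """Mimics a decoder that expects an 'inputs' entry."""
    return vocabulary["inputs"]

try:
    lookup_inputs_vocab(unpatched)
    unpatched_error = None
except KeyError as err:
    unpatched_error = err.args[0]  # name of the missing key

print(unpatched_error)
print(lookup_inputs_vocab(patched))
```

With only "targets" present, the lookup fails on the missing key, which matches the KeyError above. Adding "inputs" lets it return the same encoder.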

rsepassi commented 6 years ago

Is this still a problem? We support TensorFlow 1.3 and the latest tensor2tensor is 1.2.4. Please reopen (and please provide Python, TensorFlow, and Tensor2Tensor versions as well as command-lines and outputs) if you are still having this issue as we've been unable to reproduce.

alexcdot commented 6 years ago

@lukaszkaiser "I just re-generated the set and re-trained a small model and I'm not seeing this..." Do you mind sharing what t2t-decoder command you used to generate the output?

2877992943 commented 6 years ago

(image) May I ask what the inference input is? Since self._has_input=False in the PTB problem, how was the result in the picture generated? Thank you.