Open KyonP opened 7 years ago
BTW, my tf version is 1.1, cuda 8.0, ubuntu 14.04
same issue! TF 1.1, cuda 8.0, ubuntu 16.04
@KyonP did you try running the unit tests?
python -m unittest seq2seq.test.pipeline_test
I found a few problems after running them, so it might be useful to check!
I've passed the tests, thanks, but I still can't figure out the problem.
Still suffering. Does anyone else have this issue?
Try replacing ./example_configs/nmt_small.yml
with ./example_configs/nmt_medium.yml
or ./example_configs/nmt_large.yml.
./example_configs/nmt_medium.yml" --model_params "
vocab_source: $VOCAB_SOURCE
vocab_target: $VOCAB_TARGET" --input_pipeline_train "
class: ParallelTextInputPipeline
params:
source_files:
- $TRAIN_SOURCES
target_files:
- $TRAIN_TARGETS" --input_pipeline_dev "
class: ParallelTextInputPipeline
params:
source_files:
- $DEV_SOURCES
target_files:
- $DEV_TARGETS" --batch_size 32 --train_steps 1000 --output_dir $MODEL_DIR
INFO:tensorflow:Loading config from /root/vchatbot/reference_code/seq2seq/example_configs/nmt_medium.yml
INFO:tensorflow:Final Config:
model: AttentionSeq2Seq
model_params:
  attention.class: seq2seq.decoders.attention.AttentionLayerBahdanau
  attention.params: {num_units: 256}
  bridge.class: seq2seq.models.bridges.ZeroBridge
  decoder.class: seq2seq.decoders.AttentionDecoder
  decoder.params:
    rnn_cell:
      cell_class: GRUCell
      cell_params: {num_units: 256}
      dropout_input_keep_prob: 0.8
      dropout_output_keep_prob: 1.0
      num_layers: 2
  embedding.dim: 256
  encoder.class: seq2seq.encoders.BidirectionalRNNEncoder
  encoder.params:
    rnn_cell:
      cell_class: GRUCell
      cell_params: {num_units: 256}
      dropout_input_keep_prob: 0.8
      dropout_output_keep_prob: 1.0
      num_layers: 1
  optimizer.learning_rate: 0.0001
  optimizer.name: Adam
  optimizer.params: {epsilon: 8.0e-07}
  source.max_seq_len: 50
  source.reverse: false
  target.max_seq_len: 50
INFO:tensorflow:Setting save_checkpoints_secs to 600
INFO:tensorflow:Creating ParallelTextInputPipeline in mode=train
INFO:tensorflow:
ParallelTextInputPipeline:
  !!python/unicode 'num_epochs': null
  !!python/unicode 'shuffle': true
  !!python/unicode 'source_delimiter': !!python/unicode ' '
  !!python/unicode 'source_files': [null]
  !!python/unicode 'target_delimiter': !!python/unicode ' '
  !!python/unicode 'target_files': [null]
INFO:tensorflow:Creating ParallelTextInputPipeline in mode=eval
INFO:tensorflow:
ParallelTextInputPipeline:
  !!python/unicode 'num_epochs': 1
  !!python/unicode 'shuffle': false
  !!python/unicode 'source_delimiter': !!python/unicode ' '
  !!python/unicode 'source_files': [null]
  !!python/unicode 'target_delimiter': !!python/unicode ' '
  !!python/unicode 'target_files': [null]
INFO:tensorflow:Using config: {'_model_dir': None, '_save_checkpoints_secs': 600, '_num_ps_replicas': 0, '_keep_checkpoint_max': 5, '_tf_random_seed': None, '_task_type': None, '_environment': 'local', '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f3d0e737dd0>, '_tf_config': gpu_options {
per_process_gpu_memory_fraction: 1.0
}
, '_num_worker_replicas': 0, '_task_id': 0, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_evaluation_master': '', '_keep_checkpoint_every_n_hours': 4, '_master': ''}
INFO:tensorflow:Training model for 1000 steps
Traceback (most recent call last):
File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
"main", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/root/vchatbot/reference_code/seq2seq/bin/train.py", line 277, in
Replacing the argument did not work.
I also tried downgrading TensorFlow to version 1.0. That didn't work either.
Another potential 'fix' could be trying Python 3 instead.
@KyonP, did you solve this problem? I have the same issue!
I get a very similar problem!
Python 3.5.2+
Running Linux 4.8.0-59-generic #64-Ubuntu SMP Thu Jun 29 19:38:34 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Tensorflow 1.2.1
Nvidia drivers 375.39
Cuda compilation tools, release 8.0, V8.0.44
I managed to train a model using:
export VOCAB_SOURCE=${HOME}/nmt_data/toy_reverse/train/vocab.sources.txt
export VOCAB_TARGET=${HOME}/nmt_data/toy_reverse/train/vocab.targets.txt
export TRAIN_SOURCES=${HOME}/nmt_data/toy_reverse/train/sources.txt
export TRAIN_TARGETS=${HOME}/nmt_data/toy_reverse/train/targets.txt
export DEV_SOURCES=${HOME}/nmt_data/toy_reverse/dev/sources.txt
export DEV_TARGETS=${HOME}/nmt_data/toy_reverse/dev/targets.txt
export DEV_TARGETS_REF=${HOME}/nmt_data/toy_reverse/dev/targets.txt
export TRAIN_STEPS=5000
export MODEL_DIR=${TMPDIR:-/tmp}/nmt_tutorial
mkdir -p $MODEL_DIR
python3 -m bin.train \
--config_paths="
./example_configs/nmt_medium.yml,
./example_configs/train_seq2seq.yml,
./example_configs/text_metrics_bpe.yml" \
--model_params "
vocab_source: $VOCAB_SOURCE
vocab_target: $VOCAB_TARGET" \
--input_pipeline_train "
class: ParallelTextInputPipeline
params:
source_files:
- $TRAIN_SOURCES
target_files:
- $TRAIN_TARGETS" \
--input_pipeline_dev "
class: ParallelTextInputPipeline
params:
source_files:
- $DEV_SOURCES
target_files:
- $DEV_TARGETS" \
--batch_size 32 \
--train_steps $TRAIN_STEPS \
--output_dir $MODEL_DIR
(Note that this uses the medium model, not the small one that the tutorial suggests.)
This seems to work fine and all is well!
Then I try to run the infer command:
export PRED_DIR=${MODEL_DIR}/pred
mkdir -p ${PRED_DIR}
python3 -m bin.infer \
--tasks "
- class: DecodeText" \
--model_dir $MODEL_DIR \
--input_pipeline "
class: ParallelTextInputPipeline
params:
source_files:
- $DEV_SOURCES" \
> ${PRED_DIR}/predictions.txt
But this fails with the following output:
INFO:tensorflow:Creating DecodeText in mode=infer
INFO:tensorflow:
DecodeText: {delimiter: ' ', postproc_fn: '', unk_mapping: null, unk_replace: false}
Traceback (most recent call last):
File "/usr/lib/python3.5/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/iceman/tf/jokes2/seq2seq/bin/infer.py", line 129, in <module>
tf.app.run()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/home/iceman/tf/jokes2/seq2seq/bin/infer.py", line 106, in main
batch_size=FLAGS.batch_size)
File "/home/iceman/tf/jokes2/seq2seq/seq2seq/inference/inference.py", line 53, in create_inference_graph
features, labels = input_fn()
File "/home/iceman/tf/jokes2/seq2seq/seq2seq/training/utils.py", line 260, in input_fn
data_provider = pipeline.make_data_provider()
File "/home/iceman/tf/jokes2/seq2seq/seq2seq/data/input_pipeline.py", line 182, in make_data_provider
**kwargs)
File "/home/iceman/tf/jokes2/seq2seq/seq2seq/data/parallel_data_provider.py", line 127, in __init__
seed=seed)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/slim/python/slim/data/parallel_reader.py", line 210, in parallel_read
data_files = get_data_files(data_sources)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/slim/python/slim/data/parallel_reader.py", line 276, in get_data_files
data_files += get_data_files(source)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/slim/python/slim/data/parallel_reader.py", line 278, in get_data_files
if '*' in data_sources or '?' in data_sources or '[' in data_sources:
TypeError: argument of type 'NoneType' is not iterable
I'm able to run the tests: python3 -m unittest seq2seq.test.pipeline_test
and get a
Ran 2 tests in 14.168s
OK
in the end.
Hm, it seems that for some reason I had multiple saved checkpoints in my model dir. I cleared the folder and retrained the model from scratch, and then the infer command worked fine!
I got it working by setting all the environment variables in the same bash script that runs the training command. The ones set in the data script were lost when that script terminated, so the training script never received them (hence data_sources being None and not iterable). See https://stackoverflow.com/questions/1464253/global-environment-variables-in-a-shell-script
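The point about variables being lost when a script exits can be checked directly. The following is a minimal sketch using only the Python standard library; the helper names `missing_vars` and `child_export_is_visible` are hypothetical and not part of seq2seq, and the variable list is simply copied from the tutorial commands quoted in this thread. It exports a variable in a child shell, shows that the parent process never sees it, and lists which required variables are unset or empty.

```python
import os
import subprocess

# Variable names taken from the tutorial commands quoted in this thread.
REQUIRED = [
    "VOCAB_SOURCE", "VOCAB_TARGET",
    "TRAIN_SOURCES", "TRAIN_TARGETS",
    "DEV_SOURCES", "DEV_TARGETS",
]


def missing_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


def child_export_is_visible(name="SEQ2SEQ_DEMO_VAR"):
    """Export a variable in a child shell, then look for it in this process."""
    os.environ.pop(name, None)
    subprocess.run(["sh", "-c", "export {}=1".format(name)], check=True)
    return name in os.environ


if __name__ == "__main__":
    print("child export visible in parent:", child_export_is_visible())
    print("unset or empty:", missing_vars(REQUIRED))
```

Running a check like this in the same shell session just before `python -m bin.train` should reveal whether the paths are actually set. If any name is reported, exporting the variables in that same session (or sourcing the script instead of executing it) is the likely remedy.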
I'm following the tutorial (https://google.github.io/seq2seq/nmt/), but can't run the code.
Actually, I don't understand the code in the tutorial. Do I have to type all those arguments on the command line?
I set the environment variables and then copied and pasted the code from the Training step.
And I got the following lousy error.
~/reference_code/seq2seq# python -m bin.train \
WARNING:tensorflow:Ignoring config flag: default_params
INFO:tensorflow:Setting save_checkpoints_secs to 600
INFO:tensorflow:Creating ParallelTextInputPipeline in mode=train
INFO:tensorflow:
ParallelTextInputPipeline:
  !!python/unicode 'num_epochs': null
  !!python/unicode 'shuffle': true
  !!python/unicode 'source_delimiter': !!python/unicode ' '
  !!python/unicode 'source_files': [null]
  !!python/unicode 'target_delimiter': !!python/unicode ' '
  !!python/unicode 'target_files': [null]
INFO:tensorflow:Creating ParallelTextInputPipeline in mode=eval
INFO:tensorflow:
ParallelTextInputPipeline:
  !!python/unicode 'num_epochs': 1
  !!python/unicode 'shuffle': false
  !!python/unicode 'source_delimiter': !!python/unicode ' '
  !!python/unicode 'source_files': [null]
  !!python/unicode 'target_delimiter': !!python/unicode ' '
  !!python/unicode 'target_files': [null]
INFO:tensorflow:Using config: {'_model_dir': None, '_save_checkpoints_secs': 600, '_num_ps_replicas': 0, '_keep_checkpoint_max': 5, '_tf_random_seed': None, '_task_type': None, '_environment': 'local', '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f9fcaef7cd0>, '_tf_config': gpu_options { per_process_gpu_memory_fraction: 1.0 } , '_num_worker_replicas': 0, '_task_id': 0, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_evaluation_master': '', '_keep_checkpoint_every_n_hours': 4, '_master': ''} INFO:tensorflow:Creating PrintModelAnalysisHook in mode=train INFO:tensorflow: PrintModelAnalysisHook: {}
INFO:tensorflow:Creating MetadataCaptureHook in mode=train INFO:tensorflow: MetadataCaptureHook: {!!python/unicode 'step': 10}
INFO:tensorflow:Creating SyncReplicasOptimizerHook in mode=train INFO:tensorflow: SyncReplicasOptimizerHook: {}
INFO:tensorflow:Creating TrainSampleHook in mode=train INFO:tensorflow: TrainSampleHook: {!!python/unicode 'every_n_secs': null, !!python/unicode 'every_n_steps': 1000, !!python/unicode 'source_delimiter': !!python/unicode ' ', !!python/unicode 'target_delimiter': !!python/unicode ' '}
INFO:tensorflow:Creating LogPerplexityMetricSpec in mode=eval INFO:tensorflow: LogPerplexityMetricSpec: {}
INFO:tensorflow:Creating BleuMetricSpec in mode=eval INFO:tensorflow: BleuMetricSpec: {!!python/unicode 'eos_token': !!python/unicode 'SEQUENCE_END', !!python/unicode 'postproc_fn': !!python/unicode 'seq2seq.data.postproc.strip_bpe', !!python/unicode 'separator': !!python/unicode ' ', !!python/unicode 'sos_token': !!python/unicode 'SEQUENCE_START'}
INFO:tensorflow:Creating RougeMetricSpec in mode=eval INFO:tensorflow: RougeMetricSpec: {!!python/unicode 'eos_token': !!python/unicode 'SEQUENCE_END', !!python/unicode 'postproc_fn': !!python/unicode 'seq2seq.data.postproc.strip_bpe', !!python/unicode 'rouge_type': !!python/unicode 'rouge_1/f_score', !!python/unicode 'separator': !!python/unicode ' ', !!python/unicode 'sos_token': !!python/unicode 'SEQUENCE_START'}
INFO:tensorflow:Creating RougeMetricSpec in mode=eval INFO:tensorflow: RougeMetricSpec: {!!python/unicode 'eos_token': !!python/unicode 'SEQUENCE_END', !!python/unicode 'postproc_fn': !!python/unicode 'seq2seq.data.postproc.strip_bpe', !!python/unicode 'rouge_type': !!python/unicode 'rouge_1/r_score', !!python/unicode 'separator': !!python/unicode ' ', !!python/unicode 'sos_token': !!python/unicode 'SEQUENCE_START'}
INFO:tensorflow:Creating RougeMetricSpec in mode=eval INFO:tensorflow: RougeMetricSpec: {!!python/unicode 'eos_token': !!python/unicode 'SEQUENCE_END', !!python/unicode 'postproc_fn': !!python/unicode 'seq2seq.data.postproc.strip_bpe', !!python/unicode 'rouge_type': !!python/unicode 'rouge_1/p_score', !!python/unicode 'separator': !!python/unicode ' ', !!python/unicode 'sos_token': !!python/unicode 'SEQUENCE_START'}
INFO:tensorflow:Creating RougeMetricSpec in mode=eval INFO:tensorflow: RougeMetricSpec: {!!python/unicode 'eos_token': !!python/unicode 'SEQUENCE_END', !!python/unicode 'postproc_fn': !!python/unicode 'seq2seq.data.postproc.strip_bpe', !!python/unicode 'rouge_type': !!python/unicode 'rouge_2/f_score', !!python/unicode 'separator': !!python/unicode ' ', !!python/unicode 'sos_token': !!python/unicode 'SEQUENCE_START'}
INFO:tensorflow:Creating RougeMetricSpec in mode=eval INFO:tensorflow: RougeMetricSpec: {!!python/unicode 'eos_token': !!python/unicode 'SEQUENCE_END', !!python/unicode 'postproc_fn': !!python/unicode 'seq2seq.data.postproc.strip_bpe', !!python/unicode 'rouge_type': !!python/unicode 'rouge_2/r_score', !!python/unicode 'separator': !!python/unicode ' ', !!python/unicode 'sos_token': !!python/unicode 'SEQUENCE_START'}
INFO:tensorflow:Creating RougeMetricSpec in mode=eval INFO:tensorflow: RougeMetricSpec: {!!python/unicode 'eos_token': !!python/unicode 'SEQUENCE_END', !!python/unicode 'postproc_fn': !!python/unicode 'seq2seq.data.postproc.strip_bpe', !!python/unicode 'rouge_type': !!python/unicode 'rouge_2/p_score', !!python/unicode 'separator': !!python/unicode ' ', !!python/unicode 'sos_token': !!python/unicode 'SEQUENCE_START'}
INFO:tensorflow:Creating RougeMetricSpec in mode=eval INFO:tensorflow: RougeMetricSpec: {!!python/unicode 'eos_token': !!python/unicode 'SEQUENCE_END', !!python/unicode 'postproc_fn': !!python/unicode 'seq2seq.data.postproc.strip_bpe', !!python/unicode 'rouge_type': !!python/unicode 'rouge_l/f_score', !!python/unicode 'separator': !!python/unicode ' ', !!python/unicode 'sos_token': !!python/unicode 'SEQUENCE_START'}
INFO:tensorflow:Training model for 1000 steps
Traceback (most recent call last):
File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
"main", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/root/vchatbot/reference_code/seq2seq/bin/train.py", line 277, in
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/root/vchatbot/reference_code/seq2seq/bin/train.py", line 272, in main
schedule=FLAGS.schedule)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/learn_runner.py", line 111, in run
return _execute_schedule(experiment, schedule)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/learn_runner.py", line 46, in _execute_schedule
return task()
File "seq2seq/contrib/experiment.py", line 104, in continuous_train_and_eval
monitors=self._train_monitors)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 281, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 430, in fit
loss = self._train_model(input_fn=input_fn, hooks=hooks)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 925, in _train_model
features, labels = input_fn()
File "seq2seq/training/utils.py", line 260, in input_fn
data_provider = pipeline.make_data_provider()
File "seq2seq/data/input_pipeline.py", line 180, in make_data_provider
**kwargs)
File "seq2seq/data/parallel_data_provider.py", line 125, in __init__
seed=seed)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/slim/python/slim/data/parallel_reader.py", line 210, in parallel_read
data_files = get_data_files(data_sources)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/slim/python/slim/data/parallel_reader.py", line 276, in get_data_files
data_files += get_data_files(source)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/slim/python/slim/data/parallel_reader.py", line 278, in get_data_files
if '*' in data_sources or '?' in data_sources or '[' in data_sources:
TypeError: argument of type 'NoneType' is not iterable
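The `[null]` entries in the log above and this final TypeError are consistent with the file-path variables being empty when the command ran: an empty YAML list item parses to null, which becomes None in Python. The sketch below is a simplified stand-in for the check shown in the traceback, not the actual TensorFlow source; `get_data_files_sketch` and `try_sources` are hypothetical names, and glob expansion is deliberately left out.

```python
def get_data_files_sketch(data_sources):
    """Simplified stand-in for the glob check shown in the traceback."""
    if isinstance(data_sources, (list, tuple)):
        files = []
        for source in data_sources:
            files += get_data_files_sketch(source)
        return files
    # With data_sources=None, the `in` test below raises TypeError.
    if "*" in data_sources or "?" in data_sources or "[" in data_sources:
        raise NotImplementedError("glob expansion is omitted in this sketch")
    return [data_sources]


def try_sources(sources):
    """Return the resolved file list, or the error text if resolution fails."""
    try:
        return get_data_files_sketch(sources)
    except TypeError as err:
        return "TypeError: {}".format(err)


if __name__ == "__main__":
    print(try_sources(["train/sources.txt"]))
    print(try_sources([None]))  # what 'source_files': [null] turns into
```

A plain path passes straight through, while a list holding None fails at the membership test, which is the same place the traceback stops.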