Training DAM on SNLI fails as soon as it starts, because the model looks up positions in the word embedding matrix that do not exist. To reproduce the error: python3 bin/jack-train.py with config='./conf/dam.yaml'.
I'll take this chance to look at the DAM code in jack and refresh it.
PS: DAM requires an initial NULL token and an UNK token: are those supported by the current Vocab?
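To make the NULL/UNK question concrete, here is a minimal sketch of a vocabulary that reserves the first two ids for those tokens. The class name ToyVocab, the token strings and the encode method are all hypothetical, chosen for illustration; this is not jack's Vocab API, only the behaviour DAM would need from it.

```python
class ToyVocab:
    """Hypothetical vocabulary that reserves id 0 for NULL and id 1 for UNK."""

    NULL = "<NULL>"  # assumed token string, not taken from jack
    UNK = "<UNK>"    # assumed token string, not taken from jack

    def __init__(self, words):
        # The two reserved symbols always occupy the first two ids.
        self.id2sym = [self.NULL, self.UNK]
        self.sym2id = {self.NULL: 0, self.UNK: 1}
        for word in words:
            if word not in self.sym2id:
                self.sym2id[word] = len(self.id2sym)
                self.id2sym.append(word)

    def __len__(self):
        # The embedding matrix needs exactly this many rows.
        return len(self.id2sym)

    def encode(self, tokens, prepend_null=True):
        # Unknown tokens map to the UNK id instead of a fresh, unseen id.
        ids = [self.sym2id.get(tok, self.sym2id[self.UNK]) for tok in tokens]
        return [self.sym2id[self.NULL]] + ids if prepend_null else ids


vocab = ToyVocab(["a", "man", "sleeps"])
ids = vocab.encode(["a", "dog", "sleeps"])
```

With a layout like this, every id produced by encode is smaller than len(vocab), so an embedding matrix with len(vocab) rows can serve every lookup.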
$ python3 bin/jack-train.py with config='./conf/dam.yaml'
WARNING - jack - No observers have been added to this run
INFO - jack - Running command 'main'
INFO - jack - Started
INFO - jack-train.py - TRAINING
INFO - jack-train.py - loaded train/dev/test data
INFO - jack.io.embeddings.glove - Loading GloVe vectors ..
INFO - jack.io.embeddings.glove - Loading GloVe vectors completed.
INFO - jack-train.py - loaded pre-trained embeddings (data/GloVe/glove.840B.300d.txt)
INFO - jack-train.py - Time since last checkpoint : 1.7min
INFO - jack - Running command 'print_config'
INFO - jack - Started
Configuration (modified, added, typechanged, doc):
batch_size = 32
clip_value = 0.0
config = './conf/dam.yaml'
debug = False
debug_examples = 10
description = 'A configuration inheriting from the default jack.yaml\n'
dev = 'data/SNLI/snli_1.0/snli_1.0_dev.jsonl'
dev_batch_size = 128
dropout = 0.5
embedding_file = 'data/GloVe/glove.840B.300d.txt'
embedding_format = 'glove'
epochs = 400
experiments_db = './out/experiments.db'
l2 = 0.0
learning_rate = 0.001
learning_rate_decay = 0.5
loader = 'snli'
log_interval = 100
lowercase = True
model = 'dam_snli_reader'
model_dir = './dam_snli_reader'
name = None
normalize_pretrain = False
optimizer = 'adam'
output_dir = './out/'
parent_config = './conf/jack.yaml'
prune = False
repr_dim = 300
repr_dim_input = 300
seed = 1337
tensorboard_folder = None
test = None
train = 'data/SNLI/snli_1.0/snli_1.0_train.jsonl'
train_pretrain = False
validation_interval = None
vocab_from_embeddings = True
vocab_maxsize = 1000000000000
vocab_minfreq = 2
vocab_sep = True
with_char_embeddings = True
write_metrics_to = None
INFO - jack - Completed after 0:00:00
2017-10-24 22:49:30.559456: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-24 22:49:30.559480: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-24 22:49:30.559487: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-10-24 22:49:30.559493: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-24 22:49:30.559500: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
INFO - jack-train.py - Time since last checkpoint : 0.0035min
INFO - jack.core.reader - Setting up data and model...
INFO - jack.readers.natural_language_inference.decomposable_attention - Building the Attend graph ..
INFO - jack.readers.natural_language_inference.decomposable_attention - Building the Compare graph ..
INFO - jack.readers.natural_language_inference.decomposable_attention - Building the Aggregate graph ..
INFO - jack.core.reader - Start training...
ERROR - jack - Failed after 0:06:09!
Traceback (most recent calls WITHOUT Sacred internals):
File "bin/jack-train.py", line 154, in main
jtrain(reader, train_data, test_data, dev_data, configuration, debug=debug)
File "/home/jack/.pyenv/versions/3.6.3/lib/python3.6/site-packages/jack-0.1.0-py3.6.egg/jack/train_reader.py", line 94, in train
l2=l2, clip=clip_value, clip_op=tf.clip_by_value)
File "/home/jack/.pyenv/versions/3.6.3/lib/python3.6/site-packages/jack-0.1.0-py3.6.egg/jack/core/reader.py", line 264, in train
current_loss, _ = self.session.run([loss, min_op], feed_dict=feed_dict)
File "/home/jack/.pyenv/versions/3.6.3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 895, in run
run_metadata_ptr)
File "/home/jack/.pyenv/versions/3.6.3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1124, in _run
feed_dict_tensor, options, run_metadata)
File "/home/jack/.pyenv/versions/3.6.3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1321, in _do_run
options, run_metadata)
File "/home/jack/.pyenv/versions/3.6.3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1340, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[13,5] = 2196016 is not in [0, 2196015)
[[Node: dam_snli_reader/embedding_lookup = Gather[Tindices=DT_INT32, Tparams=DT_FLOAT, _class=["loc:@dam_snli_reader/emb_Q"], validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](dam_snli_reader/emb_Q/read, _arg_dam_snli_reader/question_0_1)]]
Caused by op 'dam_snli_reader/embedding_lookup', defined at:
File "bin/jack-train.py", line 60, in <module>
@ex.automain
File "/home/jack/workspace/jack/.eggs/sacred-0.7.1-py3.6.egg/sacred/experiment.py", line 131, in automain
self.run_commandline()
File "/home/jack/workspace/jack/.eggs/sacred-0.7.1-py3.6.egg/sacred/experiment.py", line 245, in run_commandline
return self.run(cmd_name, config_updates, named_configs, {}, args)
File "/home/jack/workspace/jack/.eggs/sacred-0.7.1-py3.6.egg/sacred/experiment.py", line 189, in run
run()
File "/home/jack/workspace/jack/.eggs/sacred-0.7.1-py3.6.egg/sacred/run.py", line 229, in __call__
self.result = self.main_function(*args)
File "/home/jack/workspace/jack/.eggs/sacred-0.7.1-py3.6.egg/sacred/config/captured_function.py", line 47, in captured_function
result = wrapped(*args, **kwargs)
File "bin/jack-train.py", line 154, in main
jtrain(reader, train_data, test_data, dev_data, configuration, debug=debug)
File "/home/jack/.pyenv/versions/3.6.3/lib/python3.6/site-packages/jack-0.1.0-py3.6.egg/jack/train_reader.py", line 94, in train
l2=l2, clip=clip_value, clip_op=tf.clip_by_value)
File "/home/jack/.pyenv/versions/3.6.3/lib/python3.6/site-packages/jack-0.1.0-py3.6.egg/jack/core/reader.py", line 237, in train
self.setup_from_data(training_set, is_training=True)
File "/home/jack/.pyenv/versions/3.6.3/lib/python3.6/site-packages/jack-0.1.0-py3.6.egg/jack/core/reader.py", line 141, in setup_from_data
self.model_module.setup(is_training)
File "/home/jack/.pyenv/versions/3.6.3/lib/python3.6/site-packages/jack-0.1.0-py3.6.egg/jack/core/model_module.py", line 162, in setup
self.shared_resources, *[self._tensors[port] for port in self.input_ports])
File "/home/jack/.pyenv/versions/3.6.3/lib/python3.6/site-packages/jack-0.1.0-py3.6.egg/jack/readers/multiple_choice/shared.py", line 73, in create_output
shared_resources.config['answer_size'])
File "/home/jack/.pyenv/versions/3.6.3/lib/python3.6/site-packages/jack-0.1.0-py3.6.egg/jack/readers/natural_language_inference/decomposable_attention.py", line 24, in forward_pass
question_embedding = tf.nn.embedding_lookup(self.question_embedding_matrix, question)
File "/home/jack/.pyenv/versions/3.6.3/lib/python3.6/site-packages/tensorflow/python/ops/embedding_ops.py", line 294, in embedding_lookup
transform_fn=None)
File "/home/jack/.pyenv/versions/3.6.3/lib/python3.6/site-packages/tensorflow/python/ops/embedding_ops.py", line 123, in _embedding_lookup_and_transform
result = _gather_and_clip(params[0], ids, max_norm, name=name)
File "/home/jack/.pyenv/versions/3.6.3/lib/python3.6/site-packages/tensorflow/python/ops/embedding_ops.py", line 57, in _gather_and_clip
embs = array_ops.gather(params, ids, name=name)
File "/home/jack/.pyenv/versions/3.6.3/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 2409, in gather
validate_indices=validate_indices, name=name)
File "/home/jack/.pyenv/versions/3.6.3/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 1219, in gather
validate_indices=validate_indices, name=name)
File "/home/jack/.pyenv/versions/3.6.3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/home/jack/.pyenv/versions/3.6.3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2630, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/jack/.pyenv/versions/3.6.3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1204, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InvalidArgumentError (see above for traceback): indices[13,5] = 2196016 is not in [0, 2196015)
[[Node: dam_snli_reader/embedding_lookup = Gather[Tindices=DT_INT32, Tparams=DT_FLOAT, _class=["loc:@dam_snli_reader/emb_Q"], validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](dam_snli_reader/emb_Q/read, _arg_dam_snli_reader/question_0_1)]]
The problem seems to arise when I add the vocab_from_embeddings: True flag. @TimDettmers, thanks for suggesting that I ablate the options in the config file.
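In the traceback, id 2196016 was looked up in a matrix with 2196015 rows, so the vocabulary is handing out ids beyond the end of the embedding matrix. One way to catch such a mismatch before the session runs is to scan a batch for ids outside the valid range. The sketch below is only an illustration: the helper name find_out_of_range_ids is hypothetical, and plain nested lists stand in for the index arrays passed through feed_dict.

```python
def find_out_of_range_ids(batch, num_rows):
    """Return (row, col, id) for every id outside [0, num_rows)."""
    bad = []
    for i, row in enumerate(batch):
        for j, idx in enumerate(row):
            if not 0 <= idx < num_rows:
                bad.append((i, j, idx))
    return bad


# Toy values: a 5-row embedding matrix and a batch holding one invalid id.
num_rows = 5
batch = [[0, 1, 4], [2, 6, 3]]
bad = find_out_of_range_ids(batch, num_rows)
```

Each tuple it returns has the same shape as the indices[row, col] = id message that TensorFlow prints, which makes it easy to trace the offending token back to the vocabulary.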