tech-srl / code2seq

Code for the model presented in the paper: "code2seq: Generating Sequences from Structured Representations of Code"
http://code2seq.org
MIT License

InvalidArgumentError in sess.run() #112

Closed sabbau1u closed 2 years ago

sabbau1u commented 2 years ago

Hello! I have been trying to train on data created with our own extractor and keep running into an InvalidArgumentError during self.sess.run() in evaluate() in model.py. I have tried the solutions from similar issues but have been unable to resolve the error. Thank you!

2021-12-10 19:11:02.823510: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Dictionaries loaded.
Loaded subtoken vocab. size: 382
Loaded target word vocab. size: 255
Loaded nodes vocab. size: 190
Created model
Starting training
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/sparse_ops.py:1165: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a tf.sparse.SparseTensor and use tf.sparse.to_dense instead.
Training batch size: 512
Dataset path: ./SparqlExtractor/data/data
Training file path: ./SparqlExtractor/data/data.train.c2s
Validation path: ./SparqlExtractor/data/data.val.c2s
Taking max contexts from each example: 5000
Random path sampling: True
Embedding size: 128
Using BiLSTMs, each of size: 128
Decoder size: 320
Decoder layers: 1
Max path lengths: 13
Max subtokens in a token: 25
Max target length: 30
Embeddings dropout keep_prob: 0.75
LSTM dropout keep_prob: 0.5

Number of trainable params: 1905984
Initalized variables
Started reader...
Finished 1 epochs
2021-12-10 19:11:22.832338: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at sparse_to_dense_op.cc:128 : Invalid argument: indices[309] = [102,3] is out of bounds: need 0 <= index < [5000,3]
2021-12-10 19:11:23.253771: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at sparse_to_dense_op.cc:128 : Invalid argument: indices[411] = [136,3] is out of bounds: need 0 <= index < [5000,3]
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.OutOfRangeError: End of sequence
  [[{{node IteratorGetNext}} = IteratorGetNextoutput_shapes=[[?,5000,13], [?,5000], [?,5000,25], [?,5000], [?,5000,1], [?,5000,1], [?,5000,25], [?,5000], [?,5000,1], [?,?], [?], [?], [?,5000]], output_types=[DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_STRING, DT_STRING, DT_INT32, DT_INT32, DT_STRING, DT_INT32, DT_INT64, DT_STRING, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/code2seq/model.py", line 96, in train
    _, batch_loss = self.sess.run([optimizer, train_loss])
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: End of sequence
  [[node IteratorGetNext (defined at /code2seq/reader.py:192) = IteratorGetNextoutput_shapes=[[?,5000,13], [?,5000], [?,5000,25], [?,5000], [?,5000,1], [?,5000,1], [?,5000,25], [?,5000], [?,5000,1], [?,?], [?], [?], [?,5000]], output_types=[DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_STRING, DT_STRING, DT_INT32, DT_INT32, DT_STRING, DT_INT32, DT_INT64, DT_STRING, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

Caused by op 'IteratorGetNext', defined at:
  File "code2seq.py", line 39, in model.train()
  File "/code2seq/model.py", line 77, in train
    config=self.config)
  File "/code2seq/reader.py", line 43, in __init__
    self.output_tensors = self.compute_output()
  File "/code2seq/reader.py", line 192, in compute_output
    return self.iterator.get_next()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/data/ops/iterator_ops.py", line 421, in get_next
    name=name)), self._output_types,
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_dataset_ops.py", line 2069, in iterator_get_next
    output_shapes=output_shapes, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

OutOfRangeError (see above for traceback): End of sequence
  [[node IteratorGetNext (defined at /code2seq/reader.py:192) = IteratorGetNextoutput_shapes=[[?,5000,13], [?,5000], [?,5000,25], [?,5000], [?,5000,1], [?,5000,1], [?,5000,25], [?,5000], [?,5000,1], [?,?], [?], [?], [?,5000]], output_types=[DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_STRING, DT_STRING, DT_INT32, DT_INT32, DT_STRING, DT_INT32, DT_INT64, DT_STRING, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[309] = [102,3] is out of bounds: need 0 <= index < [5000,3]
  [[{{node SparseToDense}} = SparseToDense[T=DT_STRING, Tindices=DT_INT64, validate_indices=true](StringSplit, SparseTensor_1/dense_shape, StringSplit:1, NotEqual/y)]]
  [[{{node IteratorGetNext_1}} = IteratorGetNextoutput_shapes=[[?,5000,13], [?,5000], [?,5000,25], [?,5000], [?,5000,1], [?,5000,1], [?,5000,25], [?,5000], [?,5000,1], [?,?], [?], [?], [?,5000]], output_types=[DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_STRING, DT_STRING, DT_INT32, DT_INT32, DT_STRING, DT_INT32, DT_INT64, DT_STRING, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "code2seq.py", line 39, in model.train()
  File "/code2seq/model.py", line 108, in train
    results, precision, recall, f1, rouge = self.evaluate()
  File "/code2seq/model.py", line 185, in evaluate
    [self.eval_predicted_indices_op, self.eval_true_target_strings_op, self.eval_topk_values],
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[309] = [102,3] is out of bounds: need 0 <= index < [5000,3]
  [[{{node SparseToDense}} = SparseToDense[T=DT_STRING, Tindices=DT_INT64, validate_indices=true](StringSplit, SparseTensor_1/dense_shape, StringSplit:1, NotEqual/y)]]
  [[node IteratorGetNext_1 (defined at /code2seq/reader.py:192) = IteratorGetNextoutput_shapes=[[?,5000,13], [?,5000], [?,5000,25], [?,5000], [?,5000,1], [?,5000,1], [?,5000,25], [?,5000], [?,5000,1], [?,?], [?], [?], [?,5000]], output_types=[DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_STRING, DT_STRING, DT_INT32, DT_INT32, DT_STRING, DT_INT32, DT_INT64, DT_STRING, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

urialon commented 2 years ago

Hi @sabbau1, thank you for your interest in code2seq!

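I suspect that some fields in your data contain characters that the reader uses as delimiters; specifically, that some of the tokens contain commas (,). Splitting such a context by , then yields more than 3 fields. That would match the error message: the index [102,3] points at a fourth field, which is outside the expected shape [5000,3].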

Can you check whether this is indeed the case? Split every context with string.split(',') and see whether any context yields more than 3 fields.
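The check described above could be scripted roughly as follows. This is a minimal sketch, not part of code2seq: the function name `find_bad_contexts` is hypothetical, and it assumes each line of the `.c2s` file is a target label followed by space-separated contexts, each context being three comma-separated fields.

```python
import sys


def find_bad_contexts(lines, expected_fields=3):
    """Yield (line_number, context_index, context) for every context whose
    comma-split field count differs from expected_fields.

    Assumption: each line is "<target> <context> <context> ...", where a
    context looks like "<token>,<path>,<token>".
    """
    for line_no, line in enumerate(lines, start=1):
        parts = line.rstrip("\n").split(" ")
        # parts[0] is assumed to be the target label; the rest are contexts.
        for ctx_idx, context in enumerate(parts[1:]):
            if not context:
                continue  # ignore empty strings produced by repeated spaces
            if len(context.split(",")) != expected_fields:
                yield line_no, ctx_idx, context


# Only runs when a file path is passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as f:
        for line_no, ctx_idx, context in find_bad_contexts(f):
            print("line {}, context {}: {!r}".format(line_no, ctx_idx, context))
```

Run it against each data file, for example the `data.train.c2s` and `data.val.c2s` files from the log above. Any line it prints is a context that does not split into exactly 3 fields.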

Let me know how it goes. Uri

kigero commented 2 years ago

I'm working on the same project as @sabbau1. It took a while because of the holidays, but I can confirm that this was the issue for us. This issue can be closed. Thanks!
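The thread does not say how the extractor was changed. One possible approach, sketched here under assumptions, is to strip delimiter characters from every field before writing a context. The helpers `sanitize_field` and `format_context` are hypothetical names, not part of code2seq, and treating whitespace as a delimiter is an assumption based on contexts being space-separated.

```python
import re

# Characters assumed to act as delimiters in the data file:
# commas separate the fields of a context, whitespace separates contexts.
_DELIMITERS = re.compile(r"[,\s]+")


def sanitize_field(field, replacement=""):
    """Remove (or replace) delimiter characters inside a single field."""
    return _DELIMITERS.sub(replacement, field)


def format_context(source_token, path, target_token):
    """Join three sanitized fields into one "token,path,token" context."""
    fields = (source_token, path, target_token)
    return ",".join(sanitize_field(f) for f in fields)


# A token containing a comma no longer adds a fourth field.
context = format_context("a,b", "P1", "c d")
print(context)
```

Dropping the characters outright loses information. Passing a placeholder via `replacement` is an alternative if the distinction matters for your data.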

urialon commented 2 years ago

Great to hear!