Error with Concat Pooling during Prediction

seangtkelley commented 4 years ago

Run the command:

python predict.py -m ../resources/saved_models/rnn-2019-11-01-00-49-59_model_best.pkl.gz

Eventually, you will get the following error:

Evaluating language: python
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1156085/1156085 [00:09<00:00, 119014.61it/s]
1156085it [00:23, 48494.08it/s]
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: ConcatOp : Expected concatenating dimensions in the range [-1, 1), but got 1
     [[{{node query_encoder/rnn_encoder/concat_8}} = ConcatV2[N=3, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](query_encoder/rnn_encoder/Squeeze, query_encoder/rnn_encoder/Max_1, query_encoder/rnn_encoder/truediv, query_encoder/rnn_encoder/Max_1/reduction_indices)]]
     [[{{node query_encoder/rnn_encoder/concat_8/_535}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_587_query_encoder/rnn_encoder/concat_8", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "predict.py", line 129, in <module>
    for idx, _ in zip(*query_model(query, model, indices, language)):
  File "predict.py", line 69, in query_model
    'language': language}])[0]
  File "/home/dev/src/models/model.py", line 918, in get_query_representations
    representation_type=RepresentationType.QUERY)
  File "/home/dev/src/models/model.py", line 888, in __compute_representations_batched
    op_results = self.__sess.run(model_representation_op, feed_dict=batch_data_dict)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: ConcatOp : Expected concatenating dimensions in the range [-1, 1), but got 1
     [[node query_encoder/rnn_encoder/concat_8 (defined at /home/dev/src/utils/tfutils.py:177)  = ConcatV2[N=3, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](query_encoder/rnn_encoder/Squeeze, query_encoder/rnn_encoder/Max_1, query_encoder/rnn_encoder/truediv, query_encoder/rnn_encoder/Max_1/reduction_indices)]]
     [[{{node query_encoder/rnn_encoder/concat_8/_535}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_587_query_encoder/rnn_encoder/concat_8", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op 'query_encoder/rnn_encoder/concat_8', defined at:
  File "predict.py", line 113, in <module>
    hyper_overrides={})
  File "/home/dev/src/model_restore_helper.py", line 36, in restore
    model.make_model(is_train=is_train)
  File "/home/dev/src/models/model.py", line 231, in make_model
    self._make_model(is_train=is_train)
  File "/home/dev/src/models/model.py", line 266, in _make_model
    self.ops['query_representations'] = self.__query_encoder.make_model(is_train=is_train)
  File "/home/dev/src/encoders/rnn_seq_encoder.py", line 182, in make_model
    sequence_token_masks=token_mask)
  File "/home/dev/src/utils/tfutils.py", line 177, in pool_sequence_embedding
    ] , axis=1)                                                                       # concat pool, B x 3*D (refer to note above about increased embedding size)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/array_ops.py", line 1124, in concat
    return gen_array_ops.concat_v2(values=values, axis=axis, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 1033, in concat_v2
    "ConcatV2", values=values, axis=axis, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): ConcatOp : Expected concatenating dimensions in the range [-1, 1), but got 1
     [[node query_encoder/rnn_encoder/concat_8 (defined at /home/dev/src/utils/tfutils.py:177)  = ConcatV2[N=3, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](query_encoder/rnn_encoder/Squeeze, query_encoder/rnn_encoder/Max_1, query_encoder/rnn_encoder/truediv, query_encoder/rnn_encoder/Max_1/reduction_indices)]]
     [[{{node query_encoder/rnn_encoder/concat_8/_535}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_587_query_encoder/rnn_encoder/concat_8", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
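TensorFlow checks the concat axis against the rank of the first input, so "range [-1, 1)" means the first input, query_encoder/rnn_encoder/Squeeze, had rank 1 at run time instead of rank 2. One likely cause, which is an assumption and is not confirmed in this thread, is that predict.py encodes one query at a time. With a batch size of 1, a squeeze with no explicit axis also drops the batch dimension. The NumPy sketch below reproduces that behaviour, with np.squeeze and np.concatenate standing in for tf.squeeze and tf.concat. The (B, 1, D) layout of the squeezed tensor is hypothetical.

```python
import numpy as np

# Illustrative sizes: B = batch, D = hidden size. A single query gives B == 1.
B, D = 1, 4

# Hypothetical per-sequence state with an extra size-1 axis: shape (B, 1, D).
last_state = np.zeros((B, 1, D))
max_pool = np.zeros((B, D))
mean_pool = np.zeros((B, D))

# Squeezing without an axis removes EVERY size-1 dimension, including the
# batch dimension when B == 1: (1, 1, D) -> (D,).
bad = np.squeeze(last_state)
print(bad.shape)  # (4,)

failed = False
try:
    # Concatenating along axis 1 is invalid for a rank-1 first input.
    np.concatenate([bad, max_pool, mean_pool], axis=1)
except ValueError as err:  # NumPy raises a ValueError subclass here
    failed = True
    print("concat failed:", err)

# Naming the axis keeps the batch dimension for any B: (1, 1, D) -> (1, D).
good = np.squeeze(last_state, axis=1)
pooled = np.concatenate([good, max_pool, mean_pool], axis=1)
print(pooled.shape)  # (1, 12), i.e. B x 3*D
```

In the TensorFlow graph, the analogous change would be to pass an explicit axis to the tf.squeeze call that produces query_encoder/rnn_encoder/Squeeze. The correct axis depends on that tensor's layout, which the thread does not show.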

seangtkelley commented 4 years ago

Training shapes:

squeeze tf.shape Tensor("query_encoder/rnn_encoder/Shape_2:0", shape=(?,), dtype=int32)
squeeze get_shape <bound method Tensor.get_shape of <tf.Tensor 'query_encoder/rnn_encoder/Squeeze_1:0' shape=<unknown> dtype=float32>>
max pool tf.shape Tensor("query_encoder/rnn_encoder/Shape_3:0", shape=(2,), dtype=int32)
max pool get_shape <bound method Tensor.get_shape of <tf.Tensor 'query_encoder/rnn_encoder/Max:0' shape=(?, 128) dtype=float32>>
mean pool tf.shape:  Tensor("query_encoder/rnn_encoder/Shape_4:0", shape=(2,), dtype=int32)
mean pool get_shape:  <bound method Tensor.get_shape of <tf.Tensor 'query_encoder/rnn_encoder/truediv:0' shape=(?, 128) dtype=float32>>

Prediction shapes:

squeeze tf.shape Tensor("query_encoder/rnn_encoder/Shape_2:0", shape=(?,), dtype=int32)
squeeze get_shape <bound method Tensor.get_shape of <tf.Tensor 'query_encoder/rnn_encoder/Squeeze_1:0' shape=<unknown> dtype=float32>>
max pool tf.shape Tensor("query_encoder/rnn_encoder/Shape_3:0", shape=(2,), dtype=int32)
max pool get_shape <bound method Tensor.get_shape of <tf.Tensor 'query_encoder/rnn_encoder/Max:0' shape=(?, 42) dtype=float32>>
mean pool tf.shape:  Tensor("query_encoder/rnn_encoder/Shape_4:0", shape=(2,), dtype=int32)
mean pool get_shape:  <bound method Tensor.get_shape of <tf.Tensor 'query_encoder/rnn_encoder/truediv:0' shape=(?, 42) dtype=float32>>

Each group of these six lines is printed seven times, both during training and during prediction.
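The comment at tfutils.py:177 describes the concat-pooled output as B x 3*D. Judging by the node's inputs (Squeeze, Max_1, truediv), the three parts appear to be a squeezed per-sequence vector, a max pool and a mean pool. The shapes above show D = 128 during training but D = 42 during prediction. Below is a minimal NumPy sketch of masked concat pooling that stays rank 2 for any batch size. It assumes the squeezed part is the embedding of the last unmasked token and that sequences are right-padded. The thread does not show what is actually squeezed, and concat_pool is a hypothetical name.

```python
import numpy as np


def concat_pool(embeddings: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Hypothetical concat pooling: [last token, masked max, masked mean].

    embeddings: float array of shape (B, T, D)
    mask:       0/1 array of shape (B, T); every row is assumed to contain at
                least one 1, with valid tokens first (right padding).
    Returns an array of shape (B, 3*D).
    """
    mask = mask.astype(embeddings.dtype)
    lengths = mask.sum(axis=1)                         # (B,)

    # Last valid token of each sequence, picked by advanced indexing: (B, D).
    last_idx = lengths.astype(np.int64) - 1
    last = embeddings[np.arange(embeddings.shape[0]), last_idx]

    # Masked max: padded positions become -inf so they never win. (B, D)
    masked = np.where(mask[:, :, None] > 0, embeddings, -np.inf)
    max_pool = masked.max(axis=1)

    # Masked mean: sum over valid tokens / number of valid tokens. (B, D)
    mean_pool = (embeddings * mask[:, :, None]).sum(axis=1) / lengths[:, None]

    return np.concatenate([last, max_pool, mean_pool], axis=1)


# Usage with a batch of one: B=1, T=3, D=2, third token is padding.
emb = np.arange(6, dtype=float).reshape(1, 3, 2)
m = np.array([[1, 1, 0]])
out = concat_pool(emb, m)
print(out.shape)     # (1, 6)
print(out.tolist())  # [[2.0, 3.0, 2.0, 3.0, 1.0, 2.0]]
```

The sketch never calls squeeze without an axis, so a batch of one keeps its batch dimension, and the output width is always 3*D for whatever D the model was built with.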

seangtkelley commented 4 years ago

It has now morphed into this error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 558, in set_shape
    unknown_shape)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shapes must be equal rank, but are 2 and 3

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gradients_impl.py", line 841, in _GradientsHelper
    in_grad.set_shape(t_in.get_shape())
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 561, in set_shape
    raise ValueError(str(e))

wandb: Waiting for W&B process to finish, PID 109
ValueError: Shapes must be equal rank, but are 2 and 3

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 191, in <module>
    run_and_debug(lambda: run(args), args['--debug'])
  File "/usr/local/lib/python3.6/dist-packages/dpu_utils/utils/debughelper.py", line 21, in run_and_debug
    func()
  File "train.py", line 191, in <lambda>
    run_and_debug(lambda: run(args), args['--debug'])
  File "train.py", line 177, in run
    parallelize=not(arguments['--sequential']))
  File "train.py", line 72, in run_train
    model.make_model(is_train=True)
  File "/home/dev/src/models/model.py", line 234, in make_model
    self._make_training_step()
  File "/home/dev/src/models/model.py", line 378, in _make_training_step
    gradients = tf.gradients(self.ops['loss'], trainable_vars)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gradients_impl.py", line 630, in gradients
    gate_gradients, aggregation_method, stop_gradients)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gradients_impl.py", line 848, in _GradientsHelper
    (op.name, i, t_in.shape, in_grad.shape))
ValueError: Incompatible shapes between op input and calculated input gradient.  Forward operation: query_encoder/nbow_encoder/cond/Merge.  Input index: 1. Original input shape: (1, ?, 128).  Calculated input gradient shape: (?, ?)
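This second failure happens while the gradients are built. Input 1 of query_encoder/nbow_encoder/cond/Merge has the static shape (1, ?, 128), which is rank 3, but its computed gradient has the shape (?, ?), which is rank 2. In TensorFlow 1.x graphs, a cond/Merge node joins the outputs of a tf.cond's two branches. One hypothesis, not confirmed in the thread, is that the branches return tensors of different rank, for example one with an extra leading size-1 axis. The sketch below uses a hypothetical helper, check_same_rank, to compare branch outputs before they are combined. It uses NumPy arrays in place of graph tensors.

```python
import numpy as np


def check_same_rank(a: np.ndarray, b: np.ndarray) -> None:
    """Hypothetical guard: raise if two branch outputs differ in rank."""
    if a.ndim != b.ndim:
        raise ValueError(
            f"Shapes must be equal rank, but are {a.ndim} and {b.ndim}"
        )


B, D = 1, 128
pooled = np.zeros((B, D))         # rank 2: (B, D)
unsqueezed = np.zeros((1, B, D))  # rank 3, like the (1, ?, 128) input above

caught = None
try:
    check_same_rank(pooled, unsqueezed)
except ValueError as err:
    caught = str(err)
    print(caught)  # Shapes must be equal rank, but are 2 and 3

# Removing the extra leading axis explicitly restores matching ranks.
check_same_rank(pooled, np.squeeze(unsqueezed, axis=0))
```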

sjakati98 / CodeSearchNet

Error with Concat Pooling during Prediction #1