sjakati98 / CodeSearchNet

Datasets, tools, and benchmarks for representation learning of code.
https://arxiv.org/abs/1909.09436
MIT License

Error with Attention during training #7

Open NeoMax97 opened 4 years ago

NeoMax97 commented 4 years ago

Training fails with the error below when I enable attention for both the code and query encoders using this command:

python train.py --model rnn --testrun --hypers-override "{ \"code_seq_embedding_size\": 256, \"code_rnn_do_attention\": true, \"query_seq_embedding_size\": 256, \"query_rnn_do_attention\": true, \"batch_size\":500 }"
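
The backslash-escaped JSON passed to `--hypers-override` is easy to mistype by hand. As a minimal sketch (not part of the repository; the variable names are illustrative assumptions), the same value can be generated with Python's standard library, using the hyperparameter names from the command above:

```python
import json
import shlex

# Hyperparameter names and values copied from the command in this issue.
hypers = {
    "code_seq_embedding_size": 256,
    "code_rnn_do_attention": True,
    "query_seq_embedding_size": 256,
    "query_rnn_do_attention": True,
    "batch_size": 500,
}

# json.dumps emits valid JSON (true/false in lowercase, double-quoted keys).
override = json.dumps(hypers)

command = ["python", "train.py", "--model", "rnn", "--testrun",
           "--hypers-override", override]

# shlex.quote adds whatever quoting a POSIX shell needs for each argument.
shell_line = " ".join(shlex.quote(part) for part in command)
print(shell_line)
```

Passing `command` as a list to `subprocess.run` would avoid shell quoting altogether; the printed line is only for pasting into a terminal.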

Begin Training.
Training on 523712 php, 454451 java, 317832 go, 412178 python, 123889 javascript, 48791 ruby samples.
Validating on 2209 ruby, 8253 javascript, 23107 python, 26015 php, 14242 go, 15328 java samples.
==== Epoch 0 ====
2019-12-10 04:25:45.723825: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at transpose_op.cc:157 : Invalid argument: transpose expects a vector of size 2. But input(1) is a vector of size 3
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: transpose expects a vector of size 2. But input(1) is a vector of size 3
     [[{{node code_encoder/ruby/rnn_encoder/transpose}} = Transpose[T=DT_FLOAT, Tperm=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](code_encoder/ruby/rnn_encoder/Squeeze, gradients/code_encoder/go/rnn_encoder/bidirectional_rnn/fw/fw/transpose_grad/InvertPermutation)]]
     [[{{node code_encoder/javascript/rnn_encoder/map/while/LoopCond/_1177}} = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_24351_code_encoder/javascript/rnn_encoder/map/while/LoopCond", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/device:CPU:0"](^_cloopcode_encoder/javascript/rnn_encoder/map/while/TensorArrayReadV3/_440)]]
wandb: Waiting for W&B process to finish, PID 968

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 191, in <module>
    run_and_debug(lambda: run(args), args['--debug'])
  File "/usr/local/lib/python3.6/dist-packages/dpu_utils/utils/debughelper.py", line 21, in run_and_debug
    func()
  File "train.py", line 191, in <lambda>
    run_and_debug(lambda: run(args), args['--debug'])
  File "train.py", line 177, in run
    parallelize=not(arguments['--sequential']))
  File "train.py", line 89, in run_train
    model_path = model.train(train_data, valid_data, azure_info_path, quiet=quiet, resume=resume)
  File "/home/dev/src/models/model.py", line 787, in train
    quiet=quiet)
  File "/home/dev/src/models/model.py", line 725, in __run_epoch_in_batches
    op_results = self.__sess.run(ops_to_run, feed_dict=batch_data_dict)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: transpose expects a vector of size 2. But input(1) is a vector of size 3
     [[node code_encoder/ruby/rnn_encoder/transpose (defined at /home/dev/src/encoders/rnn_seq_encoder.py:192)  = Transpose[T=DT_FLOAT, Tperm=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](code_encoder/ruby/rnn_encoder/Squeeze, gradients/code_encoder/go/rnn_encoder/bidirectional_rnn/fw/fw/transpose_grad/InvertPermutation)]]
     [[{{node code_encoder/javascript/rnn_encoder/map/while/LoopCond/_1177}} = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_24351_code_encoder/javascript/rnn_encoder/map/while/LoopCond", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/device:CPU:0"](^_cloopcode_encoder/javascript/rnn_encoder/map/while/TensorArrayReadV3/_440)]]

Caused by op 'code_encoder/ruby/rnn_encoder/transpose', defined at:
  File "train.py", line 191, in <module>
    run_and_debug(lambda: run(args), args['--debug'])
  File "/usr/local/lib/python3.6/dist-packages/dpu_utils/utils/debughelper.py", line 21, in run_and_debug
    func()
  File "train.py", line 191, in <lambda>
    run_and_debug(lambda: run(args), args['--debug'])
  File "train.py", line 177, in run
    parallelize=not(arguments['--sequential']))
  File "train.py", line 72, in run_train
    model.make_model(is_train=True)
  File "/home/dev/src/models/model.py", line 231, in make_model
    self._make_model(is_train=is_train)
wandb: Program failed with code 1. Press ctrl-c to abort syncing.
  File "/home/dev/src/models/model.py", line 260, in _make_model
    language_encoders.append(self.__code_encoders[language].make_model(is_train=is_train))
  File "/home/dev/src/encoders/rnn_seq_encoder.py", line 192, in make_model
    context = tf.transpose(context, perm=[1, 0, 2])
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/array_ops.py", line 1420, in transpose
    ret = transpose_fn(a, perm, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 8927, in transpose
    "Transpose", x=x, perm=perm, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): transpose expects a vector of size 2. But input(1) is a vector of size 3
     [[node code_encoder/ruby/rnn_encoder/transpose (defined at /home/dev/src/encoders/rnn_seq_encoder.py:192)  = Transpose[T=DT_FLOAT, Tperm=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](code_encoder/ruby/rnn_encoder/Squeeze, gradients/code_encoder/go/rnn_encoder/bidirectional_rnn/fw/fw/transpose_grad/InvertPermutation)]]
     [[{{node code_encoder/javascript/rnn_encoder/map/while/LoopCond/_1177}} = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_24351_code_encoder/javascript/rnn_encoder/map/while/LoopCond", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/device:CPU:0"](^_cloopcode_encoder/javascript/rnn_encoder/map/while/TensorArrayReadV3/_440)]]

wandb: Run summary:
wandb:     _runtime 21864.232977628708
wandb:        _step 25
wandb:   train-loss 3.4117844104766846
wandb:   _timestamp 1575951647.108848
wandb: Syncing 9 W&B file(s) and 0 media file(s)
wandb:                                                                                
wandb: Synced rnn-2019-12-09-22-16-24: https://app.wandb.ai/neomax97/CodeSearchNet/runs/uyce6lxy
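
Reading the error: the Transpose node defined at rnn_seq_encoder.py:192 receives the output of a Squeeze op, and "expects a vector of size 2" means that tensor had rank 2 at run time, while `perm=[1, 0, 2]` has three entries. One possible explanation, which is an assumption and not confirmed by the log, is that a squeeze without an explicit axis dropped an extra size-1 dimension for this batch. The pure-Python sketch below only mimics the shape arithmetic; the helper names and the example shapes are hypothetical, and no TensorFlow code is involved:

```python
def squeeze_shape(shape):
    """Mimic a squeeze with no axis given: drop every dimension of size 1."""
    return tuple(dim for dim in shape if dim != 1)


def transpose_shape(shape, perm):
    """Permute a shape, rejecting a perm whose length differs from the rank."""
    if len(perm) != len(shape):
        raise ValueError(
            f"transpose expects a vector of size {len(shape)}. "
            f"But input(1) is a vector of size {len(perm)}"
        )
    return tuple(shape[axis] for axis in perm)


perm = [1, 0, 2]  # the permutation used at rnn_seq_encoder.py:192

# Hypothetical shape with one size-1 dimension: the squeeze leaves rank 3.
usual = squeeze_shape((200, 32, 1, 256))
print(transpose_shape(usual, perm))

# Hypothetical shape where a second dimension is also 1: rank drops to 2.
degenerate = squeeze_shape((200, 1, 1, 256))
error_message = None
try:
    transpose_shape(degenerate, perm)
except ValueError as err:
    error_message = str(err)
    print(error_message)
```

If this is what happens here, passing an explicit `axis` to `tf.squeeze` would keep the rank fixed regardless of the batch contents. I have not verified this against the encoder code.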