takeawayls closed this issue 3 years ago
[2021-01-15 06:29:38] start training
Current epoch num: 1
2021-01-15 06:30:05.498867: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
Traceback (most recent call last):
  File "./run_dnn.py", line 911, in <module>
    train(wnd_conf, args['model_ckpt'])
  File "./run_dnn.py", line 325, in train
    train_order_recall_op, train_order_auc_op])
  File "/root/anaconda3/envs/myconda/lib/python2.7/site-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/root/anaconda3/envs/myconda/lib/python2.7/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/root/anaconda3/envs/myconda/lib/python2.7/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/root/anaconda3/envs/myconda/lib/python2.7/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: ConcatOp : Dimensions of inputs should match: shape[0] = [2048,783] vs. shape[1] = [0,32]
     [[node DnnModel_3/embedding_trans/concat_18 (defined at /root/anaconda3/envs/myconda/lib/python2.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
  (1) Invalid argument: ConcatOp : Dimensions of inputs should match: shape[0] = [2048,783] vs. shape[1] = [0,32]
     [[node DnnModel_3/embedding_trans/concat_18 (defined at /root/anaconda3/envs/myconda/lib/python2.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
     [[hash_table_Lookup_29/SelectV2/_2089]]
0 successful operations.
3 derived errors ignored.
Original stack trace for u'DnnModel_3/embedding_trans/concat_18':
  File "./run_dnn.py", line 911, in <module>
How can I fix this error?
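For anyone reading the message itself: `ConcatOp` is joining two tensors, and every dimension except the concat axis has to match. Assuming the concat is along axis 1 (the node appends a 32-wide embedding to a 783-wide tensor, so that seems likely, but the log does not state the axis), the row counts 2048 and 0 are the mismatch, meaning the second input is empty. A minimal pure-Python sketch of that rule is below; `concat_mismatch` is a hypothetical helper written for illustration, not part of TensorFlow or of run_dnn.py.

```python
def concat_mismatch(shapes, axis):
    """Return None if `shapes` can be concatenated along `axis`,
    otherwise a short message describing the first mismatch found."""
    first = shapes[0]
    for i, shape in enumerate(shapes[1:], start=1):
        if len(shape) != len(first):
            return "rank mismatch: shape[0] = %s vs. shape[%d] = %s" % (
                list(first), i, list(shape))
        for dim, (a, b) in enumerate(zip(first, shape)):
            # Only the concat axis itself is allowed to differ.
            if dim != axis and a != b:
                return "dimension %d mismatch: shape[0] = %s vs. shape[%d] = %s" % (
                    dim, list(first), i, list(shape))
    return None


# Shapes copied from the error message; axis=1 is an assumption.
print(concat_mismatch([(2048, 783), (0, 32)], axis=1))
# A second input with the same number of rows would be accepted.
print(concat_mismatch([(2048, 783), (2048, 32)], axis=1))
```

So the thing to track down is why one input of `concat_18` has zero rows for this batch, not the concat itself.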
It is not an error, just a notification.
I get the same error. It is a real error: the training process stops and exits.
[2021-10-13 00:29:33] start training
Current epoch num: 1
Traceback (most recent call last):
  File "/opt/conda/envs/Python3.6/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/opt/conda/envs/Python3.6/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/opt/conda/envs/Python3.6/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: ConcatOp : Dimensions of inputs should match: shape[0] = [204,783] vs. shape[1] = [0,32]
     [[{{node DnnModel/embedding_trans/concat_18}} = ConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](DnnModel/embedding_trans/concat_17, DnnModel/embedding_trans/embedding_lookup_sparse_11, DnnModel/gradients/DnnModel/concat_2_grad/mod)]]
     [[{{node Mean_107/_1559}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_7123_Mean_107", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "./run_dnn.py", line 912, in <module>
Caused by op 'DnnModel/embedding_trans/concat_18', defined at:
  File "./run_dnn.py", line 912, in <module>
InvalidArgumentError (see above for traceback): ConcatOp : Dimensions of inputs should match: shape[0] = [204,783] vs. shape[1] = [0,32]
     [[node DnnModel/embedding_trans/concat_18 (defined at /notebook/dmtfq/CIKM2020_DMT/DMT_code/model/net/base.py:124) = ConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](DnnModel/embedding_trans/concat_17, DnnModel/embedding_trans/embedding_lookup_sparse_11, DnnModel/gradients/DnnModel/concat_2_grad/mod)]]
     [[{{node Mean_107/_1559}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_7123_Mean_107", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
run duration 193 s
nohup: redirecting stderr to stdout
When I ran the model, it showed the error above.
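In the second trace the empty input is `DnnModel/embedding_trans/embedding_lookup_sparse_11`. One possible cause, which is an assumption and not something these logs confirm, is that the sparse feature feeding that lookup has no ids at all in the batch. As far as I know, `tf.nn.embedding_lookup_sparse` in TF 1.x combines embeddings with a segment reduction over the row indices of the sparse ids, so its output only has as many rows as the highest row index present plus one. The pure-Python sketch below mimics just that row-count behaviour so you can check your input features offline; `lookup_output_rows`, `find_short_features`, and the feature names are hypothetical.

```python
def lookup_output_rows(row_indices):
    """Rows a segment-style sparse lookup would emit:
    highest row index + 1, or 0 when there are no entries."""
    return max(row_indices) + 1 if row_indices else 0


def find_short_features(batch_size, features):
    """Return {name: rows} for features whose lookup would yield a
    row count different from batch_size.

    `features` maps a feature name to the row indices of its sparse
    entries (the first column of a SparseTensor's `indices`)."""
    short = {}
    for name, rows in features.items():
        n = lookup_output_rows(rows)
        if n != batch_size:
            short[name] = n
    return short


# Made-up batch of 4 examples with two sparse features.
batch = {
    "feature_a": [0, 0, 1, 2, 3],  # every example has at least one id
    "feature_b": [],               # no ids at all in this batch
}
print(find_short_features(4, batch))
```

If a check like this flags a feature, the options I would look at are fixing the data for that feature or switching to a lookup that pads missing rows. `safe_embedding_lookup_sparse` (under `tf.nn` in recent releases, `tf.contrib.layers` in older 1.x ones) is documented to return a default vector for rows with no ids, though I have not verified how it fits this repository's code.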