Not enough GPU memory; reduce the batch size.
@ilovin It shouldn't be insufficient GPU memory. The GPU has 6 GB and only about 100 MB is in use. When I ran it again today, the error was different. It's probably an environment setup problem, but I've tried basically every fix I could find online and still haven't found a solution.
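For what it's worth, TF 1.x grabs nearly all GPU memory up front by default, so the ~100 MB figure from `nvidia-smi` can be misleading. A minimal session-config sketch (assuming the TF 1.x API already used in this trace) that makes real usage visible while debugging:

```python
import tensorflow as tf

# Let TF allocate GPU memory on demand instead of reserving it all at
# startup, so nvidia-smi reflects actual usage during debugging.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
```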
```
W tensorflow/core/framework/op_kernel.cc:993] Internal: warp_ctc error in compute_ctc_loss: execution failed
	 [[Node: WarpCTC = WarpCTC[blank_label=0, _device="/job:localhost/replica:0/task:0/gpu:0"](logits/transpose_1, _recv_labels_0/_41, _recv_labels_len_0/_43, _recv_time_step_len_0/_9)]]
W tensorflow/core/framework/op_kernel.cc:993] Internal: warp_ctc error in compute_ctc_loss: execution failed
	 [[Node: WarpCTC = WarpCTC[blank_label=0, _device="/job:localhost/replica:0/task:0/gpu:0"](logits/transpose_1, _recv_labels_0/_41, _recv_labels_len_0/_43, _recv_time_step_len_0/_9)]]
W tensorflow/core/framework/op_kernel.cc:993] Internal: warp_ctc error in compute_ctc_loss: execution failed
	 [[Node: WarpCTC = WarpCTC[blank_label=0, _device="/job:localhost/replica:0/task:0/gpu:0"](logits/transpose_1, _recv_labels_0/_41, _recv_labels_len_0/_43, _recv_time_step_len_0/_9)]]
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 6396 get requests, put_count=3511 evicted_count=1000 eviction_rate=0.284819 and unsatisfied allocation rate=0.623046
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 100 to 110
```
```
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1022, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1004, in _run_fn
    status, run_metadata)
  File "/usr/lib/python3.5/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: warp_ctc error in compute_ctc_loss: execution failed
	 [[Node: WarpCTC = WarpCTC[blank_label=0, _device="/job:localhost/replica:0/task:0/gpu:0"](logits/transpose_1, _recv_labels_0/_41, _recv_labels_len_0/_43, _recv_time_step_len_0/_9)]]
	 [[Node: RMSProp/update/_54 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_1819_RMSProp/update", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./lstm/train_net.py", line 89, in <module>
    restore=bool(int(args.restore)))
  File "./lstm/../lib/lstm/train.py", line 190, in train_net
    sw.train_model(sess, max_iters, restore=restore)
  File "./lstm/../lib/lstm/train.py", line 148, in train_model
    ctc_loss, summary_str, _ = sess.run(fetches=fetch_list, feed_dict=feed_dict)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 767, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 965, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1015, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1035, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: warp_ctc error in compute_ctc_loss: execution failed
	 [[Node: WarpCTC = WarpCTC[blank_label=0, _device="/job:localhost/replica:0/task:0/gpu:0"](logits/transpose_1, _recv_labels_0/_41, _recv_labels_len_0/_43, _recv_time_step_len_0/_9)]]
	 [[Node: RMSProp/update/_54 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_1819_RMSProp/update", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Caused by op 'WarpCTC', defined at:
  File "./lstm/train_net.py", line 89, in <module>
    restore=bool(int(args.restore)))
  File "./lstm/../lib/lstm/train.py", line 190, in train_net
    sw.train_model(sess, max_iters, restore=restore)
  File "./lstm/../lib/lstm/train.py", line 79, in train_model
    loss, dense_decoded = self.net.build_loss()
  File "./lstm/../lib/networks/network.py", line 637, in build_loss
    label_lengths=label_len, input_lengths=time_step_batch)
  File "/usr/local/lib/python3.5/dist-packages/warpctc_tensorflow-0.1-py3.5-linux-x86_64.egg/warpctc_tensorflow/__init__.py", line 43, in ctc
    input_lengths, blank_label)
  File "<string>", line 45, in warp_ctc
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 2327, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1226, in __init__
    self._traceback = _extract_stack()

InternalError (see above for traceback): warp_ctc error in compute_ctc_loss: execution failed
	 [[Node: WarpCTC = WarpCTC[blank_label=0, _device="/job:localhost/replica:0/task:0/gpu:0"](logits/transpose_1, _recv_labels_0/_41, _recv_labels_len_0/_43, _recv_time_step_len_0/_9)]]
	 [[Node: RMSProp/update/_54 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_1819_RMSProp/update", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
```
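One frequent cause of a CTC "execution failed" error is a batch where a label sequence is too long for its input length: CTC needs at least one time step per label, plus one blank between every pair of repeated labels. A minimal pre-check sketch (`ctc_inputs_ok` is a hypothetical helper, not part of this repo) that could be run on each batch before the loss op:

```python
def ctc_inputs_ok(time_step_len, label_len, labels):
    """Return True if the input is long enough for CTC to emit the labels.

    CTC requires one time step per label, plus one extra step (a blank)
    between each pair of adjacent repeated labels.
    """
    repeats = sum(1 for a, b in zip(labels, labels[1:]) if a == b)
    return time_step_len >= label_len + repeats

# Example: [1, 1, 2] needs 4 steps (3 labels + 1 blank for the repeat).
assert ctc_inputs_ok(5, 3, [1, 2, 3])
assert not ctc_inputs_ok(3, 3, [1, 1, 2])
```

If any batch fails this check, the data pipeline (not the CUDA/cuDNN setup) is the likely culprit.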
@hookover Did you solve the problem on your end? I've run into a similar error.
@huiyang865 Not solved; I stopped testing this.
@huiyang865 If you do solve it, please let me know.
My environment: Ubuntu 14.04, CUDA 8.0, cuDNN 6.0, TF 1.3.0, Python 3.5, warp-ctc master.
I'm closing this issue because it has been inactive for more than one month.
My current environment:
Ubuntu 16.04, CUDA 7.5, cuDNN 5, TensorFlow 1.0.1, GTX 1060, 16 GB RAM
Regarding the environment: can this code run on TensorFlow 1.4.0 and CUDA 8.0 or later?
The error is as follows: