Hi @burui11087, @yangyanli,
I have been using PointCNN for aerial lidar data segmentation. With sample_num set to 12288 and a batch size of 4, training ran successfully on a 16GB V100 card. My versions were TensorFlow 1.10.1 and CUDA 9.2, and I was able to compile tf_compile then.
Now that I have a single 32GB V100 card, I tried to fit a larger batch size (12) with the same sample_num, and I get the error below. Could you please suggest what I am missing?
With a batch size of 4, training still starts fine on the same setup.
My versions:
CUDA toolkit 9.2
TensorFlow 1.10.1
NVIDIA driver 415.27
Error log:
E tensorflow/stream_executor/cuda/cuda_blas.cc:647] failed to run cuBLAS routine cublasGemmBatchedEx: CUBLAS_STATUS_NOT_SUPPORTED
2020-03-09 10:35:51.149051: E tensorflow/stream_executor/cuda/cuda_blas.cc:2510] Internal: failed BLAS call, see log for details
2020-03-09 10:35:51.149173: I tensorflow/stream_executor/stream.cc:4818] stream 0x5648deb70150 did not memzero GPU location; source: 0x7f657d7f89d0
2020-03-09 10:35:51.149236: I tensorflow/stream_executor/stream.cc:4818] stream 0x5648deb70150 did not memzero GPU location; source: 0x7f657d7f89f0
2020-03-09 10:35:51.153013: I tensorflow/stream_executor/stream.cc:4818] stream 0x5648deb70150 did not memzero GPU location; source: 0x7f655dff99d0
2020-03-09 10:35:51.153059: I tensorflow/stream_executor/stream.cc:4818] stream 0x5648deb70150 did not memzero GPU location; source: 0x7f655dff99f0
2020-03-09 10:35:51.153214: I tensorflow/stream_executor/stream.cc:4818] stream 0x5648deb70150 did not memzero GPU location; source: 0x7f657dff99d0
2020-03-09 10:35:51.153262: I tensorflow/stream_executor/stream.cc:4818] stream 0x5648deb70150 did not memzero GPU location; source: 0x7f657dff99f0
2020-03-09 10:35:51.153464: I tensorflow/stream_executor/stream.cc:4818] stream 0x5648deb70150 did not memzero GPU location; source: 0x7f655f7fc9d0
2020-03-09 10:35:51.153511: I tensorflow/stream_executor/stream.cc:4818] stream 0x5648deb70150 did not memzero GPU location; source: 0x7f655f7fc9f0
Traceback (most recent call last):
File "/home/sayak_cowi/anaconda3/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1278, in _do_call
return fn(*args)
File "/home/sayak_cowi/anaconda3/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1263, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/sayak_cowi/anaconda3/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1350, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InternalError: Blas xGEMMBatched launch failed : a.shape=[147456,12,12], b.shape=[147456,12,96], m=12, n=96, k=12, batch_size=147456
[[Node: xconv_1_fts_X = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](xconv_1_X_2_KK, xconv_1_nn_fts_input-0-1-TransposeNCHWToNHWC-LayoutOptimizer)]]
[[Node: metrics/accuracy/broadcast_weights/assert_broadcastable/AssertGuard/Assert/Switch_3/_485 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_44108...t/Switch_3", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "train.py", line 311, in <module>
is_training: True,
File "/home/sayak_cowi/anaconda3/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 877, in run
run_metadata_ptr)
File "/home/sayak_cowi/anaconda3/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1100, in _run
feed_dict_tensor, options, run_metadata)
File "/home/sayak_cowi/anaconda3/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1272, in _do_run
run_metadata)
File "/home/sayak_cowi/anaconda3/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1291, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Blas xGEMMBatched launch failed : a.shape=[147456,12,12], b.shape=[147456,12,96], m=12, n=96, k=12, batch_size=147456
[[Node: xconv_1_fts_X = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](xconv_1_X_2_KK, xconv_1_nn_fts_input-0-1-TransposeNCHWToNHWC-LayoutOptimizer)]]
[[Node: metrics/accuracy/broadcast_weights/assert_broadcastable/AssertGuard/Assert/Switch_3/_485 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_44108...t/Switch_3", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Caused by op 'xconv_1_fts_X', defined at:
File "train.py", line 132, in <module>
net = model.Net(points_augmented, features_augmented, is_training, setting)
File "/home/sayak_cowi/notebooks/PointCNN/pointcnn_seg.py", line 11, in __init__
PointCNN.__init__(self, points, features, is_training, setting)
File "/home/sayak_cowi/notebooks/PointCNN/pointcnn.py", line 116, in __init__
depth_multiplier, sorting_method, with_global)
File "/home/sayak_cowi/notebooks/PointCNN/pointcnn.py", line 39, in xconv
fts_X = tf.matmul(X_2_KK, nn_fts_input, name=tag + 'fts_X')
File "/home/sayak_cowi/anaconda3/envs/py35/lib/python3.5/site-packages/tensorflow/python/ops/math_ops.py", line 1980, in matmul
a, b, adj_x=adjoint_a, adj_y=adjoint_b, name=name)
File "/home/sayak_cowi/anaconda3/envs/py35/lib/python3.5/site-packages/tensorflow/python/ops/gen_math_ops.py", line 1236, in batch_mat_mul
"BatchMatMul", x=x, y=y, adj_x=adj_x, adj_y=adj_y, name=name)
File "/home/sayak_cowi/anaconda3/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/sayak_cowi/anaconda3/envs/py35/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 454, in new_func
return func(*args, **kwargs)
File "/home/sayak_cowi/anaconda3/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3155, in create_op
op_def=op_def)
File "/home/sayak_cowi/anaconda3/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1717, in __init__
self._traceback = tf_stack.extract_stack()
InternalError (see above for traceback): Blas xGEMMBatched launch failed : a.shape=[147456,12,12], b.shape=[147456,12,96], m=12, n=96, k=12, batch_size=147456
[[Node: xconv_1_fts_X = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](xconv_1_X_2_KK, xconv_1_nn_fts_input-0-1-TransposeNCHWToNHWC-LayoutOptimizer)]]
[[Node: metrics/accuracy/broadcast_weights/assert_broadcastable/AssertGuard/Assert/Switch_3/_485 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_44108...t/Switch_3", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
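For reference, here is a quick check of the numbers in the error, as a minimal sketch. It assumes (this is not confirmed by the log) that the leading dimension reported for BatchMatMul is simply batch size times sample_num, i.e. one small 12x12 by 12x96 product per sampled point. The helper name `batched_gemm_count` is made up for illustration.

```python
# Values taken from the post above; the relation between them is an assumption.
SAMPLE_NUM = 12288
WORKING_BATCH_SIZE = 4
FAILING_BATCH_SIZE = 12


def batched_gemm_count(batch_size, sample_num):
    """Hypothetical helper: number of small matrix products in one
    BatchMatMul call, assuming one product per sampled point."""
    return batch_size * sample_num


working = batched_gemm_count(WORKING_BATCH_SIZE, SAMPLE_NUM)
failing = batched_gemm_count(FAILING_BATCH_SIZE, SAMPLE_NUM)

print(working)  # 49152
print(failing)  # 147456, matches a.shape[0] and batch_size in the error

# Counts for every batch size between the working and the failing one,
# useful for narrowing down where the failure first appears.
counts = {b: batched_gemm_count(b, SAMPLE_NUM)
          for b in range(WORKING_BATCH_SIZE, FAILING_BATCH_SIZE + 1)}
for b, n in counts.items():
    print(b, n)
```

The failing count matches the batch_size=147456 in the message, so the setup that works and the one that fails differ by a factor of three in the number of batched products handed to cuBLAS. If the failure depends on that count rather than on memory, trying the batch sizes between 4 and 12 should show where it starts.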