isl-org / Open3D-PointNet2-Semantic3D

Semantic3D segmentation with Open3D and PointNet++

How to change batch size? #55

Open · hiwasawa0715 opened this issue 5 years ago

hiwasawa0715 commented 5 years ago

I tried to run train.py (step 5, "train", in the usage instructions), but I get the following "ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape" error.

Should I make the batch size smaller? If so, how do I change the batch size?

My environment:

- GPU: GeForce GTX 1050 Ti
- Memory: 16 GiB (plus 16 GiB swap)
- OS: Ubuntu 16.04
- CUDA 9.0, cuDNN 7.5.0
- Anaconda3, Python 3.6
- tensorflow-gpu 1.12.0
- scikit-learn 0.21.3
- open3d-python 0.7.0.0


```
2019-08-25 14:54:06.964697: W tensorflow/core/common_runtime/bfc_allocator.cc:271] *x****x*****
2019-08-25 14:54:06.964735: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at conv_ops.cc:446 : Resource exhausted: OOM when allocating tensor with shape[16,128,8192,1] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "/home/hiwasawa/anaconda3/envs/pointNet2_py36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/home/hiwasawa/anaconda3/envs/pointNet2_py36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/hiwasawa/anaconda3/envs/pointNet2_py36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[16,128,8192,1] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
  [[{{node fa_layer4/conv_2/Conv2D}} = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](fa_layer4/conv_1/Relu, fa_layer4/conv_2/weights/read/_211)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

  [[{{node gradients/layer2/conv2/BiasAdd_grad/BiasAddGrad/_433}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4838_gradients/layer2/conv2/BiasAdd_grad/BiasAddGrad", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 469, in <module>
    train()
  File "train.py", line 437, in train
    train_one_epoch(sess, ops, train_writer, stack_train)
  File "train.py", line 243, in train_one_epoch
    feed_dict=feed_dict,
  File "/home/hiwasawa/anaconda3/envs/pointNet2_py36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/home/hiwasawa/anaconda3/envs/pointNet2_py36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/hiwasawa/anaconda3/envs/pointNet2_py36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/home/hiwasawa/anaconda3/envs/pointNet2_py36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[16,128,8192,1] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
  [[node fa_layer4/conv_2/Conv2D (defined at /home/hiwasawa/PointNet2/Open3D-PointNet2-Semantic3D-master/util/tf_util.py:186) = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](fa_layer4/conv_1/Relu, fa_layer4/conv_2/weights/read/_211)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

  [[{{node gradients/layer2/conv2/BiasAdd_grad/BiasAddGrad/_433}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4838_gradients/layer2/conv2/BiasAdd_grad/BiasAddGrad", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Caused by op 'fa_layer4/conv_2/Conv2D', defined at:
  File "train.py", line 469, in <module>
    train()
  File "train.py", line 359, in train
    bn_decay=bn_decay,
  File "/home/hiwasawa/PointNet2/Open3D-PointNet2-Semantic3D-master/model.py", line 128, in get_model
    scope="fa_layer4",
  File "/home/hiwasawa/PointNet2/Open3D-PointNet2-Semantic3D-master/util/pointnet_util.py", line 323, in pointnet_fp_module
    bn_decay=bn_decay,
  File "/home/hiwasawa/PointNet2/Open3D-PointNet2-Semantic3D-master/util/tf_util.py", line 186, in conv2d
    data_format=data_format,
  File "/home/hiwasawa/anaconda3/envs/pointNet2_py36/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 957, in conv2d
    data_format=data_format, dilations=dilations, name=name)
  File "/home/hiwasawa/anaconda3/envs/pointNet2_py36/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/hiwasawa/anaconda3/envs/pointNet2_py36/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/home/hiwasawa/anaconda3/envs/pointNet2_py36/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/home/hiwasawa/anaconda3/envs/pointNet2_py36/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[16,128,8192,1] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
  [[node fa_layer4/conv_2/Conv2D (defined at /home/hiwasawa/PointNet2/Open3D-PointNet2-Semantic3D-master/util/tf_util.py:186) = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](fa_layer4/conv_1/Relu, fa_layer4/conv_2/weights/read/_211)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

  [[{{node gradients/layer2/conv2/BiasAdd_grad/BiasAddGrad/_433}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4838_gradients/layer2/conv2/BiasAdd_grad/BiasAddGrad", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
```
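For what it's worth, the hint at the end of the log refers to TensorFlow's RunOptions protobuf. A minimal sketch of enabling it with the TF 1.x session API used here (the sess.run call to modify would presumably be the one in train_one_epoch):

```python
import tensorflow as tf

# TF 1.x: ask the runtime to report live tensor allocations when an OOM occurs.
run_options = tf.RunOptions(report_tensor_allocations_upon_oom=True)

# Pass the options to the failing call, e.g.:
# sess.run(ops, feed_dict=feed_dict, options=run_options)
```

This only makes the OOM report more informative; it does not prevent the error itself.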

windcatcher commented 4 years ago

Me too.

windcatcher commented 4 years ago

My device is a GTX 1050. I solved it by changing the batch size to 1; the parameter is in the semantic.json file.
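For anyone landing here later, here is a sketch of the edit windcatcher describes, assuming the keys in semantic.json are named batch_size and num_point (excerpt only; check your copy of the file for the exact key names and keep the other entries unchanged):

```json
{
    "batch_size": 1,
    "num_point": 8192
}
```

The batch size is the leading 16 in the OOM tensor shape [16,128,8192,1], so shrinking it shrinks that allocation proportionally; a value of 2 or 4 may also fit on these cards and will train faster than 1.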