thtrieu / darkflow

Translate darknet to tensorflow. Load trained weights, retrain/fine-tune using tensorflow, export constant graph def to mobile devices
GNU General Public License v3.0
6.13k stars 2.08k forks

Resource exhausted: OOM when allocating tensor with shape[16,608,608,32] #1110

Open ankitAMD opened 4 years ago

ankitAMD commented 4 years ago

I am using Ubuntu 16.04 on VMware. I am doing custom image detection for the class solar_images. I have already installed TensorFlow (the version required for this project), and I carefully installed OpenCV, Anaconda, and Jupyter. I run this project in its own environment. But when I execute the command below for training:

    python flow  --model cfg/yolo-1c.cfg --load bin/yolo.weights --train --annotation new_model_data/annotations --dataset new_model_data/images --epoch 400

I get this error:

/home/assetone-04/anaconda3/envs/abc/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/assetone-04/anaconda3/envs/abc/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/assetone-04/anaconda3/envs/abc/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/assetone-04/anaconda3/envs/abc/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/assetone-04/anaconda3/envs/abc/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/assetone-04/anaconda3/envs/abc/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
/home/assetone-04/anaconda3/envs/abc/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/assetone-04/anaconda3/envs/abc/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/assetone-04/anaconda3/envs/abc/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/assetone-04/anaconda3/envs/abc/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/assetone-04/anaconda3/envs/abc/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/assetone-04/anaconda3/envs/abc/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])

WARNING:tensorflow:From /home/assetone-04/Music/Darkflow-object-detection-master/darkflow/net/build.py:15: The name tf.train.RMSPropOptimizer is deprecated. Please use tf.compat.v1.train.RMSPropOptimizer instead.

WARNING:tensorflow:From /home/assetone-04/Music/Darkflow-object-detection-master/darkflow/net/build.py:16: The name tf.train.AdadeltaOptimizer is deprecated. Please use tf.compat.v1.train.AdadeltaOptimizer instead.

WARNING:tensorflow:From /home/assetone-04/Music/Darkflow-object-detection-master/darkflow/net/build.py:17: The name tf.train.AdagradOptimizer is deprecated. Please use tf.compat.v1.train.AdagradOptimizer instead.

WARNING:tensorflow:From /home/assetone-04/Music/Darkflow-object-detection-master/darkflow/net/build.py:18: The name tf.train.AdagradDAOptimizer is deprecated. Please use tf.compat.v1.train.AdagradDAOptimizer instead.

WARNING:tensorflow:From /home/assetone-04/Music/Darkflow-object-detection-master/darkflow/net/build.py:19: The name tf.train.MomentumOptimizer is deprecated. Please use tf.compat.v1.train.MomentumOptimizer instead.

Parsing ./cfg/yolo.cfg
Parsing cfg/yolo-1c.cfg
Loading bin/yolo.weights ...
Successfully identified 203934260 bytes
Finished in 0.03434300422668457s

Building net ...
WARNING:tensorflow:From /home/assetone-04/Music/Darkflow-object-detection-master/darkflow/net/build.py:105: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

Source | Train? | Layer description                | Output size
-------+--------+----------------------------------+---------------
WARNING:tensorflow:From /home/assetone-04/Music/Darkflow-object-detection-master/darkflow/net/ops/baseop.py:70: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.
WARNING:tensorflow:From /home/assetone-04/Music/Darkflow-object-detection-master/darkflow/net/ops/baseop.py:71: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.
WARNING:tensorflow:From /home/assetone-04/Music/Darkflow-object-detection-master/darkflow/net/ops/baseop.py:84: The name tf.placeholder_with_default is deprecated. Please use tf.compat.v1.placeholder_with_default instead.
       |        | input                            | (?, 608, 608, 3)
Load   | Yep!   | conv 3x3p1_1 +bnorm leaky        | (?, 608, 608, 32)
WARNING:tensorflow:From /home/assetone-04/Music/Darkflow-object-detection-master/darkflow/net/ops/simple.py:106: The name tf.nn.max_pool is deprecated. Please use tf.nn.max_pool2d instead.
Load   | Yep!   | maxp 2x2p0_2                     | (?, 304, 304, 32)
Load   | Yep!   | conv 3x3p1_1 +bnorm leaky        | (?, 304, 304, 64)
Load   | Yep!   | maxp 2x2p0_2                     | (?, 152, 152, 64)
Load   | Yep!   | conv 3x3p1_1 +bnorm leaky        | (?, 152, 152, 128)
Load   | Yep!   | conv 1x1p0_1 +bnorm leaky        | (?, 152, 152, 64)
Load   | Yep!   | conv 3x3p1_1 +bnorm leaky        | (?, 152, 152, 128)
Load   | Yep!   | maxp 2x2p0_2                     | (?, 76, 76, 128)
Load   | Yep!   | conv 3x3p1_1 +bnorm leaky        | (?, 76, 76, 256)
Load   | Yep!   | conv 1x1p0_1 +bnorm leaky        | (?, 76, 76, 128)
Load   | Yep!   | conv 3x3p1_1 +bnorm leaky        | (?, 76, 76, 256)
Load   | Yep!   | maxp 2x2p0_2                     | (?, 38, 38, 256)
Load   | Yep!   | conv 3x3p1_1 +bnorm leaky        | (?, 38, 38, 512)
Load   | Yep!   | conv 1x1p0_1 +bnorm leaky        | (?, 38, 38, 256)
Load   | Yep!   | conv 3x3p1_1 +bnorm leaky        | (?, 38, 38, 512)
Load   | Yep!   | conv 1x1p0_1 +bnorm leaky        | (?, 38, 38, 256)
Load   | Yep!   | conv 3x3p1_1 +bnorm leaky        | (?, 38, 38, 512)
Load   | Yep!   | maxp 2x2p0_2                     | (?, 19, 19, 512)
Load   | Yep!   | conv 3x3p1_1 +bnorm leaky        | (?, 19, 19, 1024)
Load   | Yep!   | conv 1x1p0_1 +bnorm leaky        | (?, 19, 19, 512)
Load   | Yep!   | conv 3x3p1_1 +bnorm leaky        | (?, 19, 19, 1024)
Load   | Yep!   | conv 1x1p0_1 +bnorm leaky        | (?, 19, 19, 512)
Load   | Yep!   | conv 3x3p1_1 +bnorm leaky        | (?, 19, 19, 1024)
Load   | Yep!   | conv 3x3p1_1 +bnorm leaky        | (?, 19, 19, 1024)
Load   | Yep!   | conv 3x3p1_1 +bnorm leaky        | (?, 19, 19, 1024)
Load   | Yep!   | concat [16]                      | (?, 38, 38, 512)
Load   | Yep!   | conv 1x1p0_1 +bnorm leaky        | (?, 38, 38, 64)
WARNING:tensorflow:From /home/assetone-04/Music/Darkflow-object-detection-master/darkflow/net/ops/convolution.py:28: calling extract_image_patches (from tensorflow.python.ops.array_ops) with ksizes is deprecated and will be removed in a future version.
Instructions for updating:
ksizes is deprecated, use sizes instead
Load   | Yep!   | local flatten 2x2                | (?, 19, 19, 256)
Load   | Yep!   | concat [27, 24]                  | (?, 19, 19, 1280)
Load   | Yep!   | conv 3x3p1_1 +bnorm leaky        | (?, 19, 19, 1024)
Init   | Yep!   | conv 1x1p0_1 linear              | (?, 19, 19, 30)
-------+--------+----------------------------------+---------------
Running entirely on CPU
cfg/yolo-1c.cfg loss hyper-parameters:
H = 19
W = 19
box = 5
classes = 1
scales = [1.0, 5.0, 1.0, 1.0]
WARNING:tensorflow:From /home/assetone-04/Music/Darkflow-object-detection-master/darkflow/net/yolov2/train.py:87: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Building cfg/yolo-1c.cfg loss
WARNING:tensorflow:From /home/assetone-04/Music/Darkflow-object-detection-master/darkflow/net/yolov2/train.py:107: The name tf.summary.scalar is deprecated. Please use tf.compat.v1.summary.scalar instead.

Building cfg/yolo-1c.cfg train op
WARNING:tensorflow:From /home/assetone-04/anaconda3/envs/abc/lib/python3.6/site-packages/tensorflow/python/ops/math_grad.py:1205: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
WARNING:tensorflow:From /home/assetone-04/anaconda3/envs/abc/lib/python3.6/site-packages/tensorflow/python/training/rmsprop.py:119: calling Ones.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
WARNING:tensorflow:From /home/assetone-04/Music/Darkflow-object-detection-master/darkflow/net/build.py:145: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2019-12-23 05:52:06.111189: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-12-23 05:52:06.330850: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2112000000 Hz
2019-12-23 05:52:06.331826: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x560e79bac2a0 executing computations on platform Host. Devices:
2019-12-23 05:52:06.331904: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): ,
2019-12-23 05:52:08.339421: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
Finished in 12.349433422088623s

Enter training ...

cfg/yolo-1c.cfg parsing new_model_data/annotations
Parsing for ['solar_panel']
[====================>]100% D1.xml
Statistics:
solar_panel: 57
Dataset size: 30
Dataset of 30 instance(s)
Training statistics:
Learning rate : 1e-05
Batch size : 16
Epoch number : 400
Backup every : 2000

2019-12-23 05:52:24.396157: W tensorflow/core/framework/allocator.cc:107] Allocation of 757071872 exceeds 10% of system memory.
2019-12-23 05:52:32.381390: W tensorflow/core/framework/allocator.cc:107] Allocation of 757071872 exceeds 10% of system memory.
2019-12-23 05:52:32.464676: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at constant_op.cc:172 : Resource exhausted: OOM when allocating tensor with shape[16,608,608,32] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
2019-12-23 05:52:32.691483: W tensorflow/core/framework/allocator.cc:107] Allocation of 757071872 exceeds 10% of system memory.
2019-12-23 05:52:32.692737: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at fused_batch_norm_op.cc:806 : Resource exhausted: OOM when allocating tensor with shape[16,608,608,32] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
Traceback (most recent call last):
  File "/home/assetone-04/anaconda3/envs/abc/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
    return fn(*args)
  File "/home/assetone-04/anaconda3/envs/abc/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/assetone-04/anaconda3/envs/abc/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[16,608,608,32] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
  [[{{node gradients/zeros_128}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "flow", line 6, in <module>
    cliHandler(sys.argv)
  File "/home/assetone-04/Music/Darkflow-object-detection-master/darkflow/cli.py", line 33, in cliHandler
    print('Enter training ...'); tfnet.train()
  File "/home/assetone-04/Music/Darkflow-object-detection-master/darkflow/net/flow.py", line 56, in train
    fetched = self.sess.run(fetches, feed_dict)
  File "/home/assetone-04/anaconda3/envs/abc/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/home/assetone-04/anaconda3/envs/abc/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/assetone-04/anaconda3/envs/abc/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
    run_metadata)
  File "/home/assetone-04/anaconda3/envs/abc/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[16,608,608,32] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
  [[node gradients/zeros_128 (defined at /home/assetone-04/Music/Darkflow-object-detection-master/darkflow/net/help.py:18) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Original stack trace for 'gradients/zeros_128':
  File "flow", line 6, in <module>
    cliHandler(sys.argv)
  File "/home/assetone-04/Music/Darkflow-object-detection-master/darkflow/cli.py", line 26, in cliHandler
    tfnet = TFNet(FLAGS)
  File "/home/assetone-04/Music/Darkflow-object-detection-master/darkflow/net/build.py", line 76, in __init__
    self.setup_meta_ops()
  File "/home/assetone-04/Music/Darkflow-object-detection-master/darkflow/net/build.py", line 139, in setup_meta_ops
    if self.FLAGS.train: self.build_train_op()
  File "/home/assetone-04/Music/Darkflow-object-detection-master/darkflow/net/help.py", line 18, in build_train_op
    gradients = optimizer.compute_gradients(self.framework.loss)
  File "/home/assetone-04/anaconda3/envs/abc/lib/python3.6/site-packages/tensorflow/python/training/optimizer.py", line 512, in compute_gradients
    colocate_gradients_with_ops=colocate_gradients_with_ops)
  File "/home/assetone-04/anaconda3/envs/abc/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 158, in gradients
    unconnected_gradients)
  File "/home/assetone-04/anaconda3/envs/abc/lib/python3.6/site-packages/tensorflow/python/ops/gradients_util.py", line 722, in _GradientsHelper
    out_grads[i] = control_flow_ops.ZerosLikeOutsideLoop(op, i)
  File "/home/assetone-04/anaconda3/envs/abc/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1357, in ZerosLikeOutsideLoop
    return array_ops.zeros(zeros_shape, dtype=val.dtype)
  File "/home/assetone-04/anaconda3/envs/abc/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 1883, in zeros
    output = fill(shape, constant(zero, dtype=dtype), name=name)
  File "/home/assetone-04/anaconda3/envs/abc/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3613, in fill
    "Fill", dims=dims, value=value, name=name)
  File "/home/assetone-04/anaconda3/envs/abc/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/home/assetone-04/anaconda3/envs/abc/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/assetone-04/anaconda3/envs/abc/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
    op_def=op_def)
  File "/home/assetone-04/anaconda3/envs/abc/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2005, in __init__
    self._traceback = tf_stack.extract_stack()

How can I solve this error? I have no GPU; I am working on VMware using only the CPU. Please help.
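
The allocation size in the log can be reproduced from the tensor shape in the error message. As a minimal sketch (the helper name `tensor_bytes` is made up for illustration and is not part of TensorFlow or darkflow), a float32 tensor of shape [16, 608, 608, 32] at 4 bytes per element works out as follows:

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Bytes needed for a dense tensor; float32 takes 4 bytes per element."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


# Shape taken from the OOM message: batch 16, 608 x 608 pixels, 32 channels.
failing = tensor_bytes([16, 608, 608, 32])
print(failing)          # 757071872, the figure in the allocator warnings
print(failing / 2**20)  # 722.0 (MiB)
```

So each tensor of that shape, including the gradient buffer `gradients/zeros_128` named in the traceback, takes about 722 MiB, which is why every such allocation is flagged as exceeding 10% of system memory.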

LuvRC commented 4 years ago

@thtrieu, @abagshaw please help. Thanks a lot in advance.

Rishabh-Maheshwari commented 4 years ago

Try reducing the batch size.

ankitAMD commented 4 years ago

> Try reducing the batch size.

The batch size is already 1, but the problem still occurs.

Rishabh-Maheshwari commented 4 years ago

Reduce your dimensions before annotating, because your RAM is limited.

ankitAMD commented 4 years ago

> Reduce your dimensions before annotating, because your RAM is limited.

Can you tell me which dimension I have to reduce?
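
The tensor that fails to allocate has the spatial size of the network input in the layer table of the original post (608 x 608), so one reading of "dimension" is the network input resolution rather than the dataset images; the commenter did not confirm this. In darknet-style cfg files the input size is normally set by `width` and `height` in the `[net]` section, which is an assumption here because the thread does not show the cfg. The sketch below uses a hypothetical helper, not a darkflow function, to compare the first conv activation at a few input sizes. It relies on the 32x downsampling visible in the table (608 to 19) and on 32 output channels in the first conv layer.

```python
# Hypothetical helper for illustration; not part of darkflow or TensorFlow.
DOWNSAMPLE = 32           # 608 -> 19 in the layer table of the original post
FIRST_CONV_CHANNELS = 32  # channels of the first conv layer in that table
BYTES_PER_FLOAT32 = 4


def first_activation_bytes(batch, side):
    """Bytes of the first conv output for a square input of the given side."""
    if side % DOWNSAMPLE != 0:
        raise ValueError("input side should be a multiple of 32")
    return batch * side * side * FIRST_CONV_CHANNELS * BYTES_PER_FLOAT32


for side in (608, 416, 320):
    grid = side // DOWNSAMPLE
    mib = first_activation_bytes(16, side) / 2**20
    print(f"{side}x{side}: output grid {grid}x{grid}, {mib:.0f} MiB at batch 16")
```

Memory for this tensor grows with the square of the input side, so at the same batch size a 416 x 416 input needs a little under half of what 608 x 608 needs.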

Janzeero-PhD commented 4 years ago

Hi there. Where do I change the batch size? The settings in 'model'.cfg, batch = 64 and subdivisions (or something like that) = 8, do not affect the batch size; it is still 16 in the training information. I have tried to find where to change it, but I am lost. Please help. There are a lot of dummies here, and we cannot find previous answers to such a seemingly obvious but frustrating question.
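
The `Batch size : 16` line in the training statistics of the original post suggests that darkflow takes its batch size from its own command-line options rather than from the cfg file. The sketch below assumes darkflow's `--batch` flag, which does not appear anywhere in this thread and should be checked against the tool's own option list. It rebuilds the training command from the original post with an explicit batch size; the function name is hypothetical.

```python
import shlex


def darkflow_train_command(batch, epoch=400):
    """Return the training command from the original post with a batch size."""
    args = [
        "python", "flow",
        "--model", "cfg/yolo-1c.cfg",
        "--load", "bin/yolo.weights",
        "--train",
        "--annotation", "new_model_data/annotations",
        "--dataset", "new_model_data/images",
        "--epoch", str(epoch),
        "--batch", str(batch),  # assumed flag; verify before relying on it
    ]
    # Quote each argument so the string can be pasted into a shell.
    return " ".join(shlex.quote(arg) for arg in args)


print(darkflow_train_command(batch=4))
```

If the flag is honoured, the `Batch size` line in the training statistics should change to match. With a batch of 1, the failing tensor would have shape [1, 608, 608, 32], one sixteenth of the size reported in the log.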

ankitAMD commented 4 years ago

@Rishabh-Maheshwari, by "dimension" do you mean the layers of the model? When I use tiny-yolo it works, but with yolo.cfg it does not.