I ran
$ python run_exp_stage1.py --cfg stageI/cfg/birds.yml --gpu 1
and it printed:
.............................
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 117254912 totalling 111.82MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 10 Chunks of size 134217728 totalling 1.25GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 214433792 totalling 204.50MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 3 Chunks of size 268435456 totalling 768.00MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 276824064 totalling 264.00MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 318767104 totalling 304.00MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:700] Sum Total of in-use chunks: 5.20GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats:
Limit: 5632950272
InUse: 5579675904
MaxInUse: 5631752192
NumAllocs: 3795
MaxAllocSize: 1478306560
W tensorflow/core/common_runtime/bfc_allocator.cc:274] ****
W tensorflow/core/common_runtime/bfc_allocator.cc:275] Ran out of memory trying to allocate 32.00MiB. See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:993] Resource exhausted: OOM when allocating tensor with shape[64,32,32,128]
Traceback (most recent call last):
File "run_exp_stage2.py", line 71, in
algo.train()
File "/home/han/StackGAN/stageII/trainer.py", line 506, in train
log_vars, sess)
File "/home/han/StackGAN/stageII/trainer.py", line 447, in train_one_step
ret_list = sess.run(feed_out_d, feed_dict)
File "/home/han/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 767, in run
run_metadata_ptr)
File "/home/han/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 965, in _run
feed_dict_string, options, run_metadata)
File "/home/han/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1015, in _do_run
target_list, options, run_metadata)
File "/home/han/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1035, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[2048,1024,4,4]
[[Node: custom_conv2d_5_3/custom_conv2d/custom_conv2d_5/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 2, 2, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](apply_5_3/apply/Maximum, hr_d_net/custom_conv2d_5/custom_conv2d_5/w/read)]]
[[Node: Adam_2/update/_184 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_19689_Adam_2/update", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
Caused by op u'custom_conv2d_5_3/custom_conv2d/custom_conv2d_5/Conv2D', defined at:
File "run_exp_stage2.py", line 71, in
algo.train()
File "/home/han/StackGAN/stageII/trainer.py", line 463, in train
counter = self.build_model(sess)
File "/home/han/StackGAN/stageII/trainer.py", line 380, in build_model
self.init_opt()
File "/home/han/StackGAN/stageII/trainer.py", line 142, in init_opt
flag='hr')
File "/home/han/StackGAN/stageII/trainer.py", line 178, in compute_losses
self.model.hr_get_discriminator(images, embeddings)
File "/home/han/StackGAN/stageII/model.py", line 314, in hr_get_discriminator
x_code = self.hr_d_image_template.construct(input=x_var) # s16 s16 df_dim8
File "/home/han/anaconda2/lib/python2.7/site-packages/prettytensor/pretty_tensor_class.py", line 1248, in construct
return self._construct(context)
File "/home/han/anaconda2/lib/python2.7/site-packages/prettytensor/scopes.py", line 158, in call
return self._call_func(args, kwargs)
File "/home/han/anaconda2/lib/python2.7/site-packages/prettytensor/scopes.py", line 131, in _call_func
return self._func(args, *kwargs)
File "/home/han/anaconda2/lib/python2.7/site-packages/prettytensor/pretty_tensor_class.py", line 1924, in _with_method_complete
return input_layer._method_complete(func(args, **kwargs))
File "/home/han/StackGAN/misc/custom_ops.py", line 82, in call
conv = tf.nn.conv2d(input_layer.tensor, w, strides=[1, d_h, d_w, 1], padding=padding)
File "/home/han/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 396, in conv2d
data_format=data_format, name=name)
File "/home/han/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
op_def=op_def)
File "/home/han/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2327, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/han/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1226, in init
self._traceback = _extract_stack()
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[2048,1024,4,4]
[[Node: custom_conv2d_5_3/custom_conv2d/custom_conv2d_5/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 2, 2, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](apply_5_3/apply/Maximum, hr_d_net/custom_conv2d_5/custom_conv2d_5/w/read)]]
[[Node: Adam_2/update/_184 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_19689_Adam_2/update", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
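For reference, if I compute it correctly, the tensor in the final error alone needs 128 MiB, which matches the 134217728-byte chunks in the allocator log above (a rough sanity check I did, not anything from the code itself):

```python
# Size of the tensor TensorFlow failed to allocate, per the error:
# OOM when allocating tensor with shape[2048,1024,4,4], dtype float32
elements = 2048 * 1024 * 4 * 4       # number of float32 values
bytes_needed = elements * 4          # 4 bytes per float32
print(bytes_needed)                  # 134217728 bytes
print(bytes_needed / (1024 ** 2))    # 128.0 MiB
```

So the allocator was already near its 5.25 GiB limit (InUse 5579675904 of Limit 5632950272) when this request came in.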
I have never seen an error like this before. Please help me :(
This is running on Ubuntu 16.04, TensorFlow r1.0.1, CUDA 8.0, and cuDNN 5.1.