Can you share the error message?
Thank you for your response :)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "main.py", line 132, in <module>
main()
File "main.py", line 120, in main
gan.train()
File "/share/home/junpink/BigGAN/BigGAN_128.py", line 302, in train
_, summary_str, d_loss = self.sess.run([self.d_optim, self.d_sum, self.d_loss])
File "/opt/python-3.6.8-gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/opt/python-3.6.8-gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "/opt/python-3.6.8-gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
run_metadata)
File "/opt/python-3.6.8-gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[256,4096,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node generator/self_attention/Softmax (defined at /share/home/junpink/BigGAN/ops.py:246) = Softmax[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](generator/self_attention/MatMul)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[{{node add_2/_1055}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_7534_add_2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
Caused by op 'generator/self_attention/Softmax', defined at:
File "main.py", line 132, in <module>
main()
File "main.py", line 113, in main
gan.build_model()
File "/share/home/junpink/BigGAN/BigGAN_128.py", line 231, in build_model
fake_images = self.generator(self.z)
File "/share/home/junpink/BigGAN/BigGAN_128.py", line 129, in generator
x = self_attention_2(x, channels=ch, sn=self.sn, scope='self_attention')
File "/share/home/junpink/BigGAN/ops.py", line 246, in self_attention_2
beta = tf.nn.softmax(s) # attention map
File "/opt/python-3.6.8-gpu/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/opt/python-3.6.8-gpu/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 1722, in softmax
return _softmax(logits, gen_nn_ops.softmax, axis, name)
File "/opt/python-3.6.8-gpu/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 1673, in _softmax
return compute_op(logits, name=name)
File "/opt/python-3.6.8-gpu/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 7138, in softmax
"Softmax", logits=logits, name=name)
File "/opt/python-3.6.8-gpu/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/opt/python-3.6.8-gpu/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/opt/python-3.6.8-gpu/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
op_def=op_def)
File "/opt/python-3.6.8-gpu/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
self._traceback = tf_stack.extract_stack()
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[256,4096,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node generator/self_attention/Softmax (defined at /share/home/junpink/BigGAN/ops.py:246) = Softmax[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](generator/self_attention/MatMul)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[{{node add_2/_1055}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_7534_add_2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
I don't know whether the error is caused only by the ResourceExhaustedError (OOM) or by something else, such as a wrong input dimension.
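Side note: the failing tensor is the self-attention map with shape [256, 4096, 1024]; in 32-bit floats that single tensor is already roughly 4 GB, so an OOM at that node is plausible on its own. To get the extra allocation report the hint mentions, the flag can be passed through tf.RunOptions. A minimal standalone sketch of the TF 1.x API (illustrative only, not the repo's actual training code):

    import tensorflow as tf  # TF 1.x, matching the traceback above

    # Ask TensorFlow to list live tensor allocations when an OOM occurs,
    # as suggested by the "report_tensor_allocations_upon_oom" hint.
    run_options = tf.RunOptions(report_tensor_allocations_upon_oom=True)

    with tf.Session() as sess:
        x = tf.random_normal([8, 4])
        y = tf.nn.softmax(x)
        # Pass the options to whichever sess.run call you want the report for,
        # e.g. the d_optim / d_loss call shown in the traceback.
        print(sess.run(y, options=run_options))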
@taki0112 Sorry for wasting your time. It turns out I made a silly mistake; it works fine now. Thank you very much.
@Junpink Can you tell me how you solved the error? Thank you
@Ahalang Actually, my error was caused by something else that has nothing to do with changing the channel count or with OOM. In short, if you want to run his code on grayscale images, just change self.c_dim to 1. If you run into an OOM problem, you can try decreasing the batch size.
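For anyone landing here later, a minimal sketch of the two changes mentioned above. The variable names below are illustrative; in the repo the channel count lives in BigGAN_128.py (self.c_dim) and the batch size is set when launching training (check main.py for the exact flag):

    import numpy as np

    c_dim = 1        # 1 channel for grayscale input instead of 3 (RGB)
    batch_size = 64  # something smaller than 256 if the attention map OOMs

    # A grayscale training batch then has shape [batch_size, 128, 128, c_dim]:
    batch = np.zeros([batch_size, 128, 128, c_dim], dtype=np.float32)
    print(batch.shape)  # (64, 128, 128, 1)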
Dear author, thank you for your brilliant work. I'm wondering whether the program would work if I train it on single-channel 128x128 images (grayscale). I only changed self.c_dim to 1 (in BigGAN_128.py), but it seems to raise some errors in the generator. I want to figure out whether the error is caused by the image dimension or by something else.