taki0112 / BigGAN-Tensorflow

Simple Tensorflow implementation of "Large Scale GAN Training for High Fidelity Natural Image Synthesis" (BigGAN)
MIT License

channel question #9

Closed Junpink closed 5 years ago

Junpink commented 5 years ago

Dear author, thank you for your brilliant work. I'm wondering whether the program can work if I train it on single-channel 128x128 images (grayscale). I only changed self.c_dim to 1 (in BigGAN_128.py), but it raises some errors in the generator. I want to figure out whether the error comes from the image dimension or from something else.

taki0112 commented 5 years ago

Can you share the error message?

Junpink commented 5 years ago

Thank you for your response :)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 132, in <module>
    main()
  File "main.py", line 120, in main
    gan.train()
  File "/share/home/junpink/BigGAN/BigGAN_128.py", line 302, in train
    _, summary_str, d_loss = self.sess.run([self.d_optim, self.d_sum, self.d_loss])
  File "/opt/python-3.6.8-gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/opt/python-3.6.8-gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/opt/python-3.6.8-gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/opt/python-3.6.8-gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[256,4096,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[node generator/self_attention/Softmax (defined at /share/home/junpink/BigGAN/ops.py:246)  = Softmax[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](generator/self_attention/MatMul)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[{{node add_2/_1055}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_7534_add_2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Caused by op 'generator/self_attention/Softmax', defined at:
  File "main.py", line 132, in <module>
    main()
  File "main.py", line 113, in main
    gan.build_model()
  File "/share/home/junpink/BigGAN/BigGAN_128.py", line 231, in build_model
    fake_images = self.generator(self.z)
  File "/share/home/junpink/BigGAN/BigGAN_128.py", line 129, in generator
    x = self_attention_2(x, channels=ch, sn=self.sn, scope='self_attention')
  File "/share/home/junpink/BigGAN/ops.py", line 246, in self_attention_2
    beta = tf.nn.softmax(s)  # attention map
  File "/opt/python-3.6.8-gpu/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/opt/python-3.6.8-gpu/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 1722, in softmax
    return _softmax(logits, gen_nn_ops.softmax, axis, name)
  File "/opt/python-3.6.8-gpu/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 1673, in _softmax
    return compute_op(logits, name=name)
  File "/opt/python-3.6.8-gpu/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 7138, in softmax
    "Softmax", logits=logits, name=name)
  File "/opt/python-3.6.8-gpu/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/opt/python-3.6.8-gpu/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/opt/python-3.6.8-gpu/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/opt/python-3.6.8-gpu/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[256,4096,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[node generator/self_attention/Softmax (defined at /share/home/junpink/BigGAN/ops.py:246)  = Softmax[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](generator/self_attention/MatMul)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[{{node add_2/_1055}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_7534_add_2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
Junpink commented 5 years ago

I don't know whether the error is caused only by the ResourceExhaustedError or by some other problem, like a wrong dimension.

Junpink commented 5 years ago

@taki0112 Sorry for wasting your time. It turns out I made a silly mistake; it works fine now. Thank you very much.

Ahalang commented 5 years ago

@Junpink Can you tell me how you solved the error? Thank you.

Junpink commented 5 years ago

@Ahalang Actually, my error was caused by something else that had nothing to do with modifying the channel count or with OOM. In short, if you want to run this code on grayscale images, just change self.c_dim to 1. If you encounter an OOM problem, try decreasing the batch size.
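For anyone hitting the same OOM: a back-of-the-envelope sketch of why the softmax allocation fails and why a smaller batch helps. The shape [256, 4096, 1024] in the error message is batch 256, 64x64 = 4096 query positions, and 1024 pooled key positions in self_attention_2; the helper function below is my own illustration, not code from the repo, and real peak usage is a multiple of this since TF also holds activations and gradients.

```python
# Rough memory estimate for the float32 attention map named in the
# OOM error: shape [batch, 4096, 1024] at the self_attention softmax.
def attention_map_bytes(batch_size, hw=64 * 64, hw_pooled=1024, dtype_bytes=4):
    """Bytes for one attention map of shape [batch, hw, hw_pooled]."""
    return batch_size * hw * hw_pooled * dtype_bytes

GIB = 1024 ** 3
print(attention_map_bytes(256) / GIB)  # 4.0 GiB for the reported batch of 256
print(attention_map_bytes(32) / GIB)   # 0.5 GiB after cutting the batch to 32
```

So a single copy of that one tensor already needs 4 GiB at batch 256; since the cost scales linearly with batch size, dropping the batch is the quickest fix, as suggested above.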