taki0112 / UGATIT

Official Tensorflow implementation of U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation (ICLR 2020)
MIT License

Can't train Google Colab #67

Closed DiMiTriFrog closed 4 years ago

DiMiTriFrog commented 5 years ago

I get this error:

Traceback (most recent call last):
  File "main.py", line 106, in <module>
    main()
  File "main.py", line 98, in main
    gan.train()
  File "/content/UGATIT/UGATIT.py", line 538, in train
    self.Generator_loss, self.G_loss], feed_dict = train_feed_dict)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1350, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
  (0) Resource exhausted: OOM when allocating tensor with shape[1048576,256] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[node gradients/generator_B/MLP/linear_0/dense/kernel/Regularizer/l2_regularizer/L2Loss_grad/mul (defined at /content/UGATIT/UGATIT.py:438) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[mul_1/_195]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

  (1) Resource exhausted: OOM when allocating tensor with shape[1048576,256] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[node gradients/generator_B/MLP/linear_0/dense/kernel/Regularizer/l2_regularizer/L2Loss_grad/mul (defined at /content/UGATIT/UGATIT.py:438) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations. 0 derived errors ignored.

Errors may have originated from an input operation. Input Source operations connected to node gradients/generator_B/MLP/linear_0/dense/kernel/Regularizer/l2_regularizer/L2Loss_grad/mul: generator_B/MLP/linear_0/dense/kernel/read (defined at /content/UGATIT/ops.py:100)

Input Source operations connected to node gradients/generator_B/MLP/linear_0/dense/kernel/Regularizer/l2_regularizer/L2Loss_grad/mul: generator_B/MLP/linear_0/dense/kernel/read (defined at /content/UGATIT/ops.py:100)

Original stack trace for 'gradients/generator_B/MLP/linear_0/dense/kernel/Regularizer/l2_regularizer/L2Loss_grad/mul':
  File "main.py", line 106, in <module>
    main()
  File "main.py", line 92, in main
    gan.build_model()
  File "/content/UGATIT/UGATIT.py", line 438, in build_model
    self.G_optim = tf.train.AdamOptimizer(self.lr, beta1=0.5, beta2=0.999).minimize(self.Generator_loss, var_list=G_vars)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/optimizer.py", line 403, in minimize
    grad_loss=grad_loss)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/optimizer.py", line 512, in compute_gradients
    colocate_gradients_with_ops=colocate_gradients_with_ops)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gradients_impl.py", line 158, in gradients
    unconnected_gradients)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gradients_util.py", line 731, in _GradientsHelper
    lambda: grad_fn(op, out_grads))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gradients_util.py", line 403, in _MaybeCompile
    return grad_fn()  # Exit early
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gradients_util.py", line 731, in <lambda>
    lambda: grad_fn(op, out_grads))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/nn_grad.py", line 1066, in _L2LossGrad
    return op.inputs[0] * grad
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py", line 884, in binary_op_wrapper
    return func(x, y, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py", line 1180, in _mul_dispatch
    return gen_math_ops.mul(x, y, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 6490, in mul
    "Mul", x=x, y=y, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
    op_def=op_def)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 2005, in __init__
    self._traceback = tf_stack.extract_stack()

...which was originally created as op 'generator_B/MLP/linear_0/dense/kernel/Regularizer/l2_regularizer/L2Loss', defined at:
  File "main.py", line 106, in <module>
    main()
[elided 0 identical lines from previous traceback]
  File "main.py", line 92, in main
    gan.build_model()
  File "/content/UGATIT/UGATIT.py", line 366, in build_model
    x_ab, cam_ab = self.generate_a2b(self.domain_A) # real a
  File "/content/UGATIT/UGATIT.py", line 279, in generate_a2b
    out, cam, _ = self.generator(x_A, reuse=reuse, scope="generator_B")
  File "/content/UGATIT/UGATIT.py", line 142, in generator
    gamma, beta = self.MLP(x, reuse=reuse)
  File "/content/UGATIT/UGATIT.py", line 171, in MLP
    x = fully_connected(x, channel, use_bias, scope='linear_' + str(i))
  File "/content/UGATIT/ops.py", line 100, in fully_connected
    x = tf.layers.dense(x, units=units, kernel_initializer=weight_init, kernel_regularizer=weight_regularizer, use_bias=use_bias)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/layers/core.py", line 188, in dense
    return layer.apply(inputs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 1479, in apply
    return self.__call__(inputs, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/layers/base.py", line 537, in __call__
    outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 591, in __call__
    self._maybe_build(inputs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 1881, in _maybe_build
    self.build(input_shapes)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/layers/core.py", line 1017, in build
    trainable=True)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/layers/base.py", line 455, in add_weight
    self._handle_weight_regularization(name, variable, regularizer)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 1619, in _handle_weight_regularization
    self.add_loss(functools.partial(_loss_for_variable, variable))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/layers/base.py", line 281, in add_loss
    loss_tensor = regularizer()
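
For a sense of scale, the tensor that fails to allocate in the trace above has shape [1048576,256] and type float, which works out to exactly 1 GiB at 4 bytes per element. The sketch below does that arithmetic in plain Python. The helper names and the count of four same-sized buffers (the weights, their gradient, and Adam's two moment slots) are assumptions for illustration, not measurements from this run.

```python
from math import prod

BYTES_PER_FLOAT32 = 4  # "type float" in the TensorFlow message is a 32-bit float


def tensor_bytes(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Size in bytes of a dense tensor with the given shape."""
    return prod(shape) * bytes_per_element


def gib(n_bytes):
    """Convert a byte count to GiB (2**30 bytes)."""
    return n_bytes / 2**30


# Shape copied from the OOM message for generator_B/MLP/linear_0/dense/kernel.
kernel_shape = (1048576, 256)
kernel_gib = gib(tensor_bytes(kernel_shape))

# Assumption: training this one layer with Adam needs about four buffers of
# this size: the weights, their gradient, and Adam's two moment estimates.
ASSUMED_BUFFERS_PER_VARIABLE = 4
layer_training_gib = kernel_gib * ASSUMED_BUFFERS_PER_VARIABLE

print(f"kernel: {kernel_gib:.2f} GiB")
print(f"rough training footprint for this layer: {layer_training_gib:.2f} GiB")
```

Under these assumptions a single dense layer accounts for roughly 4 GiB before any activations or the rest of the model are counted, which may explain why the allocation fails on a Colab GPU.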

khacduy221997 commented 3 years ago

Hi @taki0112, I have the same problem and my account is Pro.

2021-06-29 09:49:48.464195: W tensorflow/core/common_runtime/bfc_allocator.cc:424] ****
2021-06-29 09:49:48.464249: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at cwise_ops_common.cc:82 : Resource exhausted: OOM when allocating tensor with shape[1,128,128,128] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
  (0) Resource exhausted: OOM when allocating tensor with shape[4096,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[{{node discriminator_A_1/global/conv_3/truediv}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[mul_1/_1749]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

  (1) Resource exhausted: OOM when allocating tensor with shape[4096,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[{{node discriminator_A_1/global/conv_3/truediv}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations. 0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 122, in <module>
    main()
  File "main.py", line 105, in main
    gan.train()
  File "/content/drive/MyDrive/UGATIT.py", line 538, in train
    self.Generator_loss, self.G_loss], feed_dict = train_feed_dict)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
  (0) Resource exhausted: OOM when allocating tensor with shape[4096,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[node discriminator_A_1/global/conv_3/truediv (defined at /usr/local/lib/python3.7/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[mul_1/_1749]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

  (1) Resource exhausted: OOM when allocating tensor with shape[4096,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[node discriminator_A_1/global/conv_3/truediv (defined at /usr/local/lib/python3.7/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations. 0 derived errors ignored.

Original stack trace for 'discriminator_A_1/global/conv_3/truediv':
  File "main.py", line 122, in <module>
    main()
  File "main.py", line 99, in main
    gan.build_model()
  File "/content/drive/MyDrive/UGATIT.py", line 376, in build_model
    fake_A_logit, fake_A_cam_logit, fake_B_logit, fake_B_cam_logit = self.discriminate_fake(x_ba, x_ab)
  File "/content/drive/MyDrive/UGATIT.py", line 295, in discriminate_fake
    fake_A_logit, fake_A_cam_logit, _, _ = self.discriminator(x_ba, reuse=True, scope="discriminator_A")
  File "/content/drive/MyDrive/UGATIT.py", line 192, in discriminator
    global_x, global_cam, global_heatmap = self.discriminator_global(x_init, reuse=reuse, scope='global')
  File "/content/drive/MyDrive/UGATIT.py", line 206, in discriminator_global
    x = conv(x, channel * 2, kernel=4, stride=2, pad=1, pad_type='reflect', sn=self.sn, scope='conv_' + str(i))
  File "/content/drive/MyDrive/ops.py", line 39, in conv
    x = tf.nn.conv2d(input=x, filter=spectral_norm(w),
  File "/content/drive/MyDrive/ops.py", line 260, in spectral_norm
    w_norm = w / sigma
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/ops/math_ops.py", line 899, in binary_op_wrapper
    return func(x, y, name=name)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/ops/math_ops.py", line 1005, in _truediv_python3
    return gen_math_ops.real_div(x, y, name=name)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/ops/gen_math_ops.py", line 7954, in real_div
    "RealDiv", x=x, y=y, name=name)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()

Please help me resolve this problem.
Thank you.
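
The shapes reported in this second log are much smaller than the one in the original report. One quick way to compare them is to pull every `shape[...]` out of the log text and convert it to a size. The sketch below does that with a regular expression. The function name and the sample string (shortened from the messages above) are illustrative, and it assumes every reported tensor is a 32-bit float.

```python
import re
from math import prod

# Matches e.g. "shape[4096,512] and type float" in TensorFlow OOM messages.
OOM_SHAPE = re.compile(r"shape\[([\d,]+)\] and type (\w+)")


def oom_allocations(log_text):
    """Return a sorted list of (shape, MiB) for each distinct float tensor in the log."""
    found = set()
    for dims, dtype in OOM_SHAPE.findall(log_text):
        if dtype != "float":
            continue  # assumption: only 4-byte floats are handled in this sketch
        shape = tuple(int(d) for d in dims.split(","))
        found.add((shape, prod(shape) * 4 / 2**20))
    return sorted(found)


# Shortened excerpts of the messages quoted above.
sample = (
    "OOM when allocating tensor with shape[1,128,128,128] and type float on GPU:0 "
    "OOM when allocating tensor with shape[4096,512] and type float on GPU:0 "
    "OOM when allocating tensor with shape[4096,512] and type float on GPU:0"
)

for shape, mib in oom_allocations(sample):
    print(shape, f"{mib:.1f} MiB")
```

Both failing requests come out at only 8 MiB each, which suggests the GPU memory was already almost full when they were made, so the op named in the message is not necessarily the one holding most of the memory.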