Open shensq0814 opened 7 years ago
I think 2 GB is enough. I tried to limit memory usage with tf.ConfigProto and it ran (batch size = 8, memory consumption = 1833 MB).
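For reference, limiting GPU memory with tf.ConfigProto in TensorFlow 1.x can be done roughly like this (a sketch; the fraction value is an assumption, not what the commenter used):

```python
import tensorflow as tf  # TensorFlow 1.x API

# Cap GPU memory so the process does not grab the whole card up front.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True                    # allocate on demand
config.gpu_options.per_process_gpu_memory_fraction = 0.9  # hypothetical cap

with tf.Session(config=config) as sess:
    pass  # build and run the graph here
```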
Are you using cuDNN v5.1?
Yes, CUDA 8.0 with cuDNN 5.1. The available memory on my computer is about 1 3GB.
I noticed that you use all of the features in the VGG, which differs from the original paper. Could it be the reason why the model needs that much memory?
The available memory on my computer is about 1 3GB.
1.3 GB?
Could it be the reason why the model needs that much memory?
I think SRGAN needs a lot of memory because it builds the Generator (ResNet), the Discriminator, and VGG19.
As you said, it might help reduce memory usage. Modify inference_content_loss as follows:

def inference_content_loss(self, x, imitation):
    _, x_phi = self.vgg.build_model(
        x, tf.constant(False), False)
    _, imitation_phi = self.vgg.build_model(
        imitation, tf.constant(False), True)
    content_loss = tf.nn.l2_loss(x_phi[4] - imitation_phi[4])  # phi54
    return tf.reduce_mean(content_loss)
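For intuition, tf.nn.l2_loss(a - b) is half the sum of squared element-wise differences over the whole tensor; the same arithmetic can be mirrored in plain numpy (the function name and the phi_{5,4} feature-map shapes below are hypothetical, chosen only for illustration):

```python
import numpy as np

def content_loss(x_phi, imitation_phi):
    """Mirror of tf.nn.l2_loss(a - b): half the sum of squared differences."""
    diff = x_phi - imitation_phi
    return 0.5 * np.sum(diff ** 2)

# hypothetical phi_{5,4} feature maps: batch of 2, 6x6 spatial, 512 channels
rng = np.random.RandomState(0)
a = rng.randn(2, 6, 6, 512).astype(np.float32)
b = rng.randn(2, 6, 6, 512).astype(np.float32)
loss = content_loss(a, b)
```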
I've set up the required environment on another computer with enough memory. However, I get another error when the first epoch finishes.
Caused by op 'generator/deconv1/conv2d_transpose', defined at:
  File "train.py", line 95, in <module>
    train()
  File "train.py", line 18, in train
    model = SRGAN(x, is_training, batch_size)
  File "/home/min/ssq/srgan/src/srgan.py", line 14, in __init__
    self.imitation = self.generator(self.downscaled, is_training, False)
  File "/home/min/ssq/srgan/src/srgan.py", line 25, in generator
    x, [3, 3, 64, 3], [self.batch_size, 24, 24, 64], 1)
  File "../utils/layer.py", line 43, in deconv_layer
    strides=[1, stride, stride, 1])
  File "/home/min/anaconda/envs/shen/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 1104, in conv2d_transpose
    name=name)
  File "/home/min/anaconda/envs/shen/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 496, in conv2d_backprop_input
    data_format=data_format, name=name)
  File "/home/min/anaconda/envs/shen/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
    op_def=op_def)
  File "/home/min/anaconda/envs/shen/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/min/anaconda/envs/shen/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1228, in __init__
    self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): Conv2DSlowBackpropInput: input and out_backprop must have the same batch size
Fix line 45 of src/train.py. Correct:

    n_iter = int(len(x_train) / batch_size)

Incorrect:

    n_iter = int(np.ceil(len(x_train) / batch_size))

The floor version drops the last partial batch, whose size would not match the batch size hard-coded in the generator's deconv output shape.
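To see the difference concretely, consider a hypothetical dataset of 1000 images with batch_size = 16 (numbers chosen only for illustration): the ceil version schedules one extra iteration whose batch holds just 8 images, which triggers the batch-size mismatch above.

```python
import math

n_samples, batch_size = 1000, 16                 # hypothetical sizes

n_iter_floor = n_samples // batch_size           # drops the last partial batch
n_iter_ceil = math.ceil(n_samples / batch_size)  # keeps it

last_batch = n_samples - (n_iter_ceil - 1) * batch_size
print(n_iter_floor, n_iter_ceil, last_batch)     # 62 63 8
```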
The implementation of your generator seems different from the paper, where only the last two layers are deconvolution layers (they were changed to a sub-pixel CNN recently). You used deconv_layer in all of the residual blocks. Is that a mistake, or was it intentional?
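For reference, the sub-pixel (pixel-shuffle) upsampling mentioned above rearranges channels into spatial resolution after a conv layer has expanded the channel count by scale². A framework-independent numpy sketch of that depth-to-space step (the function name and the 24×24×256 shape are my own, chosen to echo the generator's feature-map size):

```python
import numpy as np

def pixel_shuffle(x, scale):
    """Rearrange an (H, W, C*scale^2) feature map into (H*scale, W*scale, C).

    This is the depth-to-space step of sub-pixel CNN upsampling:
    each pixel's channels are split into a scale x scale grid and
    interleaved into the spatial dimensions.
    """
    h, w, c = x.shape
    c_out = c // (scale * scale)
    x = x.reshape(h, w, scale, scale, c_out)  # split channels into a grid
    x = x.transpose(0, 2, 1, 3, 4)            # interleave grid with H and W
    return x.reshape(h * scale, w * scale, c_out)

features = np.arange(24 * 24 * 256, dtype=np.float32).reshape(24, 24, 256)
out = pixel_shuffle(features, 2)
print(out.shape)  # (48, 48, 64)
```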
Hi Tadax, yes, I have the same concern as @Doodleyard. Although the final generator network published in the CVPR paper differs from the arXiv version, your code matches neither of them. Would you mind giving us some hints? Thank you.
I tried to run train.py to train SRGAN, but the program terminates because there is not enough memory. My GPU is an 860M with 2 GB of memory.
How much memory exactly does the program need? Is there any way to reduce the memory required? I tried changing the batch size, but it had no effect.
Thank you.