Error from batch_norm - Githubissues

akaraspt commented 7 years ago

I got this error when I was trying to run your scripts.

Traceback (most recent call last):
  File "train.py", line 238, in <module>
    main()
  File "train.py", line 76, in main
    input_tensors, variables, loss, outputs, checks = gan.build_model()
  File "/home/akara/Workspace/text-to-image/model.py", line 44, in build_model
    disc_wrong_image, disc_wrong_image_logits   = self.discriminator(t_wrong_image, t_real_caption, reuse = True)
  File "/home/akara/Workspace/text-to-image/model.py", line 165, in discriminator
    h1 = ops.lrelu( self.d_bn1(ops.conv2d(h0, self.options['df_dim']*2, name = 'd_h1_conv'))) #16
  File "/home/akara/Workspace/text-to-image/Utils/ops.py", line 34, in __call__
    ema_apply_op = self.ema.apply([batch_mean, batch_var])
  File "/home/akara/miniconda2/envs/gan/lib/python2.7/site-packages/tensorflow/python/training/moving_averages.py", line 391, in apply
    self._averages[var], var, decay, zero_debias=zero_debias))
  File "/home/akara/miniconda2/envs/gan/lib/python2.7/site-packages/tensorflow/python/training/moving_averages.py", line 70, in assign_moving_average
    update_delta = _zero_debias(variable, value, decay)
  File "/home/akara/miniconda2/envs/gan/lib/python2.7/site-packages/tensorflow/python/training/moving_averages.py", line 177, in _zero_debias
    trainable=False)
  File "/home/akara/miniconda2/envs/gan/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 1024, in get_variable
    custom_getter=custom_getter)
  File "/home/akara/miniconda2/envs/gan/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 850, in get_variable
    custom_getter=custom_getter)
  File "/home/akara/miniconda2/envs/gan/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 346, in get_variable
    validate_shape=validate_shape)
  File "/home/akara/miniconda2/envs/gan/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 331, in _true_getter
    caching_device=caching_device, validate_shape=validate_shape)
  File "/home/akara/miniconda2/envs/gan/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 650, in _get_single_variable
    "VarScope?" % name)
ValueError: Variable d_bn1/d_bn1_2/d_bn1_2/moments/moments_1/mean/ExponentialMovingAverage/biased does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?

It happens when the script tries to create another discriminator.

disc_real_image, disc_real_image_logits   = self.discriminator(t_real_image, t_real_caption)
disc_wrong_image, disc_wrong_image_logits   = self.discriminator(t_wrong_image, t_real_caption, reuse = True) # Here
disc_fake_image, disc_fake_image_logits   = self.discriminator(fake_image, t_real_caption, reuse = True)

I printed all the variables, and it seems they are being created under different variable names even though reuse = True is set.
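To illustrate the failure described above, here is a toy, pure-Python model of variable reuse. It is only a sketch under stated assumptions: ToyVariableStore and build_discriminator are hypothetical names, not TensorFlow code, and the first moving-average name is made up for illustration. The second name is copied from the error message. As the traceback suggests, the moving-average helper asks for a variable whose name changes on every call, so a lookup under reuse = True cannot find it.

```python
class ToyVariableStore(object):
    """Toy stand-in for TF 1.x variable-scope bookkeeping (not TensorFlow code)."""

    def __init__(self):
        self._vars = {}

    def get_variable(self, name, reuse):
        if reuse:
            # Under reuse, a variable may only be looked up, never created.
            if name not in self._vars:
                raise ValueError(
                    "Variable %s does not exist, or was not created with "
                    "tf.get_variable()." % name)
            return self._vars[name]
        # Without reuse, a variable may only be created, never looked up.
        if name in self._vars:
            raise ValueError("Variable %s already exists." % name)
        self._vars[name] = object()
        return self._vars[name]


def build_discriminator(store, ema_name, reuse):
    # The conv weights keep the same name on every call ("/w" is assumed here).
    store.get_variable("d_h1_conv/w", reuse)
    # The moving-average variable gets a name that differs between calls.
    store.get_variable(ema_name, reuse)


store = ToyVariableStore()
# Hypothetical name for the first call (not taken from the thread).
FIRST_EMA = "d_bn1/moments/mean/ExponentialMovingAverage/biased"
# Name copied from the ValueError in the traceback.
SECOND_EMA = ("d_bn1/d_bn1_2/d_bn1_2/moments/moments_1/mean/"
              "ExponentialMovingAverage/biased")

build_discriminator(store, FIRST_EMA, reuse=False)  # first call creates variables
try:
    build_discriminator(store, SECOND_EMA, reuse=True)  # second call must reuse
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

In this toy model the conv weights are found on the second call because their name is stable. Only the moving-average variable fails, which matches the variable named in the error.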

ghost commented 7 years ago

same problem

zhuolinumd commented 7 years ago

Is there any solution for this issue? @csbkwang @akaraspt @paarthneekhara

paarthneekhara commented 7 years ago

What tensorflow version are you using? IIRC the code ran on version r0.10. I don't have access to a machine to debug the code right now.

zhuolinumd commented 7 years ago

I used tensorflow 1.0. Thanks Paarth @paarthneekhara

paarthneekhara commented 7 years ago

@jiang2764 So did it work?

zhuolinumd commented 7 years ago

I got the same error when I tried to run the training code. That's why I asked you and the others. Thanks. @paarthneekhara

paarthneekhara commented 7 years ago

Hi, this is a compatibility issue caused by the tf update. Replace the batch_norm class code in ops.py with the one written here https://github.com/iamaaditya/DCGAN-tensorflow/blob/master/ops.py . This should fix the issue.

Duke-Wyh commented 7 years ago

I replaced the batch_norm code with the one from that ops.py. However, another problem remains: Variable d_h0_conv/w/Adam/ does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope? How can I solve this? Thanks! @paarthneekhara

Duke-Wyh commented 7 years ago

I finally solved the problem! Two changes were needed. First, use the ops.py suggested above. Second, add with tf.variable_scope(tf.get_variable_scope()) to the code. Thanks, everyone!

OwalnutO commented 7 years ago

I also got stuck on this problem and solved it another way. My tensorflow version is '0.12.1'. I replaced the batch_norm class code in ops.py with the code from https://github.com/Hanock/generating_images_part_by_part/blob/master/code/lib/ops.py. I modified the init function (removed the parameter "batch_size") and it finally works.

zhhezhhe commented 7 years ago

@Duke-Wyh thanks, but where to put tf.variable_scope(tf.get_variable_scope())?

zhhezhhe commented 7 years ago

@jiang2764 Did you solve this problem? I have the same problem. I used tensorflow 1.0.1.

zhuolinumd commented 7 years ago

@zhhezhhe Please follow @paarthneekhara's suggestion: update the ops file, then modify the argument format for the function tf.nn.sigmoid_cross_entropy_with_logits. The training process should then work. Thanks @paarthneekhara! I am running the training process now. I had stopped working on this after I asked the question; now it is time to pick it up again.
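For readers unsure what "argument format" refers to: as far as I know, TensorFlow 1.0 expects tf.nn.sigmoid_cross_entropy_with_logits to be called with named arguments (labels=..., logits=...) rather than positionally. The sketch below is a pure-Python stand-in for scalar inputs, not the TensorFlow implementation. It mimics that calling convention and uses the numerically stable formula max(x, 0) - x * z + log(1 + exp(-|x|)), with x as the logit and z as the label. The error messages are my own wording.

```python
import math


def sigmoid_cross_entropy_with_logits(_sentinel=None, labels=None, logits=None):
    """Scalar stand-in for the TF op (illustration only, not TensorFlow code)."""
    # A positional argument lands in _sentinel, so positional calls are rejected.
    if _sentinel is not None:
        raise ValueError("Only call with named arguments (labels=..., logits=...)")
    if labels is None or logits is None:
        raise ValueError("Both labels and logits must be provided.")
    x, z = float(logits), float(labels)
    # Numerically stable form of -z*log(sigmoid(x)) - (1-z)*log(1-sigmoid(x)).
    return max(x, 0.0) - x * z + math.log1p(math.exp(-abs(x)))


# Named-argument call: the style the updated code needs.
loss_real = sigmoid_cross_entropy_with_logits(labels=1.0, logits=2.0)

# Old positional style: rejected by the stand-in.
try:
    sigmoid_cross_entropy_with_logits(2.0, 1.0)
    positional_call_ok = True
except ValueError:
    positional_call_ok = False

print(loss_real, positional_call_ok)
```

In the training script, the corresponding change would be to pass the discriminator logits and target tensors by name in each loss call.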

paarthneekhara commented 7 years ago

@OwalnutO , @jiang2764 if the method worked for you, can you please submit a pull request with the patch for the same?

zhhezhhe commented 7 years ago

This may help: https://github.com/YearnyeenHo/text-to-image

SpadesQ commented 6 years ago

where to put tf.variable_scope(tf.get_variable_scope())? @Duke-Wyh

Using https://github.com/YearnyeenHo/text-to-image, I still have this problem in tensorflow 1.3: Variable d_h0_conv/w/Adam/ does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope? How can I solve it? Thank you @zhhezhhe

314rated commented 6 years ago

Hi @SpadesQ , were you able to find a solution to this? I am facing the same issue.

After replacing the ops file, the problem with Adam appears during training. If I try to use the checkpoint, I get: NotFoundError (see above for traceback): Tensor name "d_bn1/moving_mean" not found in checkpoint files Data/Models/latest_model_flowers_temp.ckpt

ravindra82 commented 6 years ago

@paarthneekhara

When I try to generate images using the pre-trained model, I also get the following error.

NotFoundError (see above for traceback): Tensor name "d_bn1/moving_mean" not found in checkpoint files Data/Models/latest_model_flowers_temp.ckpt

I am using the code from https://github.com/YearnyeenHo/text-to-image and have downloaded the checkpoint file from the link given.

Please suggest a solution.

TheScott463 commented 4 years ago

@paarthneekhara Thanks for writing this code. I have the same problem as above. I'm running the latest release of each lib needed, but this one stumped me. Is there a good solution that makes this work? All the dialog above is a bit hodgepodge. I'd like to see your solution please.

paarthneekhara / text-to-image

Error from batch_norm #13