shekkizh / WassersteinGAN.tensorflow

Tensorflow implementation of Wasserstein GAN - arxiv: https://arxiv.org/abs/1701.07875
MIT License

tf.get_variable() error, variable does not exist or was not created #3

Open SimpleXP opened 7 years ago

SimpleXP commented 7 years ago

My TensorFlow version is 0.12.1.

When I run run_main.py, I get this error:

"ValueError: Variable discriminator/disc_bn1/discriminator_1/disc_bn1/cond/discriminator_1/disc_bn1/moments/moments_1/mean/ExponentialMovingAverage/biased does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?"

Does anyone have any idea?

davidz-zzz commented 7 years ago

Maybe you could add with tf.variable_scope(tf.get_variable_scope(), reuse=False): before the ema.apply call. See:

https://github.com/carpedm20/DCGAN-tensorflow/issues/59
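
For context, the error points at utils.batch_norm, where ema.apply is called inside a mean_var_with_update() helper that is passed to tf.cond. A minimal sketch of where the suggested line would go, with the surrounding structure reconstructed from the error message above and the traceback further down in this thread, not copied from the repo:

    def mean_var_with_update():
        # create the EMA shadow variables in a non-reusing scope, even when
        # the discriminator is being rebuilt with scope_reuse=True
        with tf.variable_scope(tf.get_variable_scope(), reuse=False):
            ema_apply_op = ema.apply([batch_mean, batch_var])
        with tf.control_dependencies([ema_apply_op]):
            return tf.identity(batch_mean), tf.identity(batch_var)

    mean, var = tf.cond(train_phase,
                        mean_var_with_update,
                        lambda: (ema.average(batch_mean), ema.average(batch_var)))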

loliverhennigh commented 7 years ago

This worked for me! (TensorFlow 1.0 alpha)

chulaihunde commented 7 years ago

This did not work for me! (TensorFlow 1.0 nightly)

Traceback (most recent call last):
  File "main.py", line 55, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 44, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "main.py", line 44, in main
    FLAGS.optimizer_param)
  File "/home/long/MyCode2/WassersteinGAN.tensorflow-master/models/GAN_models.py", line 197, in create_network
    scope_reuse=True)
  File "/home/long/MyCode2/WassersteinGAN.tensorflow-master/models/GAN_models.py", line 118, in _discriminator
    h_bn = utils.batch_norm(h_conv, dims[index + 1], train_phase, scope="disc_bn%d" % index)
  File "/home/long/MyCode2/WassersteinGAN.tensorflow-master/utils.py", line 145, in batch_norm
    lambda: (ema.average(batch_mean), ema.average(batch_var)))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 1741, in cond
    orig_res, res_t = context_t.BuildCondBranch(fn1)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 1642, in BuildCondBranch
    r = fn()
  File "/home/long/MyCode2/WassersteinGAN.tensorflow-master/utils.py", line 139, in mean_var_with_update
    ema_apply_op = ema.apply([batch_mean, batch_var])
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/moving_averages.py", line 375, in apply
    colocate_with_primary=(var.op.type in ["Variable", "VariableV2"]))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/slot_creator.py", line 135, in create_zeros_slot
    colocate_with_primary=colocate_with_primary)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/slot_creator.py", line 112, in create_slot
    return _create_slot_var(primary, val, "")
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/slot_creator.py", line 64, in _create_slot_var
    validate_shape=val.get_shape().is_fully_defined())
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 1033, in get_variable
    use_resource=use_resource, custom_getter=custom_getter)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 932, in get_variable
    use_resource=use_resource, custom_getter=custom_getter)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 356, in get_variable
    validate_shape=validate_shape, use_resource=use_resource)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 341, in _true_getter
    use_resource=use_resource)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 671, in _get_single_variable
    "VarScope?" % name)
ValueError: Variable discriminator/disc_bn1/discriminator_1/disc_bn1/moments/moments_1/mean/ExponentialMovingAverage/ does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?

kunrenzhilu commented 7 years ago

It took me two days to figure out a workaround, and I still ended up failing. The reason seems to be that although the fake discriminator sets scope_reuse to True, tf.cond() creates a new control-flow context every time, so get_variable() cannot retrieve the corresponding variables from the real discriminator and throws a ValueError about a path like .../discriminator_1/disc_bn1/.... As far as I understand, there shouldn't be a nested scope .../discriminator_1 or a nested .../disc_bn1 at all; tell me if I am wrong. Anyway, I could not make this work by changing the original code. My workaround was to switch to tf.contrib.layers.batch_norm(), which does it in one statement.
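
For anyone wanting to try that route, a minimal sketch of the swap inside _discriminator in GAN_models.py (the original call is taken from the traceback above; the replacement's arguments are assumptions, not the repo's exact code):

    # before: the custom EMA-based helper that trips over variable reuse
    # h_bn = utils.batch_norm(h_conv, dims[index + 1], train_phase,
    #                         scope="disc_bn%d" % index)

    # after: the built-in layer manages its own moving averages; note that with
    # the default updates_collections the update ops land in
    # tf.GraphKeys.UPDATE_OPS, so updates_collections=None keeps them in-place
    h_bn = tf.contrib.layers.batch_norm(h_conv, is_training=train_phase,
                                        updates_collections=None,
                                        scope="disc_bn%d" % index)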

lengoanhcat commented 7 years ago

@kunrenzhilu: could you be more specific about how you used tf.contrib.layers.batch_norm()? I am struggling with the same problem stated above.

bottlecapper commented 7 years ago

I have the same problem. After adding with tf.variable_scope(tf.get_variable_scope(), reuse=False): before ema.apply, another problem appears at model.initialize_network(FLAGS.logs_dir):

Traceback (most recent call last):
  File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1021, in _do_call
    return fn(*args)
  File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1003, in _run_fn
    status, run_metadata)
  File "/home/jg/miniconda3/lib/python3.5/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 469, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'Placeholder' with dtype bool
     [[Node: Placeholder = Placeholder[dtype=DT_BOOL, shape=[], _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/media/jg/F/20170514/main.py", line 54, in <module>
    tf.app.run()
  File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 43, in run
    sys.exit(main(sys.argv[:1] + flags_passthrough))
  File "/media/jg/F/20170514/main.py", line 45, in main
    model.initialize_network(FLAGS.logs_dir)
  File "/media/jg/F/20170514/models/GAN_models.py", line 225, in initialize_network
    self.sess.run(tf.global_variables_initializer())
  File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 766, in run
    run_metadata_ptr)
  File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 964, in _run
    feed_dict_string, options, run_metadata)
  File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1014, in _do_run
    target_list, options, run_metadata)
  File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1034, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'Placeholder' with dtype bool
     [[Node: Placeholder = Placeholder[dtype=DT_BOOL, shape=[], _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

Caused by op 'Placeholder', defined at:
  File "/media/jg/F/20170514/main.py", line 54, in <module>
    tf.app.run()
  File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 43, in run
    sys.exit(main(sys.argv[:1] + flags_passthrough))
  File "/media/jg/F/20170514/main.py", line 43, in main
    FLAGS.optimizer_param)
  File "/media/jg/F/20170514/models/GAN_models.py", line 173, in create_network
    self._setup_placeholder()
  File "/media/jg/F/20170514/models/GAN_models.py", line 149, in _setup_placeholder
    self.train_phase = tf.placeholder(tf.bool)
  File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 1587, in placeholder
    name=name)
  File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/ops/gen_array_ops.py", line 2043, in _placeholder
    name=name)
  File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 759, in apply_op
    op_def=op_def)
  File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2240, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1128, in __init__
    self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'Placeholder' with dtype bool
     [[Node: Placeholder = Placeholder[dtype=DT_BOOL, shape=[], _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

Process finished with exit code 1
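
A hedged guess at what happens here: with the reuse=False wrapper, the EMA shadow variables are created inside the tf.cond branch, so their initializer ops are gated on the train_phase predicate, and tf.global_variables_initializer() then cannot run without that placeholder. One thing to try (not a verified fix) is to feed it when initializing, e.g. in initialize_network:

    # self.train_phase is the tf.placeholder(tf.bool) created in
    # _setup_placeholder (see the traceback above); feeding it lets the
    # cond-gated initializer ops run
    self.sess.run(tf.global_variables_initializer(),
                  feed_dict={self.train_phase: True})
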
AshishBora commented 7 years ago

Check this out. It seems to have fixed the problems for me. https://github.com/AshishBora/WassersteinGAN.tensorflow/commit/1c6cfa1c20959e9dcca01f0a96f7ca8c54403d1a

UPDATE: After training for 8+ hours with this change, the GAN does not seem to learn anything and the loss ranges (for d_loss and g_loss) are way off.

UPDATE 2: I trained with this commit and TF v1.1.0. It seems to have learned to produce faces.

kinsumliu commented 7 years ago

@AshishBora Hi, could you report the numbers you get for the generator and discriminator losses? I am training a WGAN on MNIST images and I see a generator loss of ~200 and a discriminator loss of ~0.003 in the first hour.

RyanHangZhou commented 7 years ago

@kunrenzhilu could you give a concrete solution to the problem? I can't solve it either.

ayrtondenner commented 6 years ago

@AshishBora your commits are giving a 404 for me. Could you show how you fixed it?

AshishBora commented 6 years ago

@ayrtondenner I changed line 115 here to something like:

h_bn = tf.contrib.layers.batch_norm(inputs=h_conv, decay=0.9, epsilon=1e-5, is_training=train_phase, scope="disc_bn%d" % index)

ayrtondenner commented 6 years ago

I should change line 326 too, right? They are both batch_norm calls inside the discriminator network.

AshishBora commented 6 years ago

Yup, that seems right.

ayrtondenner commented 6 years ago

I'm already running it; it seems like it's going to work now. Anyway, do you know why your commits are 404ing now?

AshishBora commented 6 years ago

Great. Oh, the 404 is because I deleted my fork some time ago since I wasn't using it anymore.

ayrtondenner commented 6 years ago

I see. I had to make some more minor changes for TensorFlow 1.0 compatibility and re-run it. Anyway, do you still have those commits? It would be nice to see whether you made any other code changes.

AshishBora commented 6 years ago

I have a local copy of the whole repo. I have uploaded a zip here.

ayrtondenner commented 6 years ago

I trained the network for 10 hours (11k epochs), and that's the result I got. It's still not a human face, but I wanted to know whether the training is going OK, because, as you said above, the network can run without necessarily learning anything. Also, I changed both utils.batch_norm calls in the discriminator network, but I just realized there are also calls in the generator network; maybe I can replace those as well to see if it works better.

(Image attachment: loss functions)

(Image attachment: network images)

shimafoolad commented 5 years ago

On TensorFlow 1.12.0, I had the same problem and fixed it by adding the line:

        with tf.variable_scope(tf.get_variable_scope(), reuse=tf.AUTO_REUSE):

before ema.apply
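
Putting the pieces of this thread together, a fuller sketch of what utils.batch_norm could look like with that change on TF >= 1.4 (the structure is reconstructed from the tracebacks above; names, decay, and epsilon are assumptions, and the real utils.py may differ):

    def batch_norm(x, n_out, train_phase, scope="bn"):
        with tf.variable_scope(scope):
            beta = tf.get_variable("beta", [n_out], initializer=tf.zeros_initializer())
            gamma = tf.get_variable("gamma", [n_out], initializer=tf.ones_initializer())
            # batch statistics over N, H, W for an NHWC conv output
            batch_mean, batch_var = tf.nn.moments(x, [0, 1, 2], name="moments")
            ema = tf.train.ExponentialMovingAverage(decay=0.9)

            def mean_var_with_update():
                # tf.AUTO_REUSE (TF >= 1.4) creates the EMA shadow variables on
                # the first call and reuses them on later calls
                with tf.variable_scope(tf.get_variable_scope(), reuse=tf.AUTO_REUSE):
                    ema_apply_op = ema.apply([batch_mean, batch_var])
                with tf.control_dependencies([ema_apply_op]):
                    return tf.identity(batch_mean), tf.identity(batch_var)

            mean, var = tf.cond(train_phase,
                                mean_var_with_update,
                                lambda: (ema.average(batch_mean), ema.average(batch_var)))
            return tf.nn.batch_normalization(x, mean, var, beta, gamma, 1e-5)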