keras-team / keras

Deep Learning for humans
http://keras.io/

Multi_gpu_model fails with improved Wasserstein GAN #13122

Closed ckyleda closed 3 years ago

ckyleda commented 5 years ago

System information

Describe the current behavior

The improved Wasserstein GAN (WGAN-GP) requires the gradients of the model's output to be taken with respect to some averaged training samples; these gradients are then used as a penalty when training the discriminator. However, when trying to wrap this model with multi_gpu_model, it fails to compile with the following error, even though it works on a single GPU:

Traceback (most recent call last):
  File "progan.py", line 402, in <module>
    G, D, GAN, Discriminator = construct_models(lod, a)
  File "progan.py", line 360, in construct_models
    Discriminator.compile(optimizer=adam, loss=[wasserstein_loss_reals, wasserstein_loss, partial_gp_loss])
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/keras/engine/training.py", line 342, in compile
    sample_weight, mask)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/keras/engine/training_utils.py", line 404, in weighted
    score_array = fn(y_true, y_pred)
  File "progan.py", line 76, in gradient_penalty_loss
    gradients_sqr = K.square(gradients)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 1470, in square
    return tf.square(x)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 342, in square
    return gen_math_ops.square(x, name=name)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 8197, in square
    "Square", x=x, name=name)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 528, in _apply_op_helper
    (input_name, err))
ValueError: Tried to convert 'x' to a tensor and failed. Error: None values not supported.

It looks like, for whatever reason, calling K.gradients on the model that multi_gpu_model creates returns [None]. Why is this happening?

The loss function in use is the common WGAN-GP Keras loss and looks like this:

import numpy as np
from keras import backend as K

def gradient_penalty_loss(y_true, y_pred, averaged_samples,
                          gradient_penalty_weight):
    # Gradient of the discriminator output w.r.t. the averaged samples
    gradients = K.gradients(y_pred, averaged_samples)[0]
    gradients_sqr = K.square(gradients)
    # Sum over all axes except the batch axis
    gradients_sqr_sum = K.sum(gradients_sqr,
                              axis=np.arange(1, len(gradients_sqr.shape)))
    gradient_l2_norm = K.sqrt(gradients_sqr_sum)
    # Penalise deviation of the gradient norm from 1
    gradient_penalty = gradient_penalty_weight * K.square(1 - gradient_l2_norm)
    return K.mean(gradient_penalty)
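
(For context, in the keras-contrib example this loss is bound to the averaged-samples tensor before compiling, roughly as sketched below; averaged_samples and GRADIENT_PENALTY_WEIGHT follow that example and may differ slightly from my code.)

from functools import partial

# Bind the extra arguments so Keras sees a standard (y_true, y_pred) loss
partial_gp_loss = partial(gradient_penalty_loss,
                          averaged_samples=averaged_samples,
                          gradient_penalty_weight=GRADIENT_PENALTY_WEIGHT)
partial_gp_loss.__name__ = 'gradient_penalty'  # Keras requires losses to have a name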

I printed out averaged_samples, y_pred, and K.gradients(y_pred, averaged_samples) at compile time and got:

Tensor("random_weighted_average_1/add:0", shape=(12, 4, 4, 3), dtype=float32) Tensor("model_2_3_1/concat:0", shape=(24, 1), dtype=float32, device=/device:CPU:0) [None]

My input batch size is 12. I expected this to be split so that each of the two replicas processes 6 samples, but it looks like each replica is instead being given all 12 samples and the outputs are being concatenated into 24 in total?

Any ideas on how to fix this? It seems multi_gpu_model doesn't handle models with multiple inputs/multiple outputs and custom loss functions very well.

Perhaps somebody has managed to get this sort of model working across multiple GPUs?

ckyleda commented 5 years ago

If anyone would like to reproduce this easily, simply take the keras-contrib improved Wasserstein GAN available here:

https://github.com/keras-team/keras-contrib/blob/master/examples/improved_wgan.py

And make the discriminator a multi_gpu_model.
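
Roughly, the only change needed to reproduce is wrapping the discriminator model before compiling it (a sketch assuming 2 GPUs; variable names follow the linked example):

from keras.utils import multi_gpu_model

# After building discriminator_model with Model(...) as in the example:
discriminator_model = multi_gpu_model(discriminator_model, gpus=2)
discriminator_model.compile(optimizer=Adam(0.0001, beta_1=0.5, beta_2=0.9),
                            loss=[wasserstein_loss,
                                  wasserstein_loss,
                                  partial_gp_loss])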

lilili0402 commented 5 years ago

I have the same problem as you:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [8,1,1,1] vs. [4,512,512,3]
    [[{{node replica_0/model_3/random_weighted_average_1/mul}}]]
    [[{{node replica_1/model_3/model_2_2/activation_26/Sigmoid}}]]

lilili0402 commented 5 years ago

This is a gradient problem. You can try:

def replace_none_with_zero(l):
    # Replace any None gradients with a zero tensor so the graph can be built
    return [K.zeros((1,)) if i is None else i for i in l]

gradients = replace_none_with_zero(K.gradients(y_pred, averaged_samples))

lilili0402 commented 5 years ago

I have successfully run my code. In class RandomWeightedAverage(_Merge), we should modify alpha = K.random_uniform((BATCH_SIZE, 1, 1, 1)) to alpha = K.random_uniform((int(BATCH_SIZE / gpus_num), 1, 1, 1)), as sketched below.
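
(For reference, a sketch of that change applied to the example's RandomWeightedAverage layer; BATCH_SIZE and gpus_num are assumed to be defined elsewhere:)

from keras import backend as K
from keras.layers.merge import _Merge

class RandomWeightedAverage(_Merge):
    # Weighted average of real and generated samples. With multi_gpu_model each
    # replica only sees BATCH_SIZE / gpus_num samples, so alpha must match that.
    def _merge_function(self, inputs):
        alpha = K.random_uniform((int(BATCH_SIZE / gpus_num), 1, 1, 1))
        return (alpha * inputs[0]) + ((1 - alpha) * inputs[1])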

ckyleda commented 5 years ago

Okay, I have the code executing, but unfortunately this does NOT resolve the problem as the gradients are always None, and hence always zero, which renders that section of the loss function useless.

So while the model is running, it isn't actually working. Back to square one!

I think this error arises because once the model is replicated across GPUs, y_pred is concatenated at the end, and it is this concatenated tensor that is used to calculate the gradients. Keras is not smart enough to handle this behaviour; I suspect it's unable to figure out how to get the gradients of the output with respect to the averaged samples across n GPUs.

There needs to be a way to calculate the gradients BEFORE the concatenation step for each replicated GPU model, compute the penalty per replica, and then possibly average the penalties and use that to penalise the parallelised model.
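
(To illustrate the symptom: K.gradients returns [None] whenever there is no path in the graph from the first tensor back to the second, which is presumably what happens here; the averaged_samples tensor I hold a reference to is not on the path that produces the concatenated y_pred. A toy example in TF1-style graph mode, unrelated to the GAN code:)

from keras import backend as K

a = K.placeholder(shape=(None, 4))
b = K.placeholder(shape=(None, 1))  # b does not depend on a
print(K.gradients(b, [a]))          # [None], the same symptom as in the traceback above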

I don't even know if this is possible, and as no Keras dev has weighed in yet, it seems like Keras is only able to parallelise the very simplest models. I also find it very odd that this improved WGAN has been sitting in the examples for 2 years now and nobody has tried to parallelise it?

Mofafa commented 5 years ago

I solved this problem by replacing 'y_pred' in your gradient_penalty_loss function

self.discriminator_model = Model(inputs=[real_samples,
                                         generator_input_for_discriminator],
                                 outputs=[discriminator_output_from_real_samples,
                                          discriminator_output_from_generator,
                                          averaged_samples_out],
                                 name='discriminator')
self.discriminator_model = multi_gpu_model(self.discriminator_model, gpus=gpu_num)
self.discriminator_model.compile(optimizer=Adam(0.0001, beta_1=0.5, beta_2=0.9),
                                 loss=[wasserstein_loss,
                                       wasserstein_loss,
                                       partial_gp_loss])

# Replacing 'y_pred' in your gradient_penalty_loss function:
gradients = K.gradients(self.discriminator_model.get_layer('discriminator').outputs[-1],
                        averaged_samples)[0]
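
(A sketch of how the loss could then look, with the inner model's output passed in, for instance via functools.partial; the inner_output argument name is hypothetical:)

def gradient_penalty_loss(y_true, y_pred, averaged_samples,
                          gradient_penalty_weight, inner_output):
    # Differentiate the inner (pre-concatenation) output instead of the
    # concatenated multi-GPU y_pred, which has no path back to averaged_samples.
    gradients = K.gradients(inner_output, averaged_samples)[0]
    gradients_sqr = K.square(gradients)
    gradients_sqr_sum = K.sum(gradients_sqr,
                              axis=np.arange(1, len(gradients_sqr.shape)))
    gradient_l2_norm = K.sqrt(gradients_sqr_sum)
    gradient_penalty = gradient_penalty_weight * K.square(1 - gradient_l2_norm)
    return K.mean(gradient_penalty)
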
ckyleda commented 5 years ago

I solved this problem by replacing 'y_pred' in your gradient_penalty_loss function

self.discriminator_model = multi_gpu_model(self.discriminator_model, gpus=gpu_num)
self.discriminator_model.compile(optimizer=Adam(0.0001, beta_1=0.5, beta_2=0.9),
                                loss=[wasserstein_loss,
                                    wasserstein_loss,
                                    partial_gp_loss],
                                name='discriminator')

#replacing 'y_pred' in your gradient_penalty_loss function
gradients = K.gradients(self.discriminator_model.get_layer('discriminator').outputs[-1], averaged_samples)[0]

@Mofafa I'll give this a try. Would you provide a full code listing? For example, which layer has the name "discriminator"? (I assume it's the output, but it would be good to be sure.)

How does this work considering multi_gpu_model renames layers to ensure there's no clash between them? (I simply get "No such layer: discriminator")

Mofafa commented 5 years ago

Sorry for my mistake. I didn't use the original code from your link and made an error; I've now corrected the code above. You need to name your discriminator_model when you build it with Model. multi_gpu_model will then wrap your discriminator_model as a single layer, and the outputs index depends on which position partial_gp_loss has in your loss list.
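
(A toy sketch of what that means, assuming 2 GPUs are available:)

from keras.layers import Input, Dense
from keras.models import Model
from keras.utils import multi_gpu_model

inp = Input(shape=(8,))
out = Dense(1)(inp)
inner = Model(inp, out, name='discriminator')       # name the model when building it
parallel = multi_gpu_model(inner, gpus=2)

inner_layer = parallel.get_layer('discriminator')   # the whole inner model is one layer
averaged_out = inner_layer.outputs[-1]               # pick the output matching partial_gp_loss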

ckyleda commented 5 years ago

@Mofafa

Great, I can confirm that appears to work! The gradients exist and the GPUs are being utilised. Haven't checked the outputs yet.

I believe the issue should remain open, though, as this is still a problem for standard multi_gpu_model use.