Closed ckyleda closed 3 years ago
If anyone would like to reproduce this easily, simply take the keras-contrib improved Wasserstein GAN available here:
https://github.com/keras-team/keras-contrib/blob/master/examples/improved_wgan.py
And make the discriminator a multi_gpu_model.
I have the same problem as you.
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [8,1,1,1] vs. [4,512,512,3]
[[{{node replica_0/model_3/random_weighted_average_1/mul}}]]
[[{{node replica_1/model_3/model_2_2/activation_26/Sigmoid}}]]
This is a gradient problem. You can try:
def replace_none_with_zero(l):
    return [K.zeros((1,)) if i is None else i for i in l]

gradients = replace_none_with_zero(K.gradients(y_pred, averaged_samples))
With that change I have successfully run my code. In the class RandomWeightedAverage(_Merge), we should change the line
alpha=K.random_uniform((batchsize,1,1,1))
to
alpha=K.random_uniform((int(batchsize/gpus_num),1,1,1))
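For anyone wondering why this fixes the "Incompatible shapes" error: it looks like a plain broadcasting failure. alpha was sized for the full batch (8), while each replica only sees its slice (4). Here is a minimal sketch in plain Python, no Keras involved. The helper names `per_replica_batch` and `broadcastable` are made up for illustration, and it assumes the batch is split evenly across GPUs.

```python
def per_replica_batch(batch_size, gpus_num):
    """Sub-batch size seen by each replica, assuming an even split."""
    if batch_size % gpus_num != 0:
        raise ValueError("batch_size must be divisible by gpus_num")
    return batch_size // gpus_num


def broadcastable(shape_a, shape_b):
    """NumPy-style rule: aligned from the right, dims must match or be 1."""
    for a, b in zip(reversed(shape_a), reversed(shape_b)):
        if a != b and a != 1 and b != 1:
            return False
    return True


batch_size, gpus_num = 8, 2
replica_input = (per_replica_batch(batch_size, gpus_num), 512, 512, 3)

# alpha sized for the full batch cannot multiply a replica's slice
print(broadcastable((batch_size, 1, 1, 1), replica_input))
# alpha sized per replica can
print(broadcastable((replica_input[0], 1, 1, 1), replica_input))
```

The shapes are the ones from the error message above; the second check passes only because alpha's leading dimension now matches the per-replica batch.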
Okay, I have the code executing, but unfortunately this does NOT resolve the problem as the gradients are always None, and hence always zero, which renders that section of the loss function useless.
So while the model is running, it isn't actually working. Back to square one!
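To make the "useless" part concrete: the WGAN-GP penalty for one sample is weight * (1 - ||gradient||)^2, so if the gradient is replaced by zeros the term collapses to the constant weight and gives the optimiser nothing to work with. A small sketch in plain Python; the function name and the weight of 10 are illustrative assumptions, not taken from the example script.

```python
import math


def gradient_penalty(gradient, weight=10.0):
    """Per-sample WGAN-GP penalty: weight * (1 - L2 norm of gradient)^2."""
    norm = math.sqrt(sum(g * g for g in gradient))
    return weight * (1.0 - norm) ** 2


# A zero-filled gradient always yields the same value (the weight itself),
# regardless of what the discriminator actually computes.
print(gradient_penalty([0.0, 0.0, 0.0]))

# A gradient with norm close to 1 is what the penalty is meant to reward.
print(gradient_penalty([0.6, 0.8, 0.0]))
```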
I think this error arises because once the model is replicated across GPUs, y_pred is the concatenated output of the replicas, and it is this tensor that is used to calculate the gradients. Keras does not seem to handle this case; I suspect it cannot work out the gradients of the output with respect to the averaged samples across n GPUs.
There needs to be a way to calculate the gradients BEFORE the concatenating step, for each replicated GPU model, calculate the penalty and then possibly average it and use this to penalise the parallelised model.
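Just to pin down the arithmetic I have in mind, here is a sketch in plain Python. It does not show how to get per-replica gradients out of Keras (that is the open question); the gradient values are made-up numbers and all function names are hypothetical.

```python
import math


def sample_penalty(gradient, weight=10.0):
    """Penalty for one sample: weight * (1 - L2 norm of its gradient)^2."""
    norm = math.sqrt(sum(g * g for g in gradient))
    return weight * (1.0 - norm) ** 2


def replica_penalty(gradients, weight=10.0):
    """Mean penalty over the samples handled by a single replica."""
    return sum(sample_penalty(g, weight) for g in gradients) / len(gradients)


def averaged_penalty(per_replica_gradients, weight=10.0):
    """Average of the per-replica penalties, computed before any concat."""
    penalties = [replica_penalty(g, weight) for g in per_replica_gradients]
    return sum(penalties) / len(penalties)


# Two replicas, two samples each (illustrative gradients only).
replica_0 = [[3.0, 4.0], [0.0, 1.0]]
replica_1 = [[0.0, 0.0], [1.0, 0.0]]
print(averaged_penalty([replica_0, replica_1]))
```

When every replica gets the same number of samples, this average equals the penalty computed over the whole batch, so nothing would be lost by computing it per replica.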
I don't even know if this is possible, and as no Keras dev has weighed in yet, it seems like Keras is only able to parallelise the very simplest models. I also find it very odd that this improved WGAN has been sitting in the examples for 2 years now and nobody has tried to parallelise it?
I solved this problem by replacing 'y_pred' in your gradient_penalty_loss function:
self.discriminator_model = Model(inputs=[real_samples,
                                         generator_input_for_discriminator],
                                 outputs=[discriminator_output_from_real_samples,
                                          discriminator_output_from_generator,
                                          averaged_samples_out],
                                 name='discriminator')
self.discriminator_model = multi_gpu_model(self.discriminator_model, gpus=gpu_num)
self.discriminator_model.compile(optimizer=Adam(0.0001, beta_1=0.5, beta_2=0.9),
                                 loss=[wasserstein_loss,
                                       wasserstein_loss,
                                       partial_gp_loss])
# replacing 'y_pred' in your gradient_penalty_loss function
gradients = K.gradients(self.discriminator_model.get_layer('discriminator').outputs[-1], averaged_samples)[0]
I solved this problem by replacing 'y_pred' in your gradient_penalty_loss function
self.discriminator_model = multi_gpu_model(self.discriminator_model, gpus=gpu_num)
self.discriminator_model.compile(optimizer=Adam(0.0001, beta_1=0.5, beta_2=0.9),
                                 loss=[wasserstein_loss, wasserstein_loss, partial_gp_loss],
                                 name='discriminator')
# replacing 'y_pred' in your gradient_penalty_loss function
gradients = K.gradients(self.discriminator_model.get_layer('discriminator').outputs[-1], averaged_samples)[0]
@Mofafa I'll give this a try. Would you provide a full code listing? For example, which layer has the name "discriminator"? (I assume it's the output, but it would be good to be sure.)
How does this work considering multi_gpu_model renames layers to ensure there's no clash between them? (I simply get "No such layer: discriminator")
Sorry, my mistake. I wasn't using the original code from your link and got this wrong; I've now corrected my comment. You need to name your discriminator_model when you build it with 'Model'. 'multi_gpu_model' then treats your discriminator_model as a single layer, which is why get_layer can look it up by that name. The outputs index depends on the position of partial_gp_loss in your loss list.
@Mofafa
Great, I can confirm that appears to work! The gradients exist and the GPUs are being utilised. Haven't checked the outputs yet.
I believe the issue should remain open though as this is still an issue for standard multi_gpu_model use.
Describe the current behavior
The improved Wasserstein GAN requires the gradients of a model to be computed with respect to some averaged training samples. These gradients are then used to train the discriminator. The model works on a single GPU, but when I wrap it in multi_gpu_model it fails to compile with the following error:
It looks like for whatever reason trying to call K.gradients on the model that multi_gpu_model creates returns a tensor with None values. Why is this happening?
The loss function in use is the common WGAN-GP keras loss and looks like this:
I printed out the values at compile time of the averaged_samples, y_pred, and K.gradients and get:
Tensor("random_weighted_average_1/add:0", shape=(12, 4, 4, 3), dtype=float32)
Tensor("model_2_3_1/concat:0", shape=(24, 1), dtype=float32, device=/device:CPU:0)
[None]
My batch size at input is 12. I expected this to be split into two models each processing 6 samples, but it looks like instead the multi_gpu_model is being given 12 samples each and concatenating them into 24 total?
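For reference, this is the behaviour I was expecting, written as a plain Python sketch: slice the batch into equal sub-batches, run a copy of the model on each, and concatenate the results so the output has the same length as the input. The function names are hypothetical and this is not the multi_gpu_model implementation, only the data-parallel idea its documentation describes.

```python
def split_batch(batch, parts):
    """Slice a batch into `parts` equal sub-batches (assumes even division)."""
    size = len(batch) // parts
    return [batch[i * size:(i + 1) * size] for i in range(parts)]


def data_parallel(model_fn, batch, parts):
    """Apply model_fn to each sub-batch and concatenate the outputs in order."""
    outputs = []
    for sub_batch in split_batch(batch, parts):
        outputs.extend(model_fn(sub_batch))
    return outputs


def toy_model(samples):
    """Stand-in for a replica: one scalar output per input sample."""
    return [2 * s for s in samples]


batch = list(range(12))
print([len(s) for s in split_batch(batch, 2)])   # sub-batch sizes
print(len(data_parallel(toy_model, batch, 2)))   # length after concatenation
```

Under this picture a batch of 12 should come out as 12 predictions, which is why the (24, 1) shape above surprised me.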
Any ideas on how to fix this? It seems multi_gpu_model doesn't handle models with multiple inputs/multiple outputs and custom loss functions very well.
Perhaps somebody has managed to get this sort of model working across multiple GPUs?