keras-team / keras

Deep Learning for humans
http://keras.io/

Why lambda layer returns None for gradient operators? #13050

Closed: ehsanhaghighat closed this issue 3 years ago

ehsanhaghighat commented 5 years ago

I am trying to implement a special DNN architecture for physics-informed machine learning. As you may know, in this architecture partial differential equations are integrated into the loss function. The architecture of interest consists of:

  1. Input layers + hidden layers + output layer
  2. Gradients of the output of (1) with respect to the inputs (computed through a Lambda layer)
  3. The output of (2) as input + hidden layers + output layer

The resulting network, however, has None gradients with respect to the Lambda layer. Note that the issue comes from the Lambda layer: when I replace the gradient with anything else, such as Multiply() or Add(), the resulting network works properly.

import numpy as np
import keras as k
import tensorflow as tf

def custom_gradient(y, x):
    return tf.gradients(y, x, unconnected_gradients='zero')

x = k.layers.Input(shape=(1,), name='x')
y = k.layers.Input(shape=(1,), name='y')

lay = k.layers.Dense(50, name='lay1')(k.layers.concatenate([x,y]))
lay = k.layers.Activation('tanh', name='tanh1')(lay)

lay = k.layers.Dense(50, name='lay2')(lay)

Txy = k.layers.Dense(1, name='Txy')(lay)

# dT_dx = k.layers.Lambda(lambda F: k.layers.Multiply()([F, F])) #<- works fine!
dT_dx = k.layers.Lambda(lambda F: custom_gradient(F, x)[0], name='dTxy_dx')
dT_dx = dT_dx(Txy)

# dT_dy = k.layers.Lambda(lambda F: k.layers.Multiply()([F, F])) #<- works fine!
dT_dy = k.layers.Lambda(lambda F: custom_gradient(F, y)[0], name='dTxy_dy')
dT_dy = dT_dy(Txy)

lay = k.layers.Dense(50, name='lay3')(k.layers.concatenate([dT_dx, dT_dy]))
Uxy = k.layers.Dense(1, name='Uxy')(lay)
Vxy = k.layers.Dense(1, name='Vxy')(lay)

model = k.models.Model([x,y], [Uxy, Vxy])
model.compile(optimizer='adam', loss='mse')

k.utils.plot_model(model, show_shapes=True, to_file='test2.png')

for lay in model.layers:
    print(k.backend.gradients(model.total_loss, lay.output))

model.fit([np.ones((10,1)), np.ones((10,1))],
          [np.ones((10,1)), np.ones((10,1))])
ehsanhaghighat commented 5 years ago

A quick update: it turns out that if I change the input to the Lambda layer, there are no more None gradients, i.e. lambda x: custom_gradient(Txy, x)[0] instead of lambda F: custom_gradient(F, x)[0]. However, the issue is that the model then only treats the second part of the graph (after the Lambda layer) as trainable and does not train the full graph.

The model summary for the earlier case reports Trainable params: 3,003; if I define it this way, it reports only Trainable params: 252. If I plot the graph, it also completely drops the part before the Lambda layers. (Keras only keeps layers that are connected through layer calls, so the sub-network producing Txy, which is now referenced only inside the Lambda's Python closure, is excluded from the model and from training.)

So this clearly seems to be a bug, because I have built such a network purely in TensorFlow in the past. The new code, which runs fine but is incorrect, is below:

import numpy as np
import keras as k
import tensorflow as tf

def custom_gradient(y, x):
    return tf.gradients(y, x, unconnected_gradients='zero')

x = k.layers.Input(shape=(1,), name='x')
y = k.layers.Input(shape=(1,), name='y')

lay = k.layers.Dense(50, name='lay1')(k.layers.concatenate([x,y]))
lay = k.layers.Activation('tanh', name='tanh1')(lay)

lay = k.layers.Dense(50, name='lay2')(lay)

Txy = k.layers.Dense(1, name='Txy')(lay)

dT_dx = k.layers.Lambda(lambda x: custom_gradient(Txy, x)[0], name='dTxy_dx')
dT_dx = dT_dx(x)

dT_dy = k.layers.Lambda(lambda x: custom_gradient(Txy, x)[0], name='dTxy_dy')
dT_dy = dT_dy(y)

lay = k.layers.Dense(50, name='lay3')(k.layers.concatenate([dT_dx, dT_dy]))
Uxy = k.layers.Dense(1, name='Uxy')(lay)
Vxy = k.layers.Dense(1, name='Vxy')(lay)

model = k.models.Model([x,y], [Uxy, Vxy])
model.compile(optimizer='adam', loss='mse')

k.utils.plot_model(model, show_shapes=True, to_file='test2.png')

for lay in model.layers:
    print(k.backend.gradients(model.total_loss, lay.output))

model.summary()

model.fit([np.ones((10,1)), np.ones((10,1))],
          [np.ones((10,1)), np.ones((10,1))])

For more details, please see the link below: https://stackoverflow.com/q/56843302/11693382?stw=2

ehsanhaghighat commented 5 years ago

It turns out that if I add the keyword unconnected_gradients='zero' to keras.backend.gradients, everything works fine. But I am not sure whether this is a good hack!

Here is the change:

def gradients(loss, variables):
    """Returns the gradients of `loss` w.r.t. `variables`.

    # Arguments
        loss: Scalar tensor to minimize.
        variables: List of variables.

    # Returns
        A gradients tensor.
    """
    return tf.gradients(loss, variables, colocate_gradients_with_ops=True, unconnected_gradients='zero')
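
If editing the installed Keras source is not an option, the same change can presumably be applied at runtime by overriding the backend function before the model is compiled. A minimal sketch, assuming TF 1.x and the standalone keras package; whether every internal call path picks up such a patch is not verified here:

import tensorflow as tf
import keras.backend as K

def gradients_with_zeros(loss, variables):
    # Same as keras.backend.gradients, but unconnected gradients come back
    # as zero tensors instead of None.
    return tf.gradients(loss, variables,
                        colocate_gradients_with_ops=True,
                        unconnected_gradients='zero')

# Runtime patch (hypothetical): apply before building/compiling the model.
K.gradients = gradients_with_zeros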
bgstn commented 4 years ago

Hi,

I also ran into this problem when trying to take the second derivative of a constant w.r.t. the input; it can be fixed by adding unconnected_gradients='zero' in the Keras source code.

But I found another hack for anyone who doesn't want to change the source code.

from keras.layers import Lambda, Dense, Input, Reshape
from keras.models import Model
import keras.backend as K
import numpy as np

inp = Input((1,))

# l = Lambda(lambda x:  K.constant(3, shape=(1,)))(inp) #This is the error code
l = Lambda(lambda x:  K.exp(x*0) - 1 + K.constant(3, shape=(1,)))(inp) #The hack

derivative = Lambda(lambda x: K.gradients(x[0], x[1]), output_shape=[1])([l, inp])
derivative = Lambda(lambda x: K.gradients(x[0], x[1]), output_shape=[1])([derivative, inp])
m = Model(inp, [l, derivative])
m.compile('adam', loss='mse')

input_data = np.array([0])
print("normal output: ", m.predict(input_data))
print("weights: ", m.get_weights())

The exponential function is used so that derivatives of any order can still be taken (they remain connected to the input).
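
A minimal check of why the hack works (a sketch, assuming the same TF 1.x / old-Keras graph-mode setup as above): the bare constant is disconnected from inp, so K.gradients returns [None], while the exp(x*0) version keeps inp in the graph and its derivatives are real, zero-valued tensors.

import keras.backend as K
from keras.layers import Input

inp = Input((1,))

# Disconnected: the constant does not depend on `inp`, so the gradient is None.
const = K.constant(3, shape=(1,))
print(K.gradients(const, inp))      # [None]

# Connected: exp(inp*0) - 1 is numerically zero, but it keeps `inp` in the
# graph, so the gradient is a real (zero-valued) tensor.
connected = K.exp(inp * 0) - 1 + K.constant(3, shape=(1,))
print(K.gradients(connected, inp))  # [<tf.Tensor ...>]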

Wonder if there is any other fix.

sutummala commented 3 years ago

How did you solve this using tape.gradient?
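
With TF 2.x / tf.keras, such derivatives are usually taken with tf.GradientTape instead of K.gradients. A rough sketch, assuming TF 2.x eager semantics and reusing the layer names from the snippets above (an illustration only, not a confirmed resolution of this issue):

import numpy as np
import tensorflow as tf
from tensorflow import keras

class PinnModel(keras.Model):
    # Subclassed model: Txy and its derivatives w.r.t. x and y are computed
    # inside call() with tf.GradientTape, so the whole graph (including the
    # layers producing Txy) stays trainable end to end.
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.lay1 = keras.layers.Dense(50, activation='tanh', name='lay1')
        self.lay2 = keras.layers.Dense(50, name='lay2')
        self.txy = keras.layers.Dense(1, name='Txy')
        self.lay3 = keras.layers.Dense(50, name='lay3')
        self.uxy = keras.layers.Dense(1, name='Uxy')
        self.vxy = keras.layers.Dense(1, name='Vxy')

    def call(self, inputs):
        x, y = inputs
        with tf.GradientTape(persistent=True) as tape:
            tape.watch([x, y])
            h = self.lay1(tf.concat([x, y], axis=-1))
            h = self.lay2(h)
            txy = self.txy(h)
        dT_dx = tape.gradient(txy, x)   # dTxy/dx
        dT_dy = tape.gradient(txy, y)   # dTxy/dy
        del tape
        h = self.lay3(tf.concat([dT_dx, dT_dy], axis=-1))
        return self.uxy(h), self.vxy(h)

model = PinnModel()
model.compile(optimizer='adam', loss='mse')
model.fit([np.ones((10, 1), dtype='float32'), np.ones((10, 1), dtype='float32')],
          [np.ones((10, 1), dtype='float32'), np.ones((10, 1), dtype='float32')],
          epochs=1)

Taking the inner tape inside call keeps the derivative part of the graph connected to the trainable layers, which is exactly what the None gradients above were missing.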