keras-team / keras

Deep Learning for humans
http://keras.io/

How to deep control gradient back propagation with Keras #956

Closed jerryli1981 closed 6 years ago

jerryli1981 commented 8 years ago

Hi all, I would like to know how to write code that performs gradient backpropagation explicitly, like the Lua/Torch code below:

local sim_grad = self.criterion:backward(output, targets[j])
local rep_grad = self.MLP:backward(rep, sim_grad)


Keras's examples show me how to construct a Sequential model like the one below:

model = Sequential()
model.add(Dense(128, input_shape=(784,)))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(10))
model.add(Activation('softmax'))


However, that is not enough for me: I need to generate gradients for this model. How can I write code to control the backward pass of a Sequential model? Thanks.

EderSantana commented 8 years ago

Do you want to train the model, or do you need the gradients to do something else? If you want to train the model, just keep reading the docs and look at the fit method; it will calculate the gradients and train everything for you.

If you need the gradients for other things, you have to use Theano. Get the output of your model, define a cost function, and calculate the gradients with respect to each parameter. For example:

import theano.tensor as T

D = T.matrix()  # desired (target) values
Y = model.get_output()  # symbolic output of the model
Cost = ((D - Y) ** 2).mean()  # mean squared error
gradients = [T.grad(Cost, p) for p in model.get_params()]  # dCost/dp for each parameter
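
To actually evaluate those symbolic gradients, you would compile a Theano function that maps an input batch and the desired outputs to the gradient list. A minimal sketch, assuming the same Theano-era Keras API where model.get_input() returns the symbolic input (X_batch and D_batch below are hypothetical NumPy arrays):

import theano

# Compile: (input batch, desired outputs) -> list of gradient arrays.
# Assumes model.get_input() exists in this old API, like get_output() above.
compute_gradients = theano.function([model.get_input(), D], gradients,
                                    allow_input_downcast=True)
grad_values = compute_gradients(X_batch, D_batch)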
jerryli1981 commented 8 years ago

My model is a Recursive Neural Network (RNN) + MLP. Based on your suggestion, I have two choices. One is to focus on training the MLP and generate gradients to train the RNN. The other is to build a Sequential model containing the RNN + MLP and train them together. The second choice would look like the code below:

model.add(MyRNN)
model.add(Dense(128, input_shape=(784,)))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(10))
model.add(Activation('softmax'))

Is that possible?

EderSantana commented 8 years ago

Run and understand this example; it does something similar to what you are describing: https://github.com/fchollet/keras/blob/master/examples/lstm_text_generation.py
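
For reference, a rough sketch of the "RNN + MLP in one Sequential model" idea, in the spirit of that example (the LSTM stands in for MyRNN, and the input shape, layer sizes, and number of classes are hypothetical):

from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout

model = Sequential()
model.add(LSTM(128, input_shape=(20, 784)))  # recurrent part: 20 timesteps, 784 features each
model.add(Dense(128, activation='relu'))     # MLP part
model.add(Dropout(0.2))
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')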

NightFury13 commented 8 years ago

@EderSantana: I don't think that's what Jerry asked. Is there some way to compute the gradients of backpropagation w.r.t. each hidden layer (or the input layer)? An equivalent of this in Caffe, for example, would be something like:

net.blobs[last_layer].diff[0][target_class] = 1  # set the diff of the last layer to 1 (i.e., gradient w.r.t. the target class)
back_pass = net.backward()
jacobian = back_pass[desired_layer].copy()  # gives the gradient update for the desired layer

@jerryli1981 : Were you able to find a way to do this?
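
For what it's worth, a rough Keras-side counterpart of that Caffe snippet can be sketched with K.gradients (a sketch, not an official pattern: it assumes an already-built model, and target_class and x_batch are hypothetical):

from keras import backend as K

target_class = 1  # hypothetical class index
score = model.output[:, target_class]         # score of the target class
grads = K.gradients(score, model.input)[0]    # gradient of that score w.r.t. the input
# For a hidden layer, use model.layers[i].output in place of model.input.
compute_input_grads = K.function([model.input], [grads])
jacobian = compute_input_grads([x_batch])[0]  # x_batch: a NumPy array of inputs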

johnny5550822 commented 7 years ago

@jerryli1981 Were you able to find a way to calculate the gradient in a layer? (I am also originally a Torch7 user, and it is straightforward to do there. I am not sure how to do it in Keras...)

jemshit commented 7 years ago

I'm trying to do backpropagation with an MLP. Is there a way to do a backward pass in Keras (using TensorFlow)?

hamzamerzic commented 7 years ago

@jemshit TensorFlow allows that via the opt.apply_gradients method, as shown here: https://www.tensorflow.org/api_docs/python/tf/train/Optimizer or here: https://github.com/fchollet/keras/blob/master/keras/optimizers.py#L592 Is there a backend-agnostic way of doing this, though? @fchollet
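
A minimal TF 1.x sketch of that compute_gradients / apply_gradients route (the loss tensor and the Keras model are assumed to exist already):

import tensorflow as tf

opt = tf.train.GradientDescentOptimizer(learning_rate=0.01)
# Each (gradient, variable) pair can be inspected or modified before applying.
grads_and_vars = opt.compute_gradients(loss, var_list=model.trainable_weights)
train_step = opt.apply_gradients(grads_and_vars)
# train_step is then run in a session, feeding the model inputs and targets.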

ROZBEH commented 7 years ago

Were you able to resolve this issue? I have to backpropagate the error, but at each time step the derivative is different and I have to manipulate it. How is that possible in Keras/TensorFlow?

stale[bot] commented 7 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

mongoose54 commented 6 years ago

More or less the same question here: How can I backpropagate a specific error value in a Keras model? Thanks

ROZBEH commented 6 years ago

I couldn't figure this out. I ended up using PyTorch, which gives you this capability.

jnhelen commented 6 years ago

@jemshit Hi! Have you solved this problem?

jemshit commented 6 years ago

No need to do anything manually. The optimizer (SGD, Adadelta, ...) handles backpropagation for you.


eliethesaiyan commented 5 years ago

@jemshit, I think what @jerryli1981 meant is being able to apply a function to the gradient at each stage of the backward or forward pass. For example, what if you want to binarize (quantize) the gradient on each backward pass, which is widely used in quantized models?
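
A sketch of that kind of per-step gradient transform, building on the apply_gradients route above (TF 1.x; loss and model are assumed to exist, and tf.sign stands in for whatever quantization function is wanted):

import tensorflow as tf

opt = tf.train.GradientDescentOptimizer(learning_rate=0.01)
grads_and_vars = opt.compute_gradients(loss, var_list=model.trainable_weights)
# Binarize each gradient to its sign before it is applied to the weights.
quantized = [(tf.sign(g), v) for g, v in grads_and_vars if g is not None]
train_step = opt.apply_gradients(quantized)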

birdmw commented 5 years ago
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras import backend as k
from keras import losses
import numpy as np
import tensorflow as tf
from sklearn.metrics import mean_squared_error
from math import sqrt

model = Sequential()
model.add(Dense(12, input_dim=8, kernel_initializer='uniform', activation='relu'))
model.add(Dense(8, kernel_initializer='uniform', activation='relu'))
model.add(Dense(8, kernel_initializer='uniform', activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

inputs = np.random.random((1, 8))
outputs = model.predict(inputs)
targets = np.random.random((1, 8))
rmse = sqrt(mean_squared_error(targets, outputs))
loss = losses.mean_squared_error(targets, model.output)

#  ===== Symbolic Gradient =====
gradients = k.gradients(loss, model.trainable_weights)

print("===BEFORE WALKING DOWN GRADIENT===")
print("outputs:\n", outputs)
print("targets:\n", targets)

# Begin TensorFlow
sess = tf.InteractiveSession()
sess.run(tf.initialize_all_variables())

steps = 100  # steps of gradient descent
for s in range(steps):

    # ===== Numerical gradient =====
    evaluated_gradients = sess.run(gradients, feed_dict={model.input: inputs})

    # Step down the gradient for each layer
    for i in range(len(model.trainable_weights)):
        sess.run(tf.assign_sub(model.trainable_weights[i], evaluated_gradients[i]))

    # Every 10 steps print the RMSE
    if s % 10 == 0:
        outputs = model.predict(inputs)
        rmse = sqrt(mean_squared_error(targets, outputs))
        print("step " + str(s) + " rmse:", rmse)

final_outputs = model.predict(inputs)
final_rmse = sqrt(mean_squared_error(targets, final_outputs))

print("===AFTER STEPPING DOWN GRADIENT===")
print("outputs:\n", final_outputs)
print("targets:\n", targets)
print("final rmse:", final_rmse)

theceday commented 5 years ago

Is there any way to do this, perhaps with tf.keras?

maulberto3 commented 5 years ago

Hi @theceday, I also need to manually backprop gradients in Keras. Did you manage?

birdmw commented 5 years ago

Yes, finally, but it's super slow. What are you using it for? You can batch-train if you are doing reinforcement learning.


maulberto3 commented 5 years ago

@theceday I am in the process of doing so. I just calculated the gradients outside the computation graph (I can see them in my terminal). Now I need to update each weight accordingly; I guess I'm doing what an optimizer normally does for you. However, as you know, RL models differ a bit from Keras internals, which is why I am 'on foot' here. That's also why model.train_on_batch() does not fit my needs either.

birdmw commented 5 years ago

Did you see my example?


theceday commented 5 years ago

I am not sure everyone has the same use case, but I was trying to backprop a custom loss value (a NumPy array / input tensor). I have used K.switch/tf.cond with no luck. As far as I understand, TF doesn't backprop through those separate branches. For that to work, the loss would need to be "explicitly" defined as a loss function, so that the relevant operators could be used.

Maybe instead of using K.switch, returning a loss expression containing both tensors (actual and custom) might work, but I am not sure if Keras allows such an expression.
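
A minimal sketch of that "single loss expression" idea, blending a standard term with a custom term instead of switching between them (the mixing weight and the custom term here are purely illustrative):

from keras import backend as K

alpha = 0.5  # hypothetical mixing weight

def blended_loss(y_true, y_pred):
    standard = K.mean(K.square(y_true - y_pred), axis=-1)  # usual MSE term
    custom = K.mean(K.abs(y_true - y_pred), axis=-1)       # stand-in for the custom term
    return alpha * standard + (1.0 - alpha) * custom

# model.compile(optimizer='adam', loss=blended_loss)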

I might give this another try if I have time.

Edit: There is a listed change in the TensorFlow 2.0 alpha release notes: "Adding clear_losses API to be able to clear losses at the end of forward pass in a custom training loop in eager."

That hints there could be changes in TF and tf.keras that help with this issue, but I am not sure at the moment.
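
For anyone reaching this thread on TF 2.x / tf.keras, the eager route hinted at above looks roughly like this custom training step with tf.GradientTape (the model, data, and hyperparameters are illustrative):

import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation='relu', input_shape=(8,)),
    tf.keras.layers.Dense(8, activation='sigmoid'),
])
loss_fn = tf.keras.losses.MeanSquaredError()
optimizer = tf.keras.optimizers.SGD(0.01)

x_batch = np.random.random((4, 8)).astype('float32')
y_batch = np.random.random((4, 8)).astype('float32')

with tf.GradientTape() as tape:
    predictions = model(x_batch, training=True)
    loss = loss_fn(y_batch, predictions)

grads = tape.gradient(loss, model.trainable_variables)
# Gradients can be inspected or transformed here (clipped, quantized, replaced).
optimizer.apply_gradients(zip(grads, model.trainable_variables))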