keras-team / keras

backend argmax has none for gradients. Can you even define one? #11157

Closed: lcukerd closed this issue 6 years ago

lcukerd commented 6 years ago

I am using keras.backend.argmax() in a Lambda layer. The model compiles fine but throws an error during fit():

ValueError: An operation has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.

My model:

latent_dim = 512
encoder_inputs = Input(shape=(train_data.shape[1],))
encoder_dense = Dense(vocabulary, activation='softmax')
encoder_outputs = Embedding(vocabulary, latent_dim)(encoder_inputs)
encoder_outputs = LSTM(latent_dim, return_sequences=True)(encoder_outputs)
encoder_outputs = Dropout(0.5)(encoder_outputs)
encoder_outputs = encoder_dense(encoder_outputs)
encoder_outputs = Lambda(K.argmax, arguments={'axis':-1})(encoder_outputs)
encoder_outputs = Lambda(K.cast, arguments={'dtype':'float32'})(encoder_outputs)

encoder_dense1 = Dense(train_label.shape[1], activation='softmax')
decoder_embedding = Embedding(vocabulary, latent_dim)
decoder_lstm1 = LSTM(latent_dim, return_sequences=True)
decoder_lstm2 = LSTM(latent_dim, return_sequences=True)
decoder_dense2 = Dense(vocabulary, activation='softmax')

decoder_outputs = encoder_dense1(encoder_outputs)
decoder_outputs = decoder_embedding(decoder_outputs)
decoder_outputs = decoder_lstm1(decoder_outputs)
decoder_outputs = decoder_lstm2(decoder_outputs)
decoder_outputs = Dropout(0.5)(decoder_outputs)
decoder_outputs = decoder_dense2(decoder_outputs)
model = Model(encoder_inputs, decoder_outputs)
model.summary()

Model summary for easy visualization:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_7 (InputLayer)         (None, 32)                0         
_________________________________________________________________
embedding_13 (Embedding)     (None, 32, 512)           2018816   
_________________________________________________________________
lstm_19 (LSTM)               (None, 32, 512)           2099200   
_________________________________________________________________
dropout_10 (Dropout)         (None, 32, 512)           0         
_________________________________________________________________
dense_19 (Dense)             (None, 32, 3943)          2022759   
_________________________________________________________________
lambda_5 (Lambda)            (None, 32)                0         
_________________________________________________________________
lambda_6 (Lambda)            (None, 32)                0         
_________________________________________________________________
dense_20 (Dense)             (None, 501)               16533     
_________________________________________________________________
embedding_14 (Embedding)     (None, 501, 512)          2018816   
_________________________________________________________________
lstm_20 (LSTM)               (None, 501, 512)          2099200   
_________________________________________________________________
lstm_21 (LSTM)               (None, 501, 512)          2099200   
_________________________________________________________________
dropout_11 (Dropout)         (None, 501, 512)          0         
_________________________________________________________________
dense_21 (Dense)             (None, 501, 3943)         2022759   
=================================================================
Total params: 14,397,283
Trainable params: 14,397,283
Non-trainable params: 0
_________________________________________________________________

I googled for a solution, but almost all the results were about faulty models. Some recommended not using the functions that cause this issue. However, as you can see, I cannot create this model without K.argmax (if you know of another way, do tell me).

Also, how can you even define the gradient of argmax? I am guessing it's an issue in Keras; if not, please tell me how to define its gradient.

gabrieldemarmiesse commented 6 years ago

Hello! The argmax function has no gradient. Or at least, its gradient is zero almost everywhere. This is not specific to Keras; it's the same in all deep learning frameworks, because it follows from the mathematical definition of argmax.

If you wish to create your own operation, with a custom gradient, you need to access the backend directly and create a new op. But most of the time, it's not a walk in the park. See https://www.tensorflow.org/extend/adding_an_op
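
For illustration, here is a minimal sketch of such a custom op using TensorFlow's tf.custom_gradient decorator (available since TF 1.7). The "straight-through" style surrogate gradient below is only a heuristic of our choosing, not argmax's true gradient:

import tensorflow as tf

@tf.custom_gradient
def argmax_st(x):
    # Forward pass: hard argmax along the last axis, cast to float32
    # so downstream Keras layers accept it.
    y = tf.cast(tf.argmax(x, axis=-1), tf.float32)

    def grad(dy):
        # Surrogate backward pass: route the incoming gradient through
        # softmax(x) instead of the true gradient of argmax, which is
        # zero almost everywhere.
        return tf.expand_dims(dy, -1) * tf.nn.softmax(x)

    return y, grad

You could then wrap argmax_st in a Lambda layer in place of K.argmax, though whether the surrogate gradient actually helps training is model-dependent.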

lcukerd commented 6 years ago

Yes, I know argmax has no gradient. But the error is clearly asking me to define one for argmax. How do I fix this error, then?

gabrieldemarmiesse commented 6 years ago

The error message is maybe not clear. It's saying that you should only use backend functions which have a gradient, i.e. something other than argmax. It is not saying that you should define argmax's gradient. Maybe the message is not explicit enough.

lcukerd commented 6 years ago

Okay. So is there any alternative to argmax (since my model cannot work without one) that I can use?

Btw, why does the backend have an argmax function if we can't use it in a model?

gabrieldemarmiesse commented 6 years ago

I don't know of any alternative to argmax; I've never worked with a model requiring one.

Argmax is there to perform operations for which the gradient is not needed, for example when computing a metric.
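
For instance, a minimal sketch of a custom metric built on K.argmax (the name sparse_top1_accuracy is hypothetical; metrics are only evaluated, never differentiated, so the missing gradient is harmless here):

from keras import backend as K

def sparse_top1_accuracy(y_true, y_pred):
    # y_true holds integer class ids, y_pred holds per-class probabilities.
    pred_ids = K.cast(K.argmax(y_pred, axis=-1), 'float32')
    matches = K.equal(K.flatten(y_true), K.flatten(pred_ids))
    return K.mean(K.cast(matches, 'float32'))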

I suppose you can try to use the argmax from TensorFlow directly and see if you get the error. But you must know what you are doing, because if there is no error, it is implied that the gradient is null (like tf.round).
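
For example, a quick sketch of that idea applied to the model above:

import tensorflow as tf
from keras.layers import Lambda

# A direct tf.argmax inside a Lambda layer; note that the gradient
# through argmax is still null either way.
encoder_outputs = Lambda(lambda x: tf.cast(tf.argmax(x, axis=-1), tf.float32))(encoder_outputs)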

lcukerd commented 6 years ago

Okay, thanks for helping me out. I will give TensorFlow a go.

I will leave this issue open for a day and wait for someone who knows an alternative to argmax. I hope no one has a problem with that (otherwise they can close it).

MansiAgarwal11 commented 6 years ago

Did you find any solution to this? @lcukerd

lcukerd commented 6 years ago

@MansiAgarwal11 Yes, I did. You will have to use Keras inside a TensorFlow model. For training, you will have to define a loss function as in this article. In the model shown in the article, it will still work even if you include argmax. You should be able to do this using only Keras, but I haven't tried it yet.

MansiAgarwal11 commented 6 years ago

But if there is no gradient for the argmax function, how does the model backpropagate?

lcukerd commented 6 years ago

I am not sure myself, but I think the TensorFlow code was written to bypass it in a clever way. Perhaps someone from the TensorFlow team can clear this up? Btw, did your model converge?

MansiAgarwal11 commented 6 years ago

I didn't make use of argmax and came up with a different loss function for my problem.

mycal-tucker commented 6 years ago

FYI, in my experience with a different TensorFlow function that didn't have a gradient, I found that I could run and train the model without any errors, but because there was no gradient, no actual learning took place. It's something to look out for if you try to use argmax.
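
One quick way to check for this (a sketch, assuming a compiled Keras model on the TensorFlow backend; model.total_loss is only available after compile()):

from keras import backend as K

# Any None entry means an op between the loss and that weight blocks backprop.
grads = K.gradients(model.total_loss, model.trainable_weights)
print(grads)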

sunwei317 commented 5 years ago

I have the same problem. Training and evaluation work without any problem, and saving the model to H5 is OK. However, when loading the saved model, this error message pops up: ValueError: An operation has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval. Do you have any idea how to fix this issue? Otherwise, the model cannot be used for prediction. Thank you.

mycal-tucker commented 5 years ago

^ You're saying that you can train a model successfully with argmax? That surprises me. What I was trying to say in my earlier comment is that you can sometimes run the training with ops that don't have a gradient and no errors will be thrown, but your model won't actually get better.

How confident are you that the model you're training is actually getting better as you train it?

sunwei317 commented 5 years ago

I monitored the precision, recall, and accuracy while training, and the model was getting better. If the model is saved with model.save, then the error above appears with load_model. However, if the model is saved with model.to_json and model.save_weights, then everything is fine when loading the saved model.

mycal-tucker commented 5 years ago

Well, thanks for the update, but you've stumped me. I don't understand: 1) how you're training the model with argmax in the cost function, and 2) how to solve the problem you're actually asking about, loading the weights again.

Sorry I couldn't be more help.

e4exp commented 5 years ago

Gumbel-softmax may solve the problem of argmax: http://anotherdatum.com/gumbel-gan.html And this describes another way to get around it (though I currently get an error with it): https://stackoverflow.com/questions/46926809/getting-around-tf-argmax-which-is-not-differentiable
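
For reference, a minimal sketch of the Gumbel-softmax trick (TF 1.x style; temperature is a hyperparameter, and lower values give harder, more argmax-like outputs):

import tensorflow as tf

def gumbel_softmax_sample(logits, temperature=0.5):
    # Add Gumbel noise, then soften the max with a low-temperature softmax,
    # giving a differentiable approximation of a one-hot argmax sample.
    u = tf.random_uniform(tf.shape(logits), minval=1e-8, maxval=1.0)
    gumbel = -tf.log(-tf.log(u))
    return tf.nn.softmax((logits + gumbel) / temperature)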

mycal-tucker commented 5 years ago

Yeah, or the SeqGAN idea of policy-gradient updates: https://arxiv.org/abs/1609.05473

chikubee commented 5 years ago

I faced the same problem on GPU. With the runtime set to None, it seems the problem no longer persists.

fezancs commented 5 years ago

I'm facing the same issue. I defined a new layer in a Lambda:

ValueError: An operation has None for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.

Can anybody help with this matter?

yli192 commented 4 years ago

> I monitored the precision, recall, and accuracy while training, and the model was getting better. If the model is saved with model.save, then the error above appears with load_model. However, if the model is saved with model.to_json and model.save_weights, then everything is fine when loading the saved model.

I implemented this solution and it worked for me. This is all you will need:

from keras.models import model_from_json

# Save the model architecture to JSON
model_json = model.to_json()
with open("model.json", "w") as json_file:
    json_file.write(model_json)

# Serialize the weights to HDF5
model.save_weights("model.h5")
print("Saved model to disk")

# Load the JSON model
json_file = open('model.json', 'r')
loaded_model_json = json_file.read()
json_file.close()
loaded_model = model_from_json(loaded_model_json)

# Load the weights into the new model
loaded_model.load_weights("model.h5")
print("Loaded model from disk")
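
Note (assuming Keras 2.x behavior): a model loaded via model_from_json plus load_weights is not compiled, so call loaded_model.compile(...) before evaluate() or further training; predict() works without compiling.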