keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

"No gradients provided for any variable." when variable uses an integer data type #20058

Closed solidDoWant closed 1 month ago

solidDoWant commented 2 months ago

When using an integer data type for a trainable variable, training will always throw a "No gradients provided for any variable." ValueError. Here is a very simple example to reproduce the issue:

import keras
import tensorflow as tf
import numpy as np

variable_dtype = tf.int32
# variable_dtype = tf.float32   # Uncommenting this fixes the issue

class BugTestLayer(keras.layers.Layer):
    # The layer is just y = self.var * x
    def build(self, input_shape):
        self.var = self.add_weight(
            shape=(1,), initializer="zeros", dtype=variable_dtype)

    def call(self, inputs):
        return inputs * self.var

input_layer = keras.Input((1,), dtype=tf.int32)
test_layer = BugTestLayer()

output = test_layer(input_layer)
model = keras.Model(inputs=[input_layer], outputs=[output])

values = np.array([[i] for i in range(1000)])

model.compile(
    loss=[keras.losses.MeanSquaredError()],
    optimizer=keras.optimizers.RMSprop(),
    metrics=[keras.metrics.MeanSquaredError()],
)

# This will always raise a `ValueError: No gradients provided for any variable.`
# when using an integer type
history = model.fit(values, values, batch_size=1, epochs=2)

Unfortunately this error message is vague enough that the root of the issue is unclear. The full error message is:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File /workspaces/a8d3c9dff26b642ae4afaf1584a676512f9b8e8ce73bdaa449ed5ed373627eb7/test_bug.py:14
      3 values = np.array([
      4     [i]
      5     for i in range(1000)
      6 ])
      8 model.compile(
      9     loss=[keras.losses.MeanSquaredError()],
     10     optimizer=keras.optimizers.RMSprop(),
     11     metrics=[keras.metrics.MeanSquaredError()],
     12 )
---> 14 history = model.fit(values, values, batch_size=1, epochs=2)

File ~/.local/lib/python3.12/site-packages/keras/src/utils/traceback_utils.py:122, in filter_traceback.<locals>.error_handler(*args, **kwargs)
    119     filtered_tb = _process_traceback_frames(e.__traceback__)
    120     # To get the full stack trace, call:
    121     # `keras.config.disable_traceback_filtering()`
--> 122     raise e.with_traceback(filtered_tb) from None
    123 finally:
    124     del filtered_tb

File ~/.local/lib/python3.12/site-packages/keras/src/optimizers/base_optimizer.py:662, in BaseOptimizer._filter_empty_gradients(self, grads, vars)
    659         missing_grad_vars.append(v.name)
    661 if not filtered_grads:
--> 662     raise ValueError("No gradients provided for any variable.")
    663 if missing_grad_vars:
    664     warnings.warn(
    665         "Gradients do not exist for variables "
    666         f"{list(reversed(missing_grad_vars))} when minimizing the loss."
    667         " If using `model.compile()`, did you forget to provide a "
    668         "`loss` argument?"
    669     )

ValueError: No gradients provided for any variable.

If it helps, here is what the model looks like: [image: model diagram]

Full disclosure: I'm not certain whether this is a Keras bug or a TensorFlow bug. If this is suspected to be a TensorFlow bug, let me know and I'll file an issue there instead.

ghsanti commented 2 months ago

Happens in this minimal example (no keras):

import tensorflow as tf

x = tf.Variable(3, dtype=tf.int32)
with tf.GradientTape() as g:
    g.watch(x)             # warns: the watched tensor's dtype must be floating
    y = tf.multiply(x, 2)  # compute y inside the tape so it is recorded
dy_dx = g.gradient(y, x)
print(dy_dx)               # None

WARNING:tensorflow:The dtype of the watched tensor must be floating (e.g. tf.float32), got tf.int32
None

TF docs source.

TensorFlow needs floats to compute a gradient, so it's not a bug: weights have to be floats.
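
For comparison, the same tape with a float variable does produce a gradient (a quick sanity check using the same API as above):

import tensorflow as tf

x = tf.Variable(3.0, dtype=tf.float32)
with tf.GradientTape() as g:
    y = tf.multiply(x, 2.0)  # y = 2x, recorded by the tape
dy_dx = g.gradient(y, x)     # trainable float variables are watched automatically
print(dy_dx)                 # tf.Tensor(2.0, shape=(), dtype=float32)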

mehtamansi29 commented 2 months ago

Hi @solidDoWant -

Thanks for raising this issue. Integers and strings are not differentiable, so no gradients will be produced when a variable uses an integer or string data type.

Gradients are calculated using floating-point arithmetic only, and TensorFlow doesn't automatically cast between types, which is why you are getting this error.

More details about gradients with different data types can be found here.
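
For example, one workaround is to keep the trainable weight in float32 and cast the integer inputs inside the layer. A minimal sketch based on the snippet above (CastingLayer is just an illustrative name):

import keras
import tensorflow as tf

class CastingLayer(keras.layers.Layer):
    # Same y = w * x idea, but the weight stays float32 so gradients exist.
    def build(self, input_shape):
        self.var = self.add_weight(
            shape=(1,), initializer="zeros", dtype=tf.float32)

    def call(self, inputs):
        # Cast the integer inputs to float before the differentiable multiply.
        return tf.cast(inputs, tf.float32) * self.var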

solidDoWant commented 2 months ago

Thanks for looking into this ghsanti and mehtamansi29.

Integers and strings are not differentiable, so no gradients will be produced when a variable uses an integer or string data type.

I figured this would come up. Strictly speaking, floats are not differentiable either under the same reasoning. Both integers and floats are stored as limited-precision numbers and are therefore discrete values rather than continuous ones. If integers are not differentiable then floats are not either, and if floats are differentiable then integers are as well. In both cases there is just a large loss of precision for numbers close to zero - it's just a question of what "close" means.
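
To make "close" concrete (a quick NumPy check): the gap between adjacent float32 values near 1.0 is about 1.2e-7, while for int32 the gap is always exactly 1.

import numpy as np

one = np.float32(1.0)
step = np.nextafter(one, np.float32(2.0)) - one  # smallest float32 step above 1.0
print(step)  # ~1.1920929e-07; the equivalent int32 step is exactly 1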

I do get that strings are not differentiable though - I don't know how you would mathematically represent the derivative of a string.

Even if gradients for integers aren't implemented (I still think this is a bug, but putting that aside), I really think the error message should say something friendlier about where the problem lies.

Happens in this minimal example (no keras):

I think this is enough to show that my issue is with TensorFlow rather than Keras. I'll file an issue there unless there is any reason to believe the issue is with Keras.

solidDoWant commented 2 months ago

One last thing - could the docs be updated to mention this limitation? I got stuck on this for quite a while, and it would be nice for future users to be aware of it beforehand.

ghsanti commented 2 months ago

Thinking about the weights' update formula for a simple linear model using MSE:

w <- w - lr * (2/N) * sum_i (w * x_i - y_i) * x_i

We want the lr to be small (i.e. a float) for optimisation to succeed, and the gradient term itself contains a fraction. So we would need to round the update to an integer for the weights to be updated correctly (keeping the same types).
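
As a concrete illustration (with made-up numbers): with lr = 0.001 and a gradient of 6, the update is 0.006, which rounds to 0 in int32, so an integer weight never moves.

import tensorflow as tf

w = tf.Variable(3, dtype=tf.int32)
lr, grad = 0.001, 6
update = lr * grad         # 0.006, a float
# w.assign_sub(update)     # fails outright: float update vs. int32 variable
w.assign_sub(int(update))  # int(0.006) == 0, so the weight never changes
print(w.numpy())           # still 3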

On PyTorch (just the forum), they say:

(...) optimizer wants to change one of your network parameters by a small amount

See post.

It'd be good if you link the TF issue once you file it, so we can see what they reply.

Very interesting issue! @solidDoWant

mehtamansi29 commented 2 months ago

I figured this would come up. Strictly speaking, floats are not differentiable either under the same reasoning. Both integers and floats are stored as limited-precision numbers and are therefore discrete values rather than continuous ones. If integers are not differentiable then floats are not either, and if floats are differentiable then integers are as well. In both cases there is just a large loss of precision for numbers close to zero - it's just a question of what "close" means.

Hi @solidDoWant

Integers have a limited range: 32-bit integer computations can only represent values that fit in 32 bits, while most of these calculations happen over real numbers. Floating point can represent a much wider range of values, at the cost of some rounding error. While both floats and integers are discrete, floats offer far higher precision when representing fractional values, which is essential for calculating gradients. GPU and TPU computing also relies on floating-point operations, and floating-point precision is what enables techniques like quantization and memory optimization. Here you can find more detail about the mathematics behind floating-point precision.

solidDoWant commented 2 months ago

Busy couple of weeks... I just filed an issue in the TensorFlow repo here.

I also discovered that even with a custom gradient function, this still fails on int32 without the function ever being called.
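
For reference, here is the kind of thing I tried (a minimal sketch; int_identity is just an illustrative name). The tape never watches the int32 variable, so the custom gradient function is never even invoked:

import tensorflow as tf

@tf.custom_gradient
def int_identity(x):
    def grad(upstream):
        print("custom grad called")  # never printed for an int32 input
        return upstream
    return x, grad

x = tf.Variable(3, dtype=tf.int32)
with tf.GradientTape() as tape:
    y = int_identity(x)
print(tape.gradient(y, x))  # None: the tape does not watch integer variables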

mehtamansi29 commented 1 month ago

Hi @solidDoWant -

Closing this issue, since you've created a new issue in the TensorFlow repo. Feel free to reopen it if required. Thanks!