Using `None` as a positional tensor input to a layer is not supported.

ianstenbit commented 1 year ago

In tf.keras, if a layer accepts a list of Tensors as an input, it is valid behavior for one of the items in that list to be None. In Keras Core, that is not the case.

Here's a minimal example of such a layer:

from tensorflow import keras

class MySillyLayer(keras.layers.Layer):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.dense_1 = keras.layers.Dense(10)
        self.dense_2 = keras.layers.Dense(10)

    def call(self, inputs):
        key, query = inputs
        if query is None:
            query = key

        key = self.dense_1(key)
        query = self.dense_2(query)

        return key + query

It can then be called like this (at least in tf.keras)

import tensorflow as tf
x = tf.ones((1, 20))
y = tf.ones((1, 20))

z = MySillyLayer()([x, y]) # Valid!
z = MySillyLayer()([x, None]) # Also valid! (but only in tf.keras)

If you run the above code with tf.keras, it works fine. If you run it instead with Keras Core, you get this error:

ValueError: Only input tensors may be passed as positional arguments. The following argument value should be passed as a keyword argument: None (of type <class 'NoneType'>)

A more realistic example of why one would use this would be an attention layer (the KerasCV StableDiffusion CrossAttention layer uses this feature of tf.keras, which is how I encountered this issue).

Do we want to support this / is it feasible to do so? If not, it should go on our list of incompatibilities.

fchollet commented 1 year ago

Thanks for the report. As it turns out it's pretty deliberate, later down we have this separate error:

raise ValueError(
    "In a nested call() argument, "
    "you cannot mix tensors and non-tensors. "
    "Received invalid mixed argument: "
    f"{name}={value}"
)

This limitation exists because allowing None would lead to undefined entries in the shape tuple structs being passed around.

A more realistic example of why one would use this would be an attention layer (the KerasCV StableDiffusion CrossAttention layer uses this feature of tf.keras, which is how I encountered this issue)

Typically separate tensor inputs should be separate arguments. Maybe we can refactor the layer in this way?

ianstenbit commented 1 year ago

Gotcha -- thanks for the info!

Typically separate tensor inputs should be separate arguments. Maybe we can refactor the layer in this way?

That's exactly what we did in https://github.com/keras-team/keras-cv/pull/1982, which is where this came up.

Hopefully this issue can serve as a reference for people who encounter this in the future. If so, take a look at how we handled it here

keras-team / keras-core

Using `None` as a positional tensor input to a layer is not supported. #737