Discovered a magic bug in keras, which is caused by incoming data

pass-lin commented 1 month ago

Here is an example code, where bert4keras3 is my own llm library, which can be installed through pip

import json
config = {
    "type_vocab_size": 2,
    "use_bias": 0,
    "o_bias": 0,
    "vocab_size": 64000,
    "num_hidden_layers": 1,
    "query_head": 32,
    "num_attention_heads": 4,
    "hidden_size": 4096,
    "intermediate_size": 11008,
    "attention_head_size": 128,
    "dropout_rate": 0.0,
    "hidden_act": "silu",
    "max_wavelength": 5000000.0
}

with open('config.json', 'w') as f:
        json.dump(config, f, indent=4, ensure_ascii=False)
import os
os.environ["KERAS_BACKEND"] = "torch"
os.environ["CUDA_VISIBLE_DEVICES"]="-1"
maxlen= 16
import tensorflow as tf
import tensorflow.experimental.numpy as tnp
import keras
from bert4keras3.models import build_transformer_model
keras.config.set_dtype_policy("bfloat16")
model=build_transformer_model(
    'config.json',
    model='llama',
    sequence_length=maxlen,
)
tokens = tnp.random.randint(100,60000,size=[3,maxlen],dtype='int32')
segments = tnp.ones([3,maxlen],dtype='int32')
print(model.inputs)
#this can successfully running
print(model([tokens,segments]))
#this can successfully running
try:
    print(model({'Input-Token':segments ,
        'Input-Segment':tokens }))
except:
    print('error')
#this can not successfully running
try:
    model({'Input-Token':tokens ,
        'Input-Segment':segments })
except:
    print('error')

The reason for the failure can be discovered in the call function of the bert4keras.Layers.Add.Embedding class. If you print inputs, self.name, you will find that the layer where 'Input-Token' should be passed has received 'Input-Segment' instead. However, if you print self.name, inputs.name during the execution of the build_transformer_model function, you can see that they are matched correctly.

Further, if we do not use a dictionary as input and instead pass a list in the order of model.inputs, the code will run smoothly at this point. Additionally, this bug only exists in Keras 3.4.1 and not in Keras 3.3.3.

pass-lin commented 1 month ago

Especially with the last two inference functions, you can find that the behavior of Keras 3.4.1 is opposite to that of Keras 3.3.3.

mehtamansi29 commented 1 month ago

Hi @pass-lin-

Thanks for reporting this issue. I am also reproducing this issue even though tokens are in range of vocab size. And embedding layer expects input_dim and inputs are expected to be in the range [0, input_dim), which is correct in the code.

Attached gist for the reference.

We look into this more and update you.

pass-lin commented 1 month ago

Hi @pass-lin-

Hi @pass-lin-

Thanks for reporting this issue. I am also reproducing this issue even though tokens are in range of vocab size. And embedding layer expects input_dim and inputs are expected to be in the range [0, input_dim), which is correct in the code.

感谢您报告此问题。即使标记在词汇大小范围内，我也在复制此问题。嵌入层期望input_dim并且输入期望在[0，input_dim）范围内，这在代码中是正确的。

gist

附上要点供参考。

We look into this more and update you.

我们对此进行了更多调查并向您更新。

this bug also be found at jax backend

hertschuh commented 1 month ago

Hi @pass-lin ,

The issue is that you're supposed to pass the inputs the same way as you created the functional model. In your case, it appears that the inputs were passed as an array when creating the model, so you need to pass them as an array when calling the model. If you want to pass them as a dict, you need to create the model using a dict.

For instance, given this:

x1 = keras.Input((10,), name='Input-Token')
x2 = keras.Input((20,), name='Input-Segment')
y = keras.layers.Concatenate()([x1, x2])

You can use arrays:

model = keras.Model(inputs=[x1, x2], outputs=y)
model([keras.ops.ones((1, 10)), keras.ops.zeros((1, 20))])

Or you can use dicts:

model = keras.Model(inputs={'Input-Token':x1, 'Input-Segment':x2}, outputs=y)
model({'Input-Token':keras.ops.ones((1, 10)), 'Input-Segment':keras.ops.zeros((1, 20))})

But you cannot mix them.

The real bug is that it's not complaining about the fact that the structure passed as input to __call__ doesn't match what was configured at model creation.

The reason why it swaps them is that it's ordering the dict by order of the keys to get an array.

The behavior did change between 3.3.3 and 3.4.1 as a result of adding support for optional inputs.

pass-lin commented 1 month ago

@pass-lin

嗨@pass-lin，

The issue is that you're supposed to pass the inputs the same way as you created the functional model. In your case, it appears that the inputs were passed as an array when creating the model, so you need to pass them as an array when calling the model. If you want to pass them as a dict, you need to create the model using a dict.

问题是您应该以与创建函数模型相同的方式传递输入。在您的例子中，在创建模型时，输入似乎是作为数组传递的，因此，在调用模型时，您需要将它们作为数组传递。如果您想将它们作为判决传递，则需要使用判决创建模型。

For instance, given this:

例如，给定这个：
x1 = keras.Input((10,), name='Input-Token')
x2 = keras.Input((20,), name='Input-Segment')
y = keras.layers.Concatenate()([x1, x2])
You can use arrays:

您可以使用数组：
model = keras.Model(inputs=[x1, x2], outputs=y)
model([keras.ops.ones((1, 10)), keras.ops.zeros((1, 20))])
Or you can use dicts:

或者您可以使用字典：
model = keras.Model(inputs={'Input-Token':x1, 'Input-Segment':x2}, outputs=y)
model({'Input-Token':keras.ops.ones((1, 10)), 'Input-Segment':keras.ops.zeros((1, 20))})
But you cannot mix them.

但你不能混合它们。

__call__

真正的bug是它没有抱怨作为输入传递给__call__的结构与模型创建时配置的不匹配。

The reason why it swaps them is that it's ordering the dict by order of the keys to get an array.

它交换它们的原因是它按键的顺序排序以获取数组。

The behavior did change between 3.3.3 and 3.4.1 as a result of adding support for optional inputs.

由于添加了对可选输入的支持，3.3.3和3.4.1之间的行为确实发生了变化。

So will this be fixed in future versions of Keras, or should I fix it in my local code? The code runs normally in Keras versions 2.3.1 to 3.3.3, but it does not run in Keras 3.4.1, which is a bit unbelievable.

hertschuh commented 1 month ago

So will this be fixed in future versions of Keras, or should I fix it in my local code? The code runs normally in Keras versions 2.3.1 to 3.3.3, but it does not run in Keras 3.4.1, which is a bit unbelievable.

You will need to fix this by either:

only calling the model with arrays
creating the model with a dict

What will be fixed in Keras is that it will raise a different error instead of taking incorrect inputs.

pass-lin commented 1 month ago

So will this be fixed in future versions of Keras, or should I fix it in my local code? The code runs normally in Keras versions 2.3.1 to 3.3.3, but it does not run in Keras 3.4.1, which is a bit unbelievable.

You will need to fix this by either:

only calling the model with arrays

creating the model with a dict

What will be fixed in Keras is that it will raise a different error instead of taking incorrect inputs.

However, from Keras 2.3.1 to Keras 3.3.3, this behavior is correct and can be executed normally. It's really hard to accept. Especially when I'm using tf.data to read tfrecord, I can only choose dict as the model input, but most of the time I use list to create the Model.

keras-team / keras

Discovered a magic bug in keras, which is caused by incoming data #20070