Open ageron opened 1 week ago
Thanks for the report. Do you have a Colab or code snippet to reproduce the error?
Best I can tell the docs look outdated. The two corrections you made are accurate.
Thanks @fchollet, here's a little code snippet to reproduce the error:
import keras
import numpy as np
model = keras.Sequential([
    keras.layers.Input(batch_shape=[1, 10, 3]),
    keras.layers.LSTM(10, return_sequences=True, stateful=True),
    keras.layers.LSTM(10, return_sequences=True, stateful=True),
    keras.layers.Dense(5)
])
model.compile(loss="mse", optimizer="sgd")
X_train = np.random.rand(100, 10, 3)
y_train = np.random.rand(100, 10, 5)
model.fit(X_train, y_train, epochs=1)
I'm getting the same exception as above.
Here's a gist notebook with the code above.
Thanks for the code. The issue comes from a discrepancy between the batch size specified in Input and the batch size actually received by the model (if you pass raw NumPy data to fit(), it gets chunked into batches, as configured by the batch_size argument).
You can just pass batch_size=1 in fit() to fix it (or otherwise use a generator-like or tf.data.Dataset-like data source).
We should have a check somewhere to guard against such a mismatch.
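To make the mismatch concrete, here is a rough sketch in plain Python (no Keras calls) of how raw data gets chunked and what such a check might look like. The helper names chunk_sizes and check_static_batch_size are hypothetical and not part of Keras, and the sketch assumes fit() slices the array into consecutive chunks of batch_size, with 32 being the documented default when batch_size is not given.

```python
def chunk_sizes(num_samples, batch_size):
    """Sizes of the batches obtained by slicing num_samples rows into chunks of batch_size."""
    full, rest = divmod(num_samples, batch_size)
    return [batch_size] * full + ([rest] if rest else [])


def check_static_batch_size(declared, num_samples, batch_size):
    """Raise ValueError if any chunk differs from the declared static batch size.

    Hypothetical helper for illustration only; not an existing Keras check.
    """
    bad = sorted({s for s in chunk_sizes(num_samples, batch_size) if s != declared})
    if bad:
        raise ValueError(
            f"Input declares batch size {declared}, but the data would be fed "
            f"in batches of size {bad}; pass batch_size={declared} to fit()."
        )


# 100 samples with the default batch_size of 32: no chunk has size 1.
print(chunk_sizes(100, 32))  # [32, 32, 32, 4]

# With batch_size=1 every chunk matches Input(batch_shape=[1, 10, 3]).
check_static_batch_size(declared=1, num_samples=100, batch_size=1)
```

In this sketch a trailing partial chunk also counts as a mismatch, so a declared batch size of 32 with 100 samples would be rejected as well.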
Ah got it, thanks François. 👍
Indeed, the following code works fine:
model = keras.Sequential([
    keras.layers.Input(batch_shape=[1, 10, 3]),
    keras.layers.LSTM(10, return_sequences=True, stateful=True),
    keras.layers.LSTM(10, return_sequences=True, stateful=True),
    keras.layers.Dense(5)
])
model.compile(loss="mse", optimizer="sgd")
X_train = np.random.rand(100, 10, 3)
y_train = np.random.rand(100, 10, 5)
model.fit(X_train, y_train, epochs=1, batch_size=1)
So it's just a documentation issue, I'll update the name of this issue.
Thanks, I'll send a doc update. @fchollet where would we implement this check?
The documentation for the base RNN layer contains the following explanation, which is outdated:
In particular:
The batch_input_shape argument no longer exists. I believe we should instead add an Input layer and specify its batch_size argument, if I'm not mistaken.
The Sequential class no longer has a reset_states() method. Instead, I suppose we must iterate through the layers and, if they have a reset_state() method (no s at the end), call it.
However, I was unable to build a stateful RNN. I'm getting the following exception:
I'm not sure whether this is a bug or whether I'm not implementing a stateful RNN correctly using Keras 3. If someone can please explain how to build one, I'm happy to update the documentation.
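For reference, the "iterate through the layers" idea could look roughly like the sketch below. The helper name reset_all_states is hypothetical and not a Keras API. It only assumes that the model exposes a layers list and that stateful layers have a reset_state() method, as described above. It does not descend into nested models.

```python
def reset_all_states(model):
    """Call reset_state() on every top-level layer of `model` that exposes it.

    Returns the names of the layers that were reset. Hypothetical helper,
    relying only on duck typing rather than on any specific Keras class.
    """
    reset = []
    for layer in model.layers:
        reset_fn = getattr(layer, "reset_state", None)
        if callable(reset_fn):
            reset_fn()
            reset.append(getattr(layer, "name", repr(layer)))
    return reset
```

One would presumably call reset_all_states(model) between independent sequences or between epochs, wherever the old Sequential.reset_states() call used to go.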