keras-team / tf-keras

The TensorFlow-specific implementation of the Keras API, which was the default Keras from 2019 to 2023.
Apache License 2.0
60 stars 28 forks source link

Keras generates non-unique names for stateful LSTM's states #283

Open mergian opened 1 year ago

mergian commented 1 year ago

System information.

Describe the problem. A stateful LSTM has 2 state values, both with the identical name. As you can see in the reproducing example, we have two different variables, but both have the identical name.

Describe the current behavior. LSTM states have the identical name.

Describe the expected behavior. LSTM states should have unique names.

Contributing.

I think the problem comes from here: https://github.com/keras-team/keras/blob/master/keras/layers/rnn/base_rnn.py#L909

  1. I would do something like lambda x: backend.variable(x, name='state'). This will change the name to lstm/state:0 which is already more readable.
  2. To me it seems it's not possible to give tf.Variable a unique id, so I assume the naming would need to be done manually with something like lambda x: backend.varaible(x, name=f'state_{unique_id}'.

Standalone code to reproduce the issue.

import tensorflow as tf
import tensorflow.keras as keras
import numpy as np

inp = keras.Input(batch_shape=[1, 2, 3])
rnn = keras.layers.LSTM(stateful=True, units=3)
model = keras.Model(inp, rnn(inp))

print('Before:')
print(*rnn.states, sep='\n')
print()

data = np.random.rand(1, 2, 3).astype(np.float32)
model(data)

print('After:')
print(*rnn.states, sep='\n')

Output:

Before:
<tf.Variable 'lstm/Variable:0' shape=(1, 3) dtype=float32, numpy=array([[0., 0., 0.]], dtype=float32)>
<tf.Variable 'lstm/Variable:0' shape=(1, 3) dtype=float32, numpy=array([[0., 0., 0.]], dtype=float32)>

After:
<tf.Variable 'lstm/Variable:0' shape=(1, 3) dtype=float32, numpy=array([[ 0.11526701, -0.02300704, -0.1668661 ]], dtype=float32)>
<tf.Variable 'lstm/Variable:0' shape=(1, 3) dtype=float32, numpy=array([[ 0.17723909, -0.06333476, -0.31777197]], dtype=float32)>
sushreebarsa commented 1 year ago

@mergian Sorry for the late response! I was able to replicate this issue on colab, please find the gist here. @SuryanarayanaY Could you please take a look at this issue? Thank you!

SuryanarayanaY commented 1 year ago

@mergian ,

I have checked the Source code of LSTM and observed that the all LSTM layer attributes except states not defined as @property. Not sure though the exact reason for this and how it may affect this case. If you have time please have a look and see whether it provides any insight. I just checked implementing this makes any difference and found no difference. Escalating the issue for further investigation. Thanks!

sampathweb commented 1 year ago

Thanks for the issue. Would you be able to contribute with a PR?

SuryanarayanaY commented 1 year ago

@mergian , Thanks for the PR.

SuryanarayanaY commented 1 year ago

Hello, Thank you for reporting an issue.

We're currently in the process of migrating the new Keras 3 code base from keras-team/keras-core to keras-team/keras. Consequently, This issue may not be relevant to the Keras 3 code base . After the migration is successfully completed, feel free to reopen this Issue at keras-team/keras if you believe it remains relevant to the Keras 3 code base. If instead this Issue is a bug or security issue in legacy tf.keras, you can instead report a new issue at keras-team/tf-keras, which hosts the TensorFlow-only, legacy version of Keras.

To know more about Keras 3, please read https://keras.io/keras_core/announcement/

google-ml-butler[bot] commented 1 year ago

Are you satisfied with the resolution of your issue? Yes No

SuryanarayanaY commented 1 year ago

@mergian,

Sorry for the inconvenience. Because of migration process we have moved the issue from keras-team/keras to keras-team/tf-kerasand then keras-core to keras as keras which is now Keras3, going to be multi backend. As Team is quite busy in this process of migration activity your PR review might got delayed.

Since we can't transfer the PR from one repo to other we have to close all the PRs. I am reopening the issue and request you to raise the fresh PR here in this repo if its related to TF backend or if you believe it remains relevant to the Keras 3 code base you can raise a PR at keras-team/keras repo.

mergian commented 1 year ago

Dear @SuryanarayanaY

thanks for the explanation. I fully understand the technical reasoning for closing the old PRs. However, please also understand that I neither have the time nor the motivation to redo all my pull requests (I think there are 4 open?) that have been waiting for more than 6 months in which Keras team didn't bother to review them.

Please feel free to run the reproducing scripts and check if the error still occurs yourself. You can also use the fixes I proposed in my pull requests if you like. However, I will not spend any more time on creating pull requests which Keras team asked for, which then get ignored for more than half a year and in the end being closed.

Sorry but my time is to valuable for this.

Best

SuryanarayanaY commented 12 months ago

I have tested the code with keras_core and tf-nightly. This issue not relevant to keras_core as the variable names are not there in keras. Attached keras_core gist for reference.

However in tf-nightly this is still same behaviour as reported in the issue.Attached gist for tf-nightly .