philipperemy / cond_rnn

Conditional RNNs for Tensorflow / Keras.
MIT License

Bidirectional Layer with Functional API #35

Closed ykocoglu closed 1 year ago

ykocoglu commented 1 year ago

Hi Philippe Remy,

I have been trying to run the ConditionalRecurrent wrapper on the Bidirectional layer with the Functional API, so that I can stack layers, with no success yet. It runs successfully for layers such as LSTM and GRU, but not Bidirectional. The code looks something like this:

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, GRU, LSTM, Bidirectional
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import Input, Model

from cond_rnn import ConditionalRecurrent

patience = 50

early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_loss',
                                                  patience=patience,
                                                  mode='min')
optimizer = tf.keras.optimizers.Adam(
    learning_rate=0.00001,
    beta_1=0.95,
    beta_2=0.999,
    epsilon=1e-07,
    amsgrad=False)

# Weight initialization
initializer = tf.keras.initializers.Orthogonal()

# Functional API
i = Input(shape=[6, 3], name='input_0')
c = Input(shape=[3], name='input_1')

# Bi-directional layers
forward_layer = ConditionalRecurrent(GRU(units=30, return_sequences=True))
backward_layer = ConditionalRecurrent(GRU(units=30, return_sequences=True, go_backwards=True))

x = Bidirectional(layer=forward_layer, backward_layer=backward_layer, name='cond_rnn_1')([i, c])
x = Dense(units=2, activation='linear')(x)
model = Model(inputs=[i, c], outputs=[x])

model.compile(optimizer=optimizer, loss='mse', metrics=['mae', 'mape'])
history = model.fit(w1.train, epochs=40000, validation_data=w1.val, verbose=2, callbacks=[early_stopping])

The error I'm getting is:


ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_100012\3844763617.py in <module>
     30 backward_layer = ConditionalRecurrent(GRU(units=30, return_sequences=True, go_backwards=True))
     31
---> 32 x = Bidirectional(layer=forward_layer, backward_layer=backward_layer, name='cond_rnn_1')([i, c])
     33 x = Dense(units=2, activation='linear')(x)
     34 model = Model(inputs=[i, c], outputs=[x])

~\AppData\Local\Continuum\anaconda3\envs\Tensorflow\lib\site-packages\keras\layers\wrappers.py in __call__(self, inputs, initial_state, constants, **kwargs)
    598         if num_states % 2 > 0:
    599             raise ValueError(
--> 600                 'When passing initial_state to a Bidirectional RNN, '
    601                 'the state should be a list containing the states of '
    602                 'the underlying RNNs. '

ValueError: When passing initial_state to a Bidirectional RNN, the state should be a list containing the states of the underlying RNNs. Received: [<KerasTensor: shape=(None, 3) dtype=float32 (created by layer 'input_1')>]

My guess is that the Bidirectional RNN itself does not like the ([i, c]) list that I'm trying to pass it, but I'm not sure if I'm correct here.
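
If I read the traceback right, it supports that guess: Bidirectional treats every list element after the first as an initial_state tensor, and a single state cannot be split between the forward and backward RNNs. Here is a minimal sketch (my own, using the shapes from above) that reproduces the same error without cond_rnn at all:

from tensorflow.keras import Input
from tensorflow.keras.layers import GRU, Bidirectional

i = Input(shape=[6, 3])
c = Input(shape=[3])
# [i, c] is standardized into inputs=i, initial_state=[c]; one lone state
# fails the "num_states % 2" check in wrappers.py, hence the ValueError.
Bidirectional(GRU(units=30, return_sequences=True))([i, c])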

Another question I have in mind: if I were to add an Encoder-Decoder architecture, can I still use ConditionalRecurrent, or would that also have similar issues to the Bidirectional layer? I haven't tried this yet, but it is something I have in mind that I want to try.

Thank you.

philipperemy commented 1 year ago

@ykocoglu Yes, it's a known issue. Somebody has to update the lib so that Bidirectional works with the functional API:

https://github.com/philipperemy/cond_rnn/blob/master/examples/bidirect.py

It will work nicely with the Sequential API.
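
From memory, the Sequential pattern looks roughly like the sketch below (a hypothetical reconstruction, not copied from the file; the linked bidirect.py is the authoritative version, and the shapes are just the ones from your snippet):

import numpy as np
from tensorflow.keras.layers import Bidirectional, Dense, GRU
from tensorflow.keras.models import Sequential
from cond_rnn import ConditionalRecurrent

forward_layer = ConditionalRecurrent(GRU(units=30, return_sequences=True))
backward_layer = ConditionalRecurrent(GRU(units=30, return_sequences=True, go_backwards=True))

model = Sequential(layers=[
    Bidirectional(layer=forward_layer, backward_layer=backward_layer),
    Dense(units=2, activation='linear'),
])
model.compile(optimizer='adam', loss='mse')

# Toy data: 6 time steps, 3 features, 3 conditions, per-step 2-dim targets.
# The condition array is passed as the second element of x.
x = np.random.uniform(size=(8, 6, 3))
cond = np.random.uniform(size=(8, 3))
y = np.random.uniform(size=(8, 6, 2))
model.fit(x=[x, cond], y=y, epochs=1, verbose=0)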

philipperemy commented 1 year ago

If I were to add an Encoder-Decoder architecture, can I still use ConditionalRecurrent, or would that also have similar issues to the Bidirectional layer? I haven't tried this yet, but it is something I have in mind that I want to try.

I guess it can work if you use the Sequential API. I haven't tried it myself for this case, so that's a good question!

ykocoglu commented 1 year ago

@philipperemy Thank you for your response. The only reason I was a little worried is that I thought I wouldn't be able to stack layers with the Sequential API, but based on some example scripts I found on the web, stacking layers with the Sequential API should be possible, right? I haven't confirmed it myself yet (I will try it soon).

I'm just trying to understand where I might face issues (if any) with the Sequential API. Is it saving the model, or is there more to it than that? I know this is not a direct Conditional RNN question, but I would really appreciate your thoughts on the matter.

Thank you again.

philipperemy commented 1 year ago

Yes, it should be possible. Let me know how it goes!
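
For the stacking part, a sketch of the Sequential pattern (hypothetical, following the README usage; only the first layer receives the [sequence, condition] pair):

import numpy as np
from tensorflow.keras.layers import Dense, GRU
from tensorflow.keras.models import Sequential
from cond_rnn import ConditionalRecurrent

model = Sequential(layers=[
    # The conditional layer sits at the bottom and consumes [sequence, condition].
    ConditionalRecurrent(GRU(units=30, return_sequences=True)),
    # Layers stacked above it are ordinary recurrent layers.
    GRU(units=30),
    Dense(units=2, activation='linear'),
])
model.compile(optimizer='adam', loss='mse')

# Toy data with the shapes from your earlier snippet.
x = np.random.uniform(size=(8, 6, 3))
cond = np.random.uniform(size=(8, 3))
y = np.random.uniform(size=(8, 2))
model.fit(x=[x, cond], y=y, epochs=1, verbose=0)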

Regarding the functional API, from what I remember, the issue comes from how the inputs are fed: https://github.com/philipperemy/cond_rnn/blob/master/cond_rnn/cond_rnn.py#L64. We expect a list for the inputs instead of a single tensor.

So if there is any issue down the line, I guess it should be related to that one.
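
To make that concrete, the list-based call that already works in the functional API is the one where ConditionalRecurrent itself receives the list (shapes from your snippet):

from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Dense, GRU
from cond_rnn import ConditionalRecurrent

i = Input(shape=[6, 3], name='input_0')
c = Input(shape=[3], name='input_1')
# The wrapper consumes [sequence, condition] itself, so no Bidirectional
# sits between the list and the recurrent layer.
x = ConditionalRecurrent(GRU(units=30))([i, c])
x = Dense(units=2, activation='linear')(x)
model = Model(inputs=[i, c], outputs=[x])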

ykocoglu commented 1 year ago

Thank you @philipperemy. I'm not sure how fast I can get to it, but as soon as I find out, I'll update you.

Thank you again.

philipperemy commented 1 year ago

Hello, it seems that I got it to work: https://github.com/philipperemy/cond_rnn/blob/master/examples/bidirect_functional.py
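
For anyone skimming, the idea (a hypothetical sketch; the linked bidirect_functional.py is the authoritative version) is to run the forward and backward conditional layers yourself and concatenate them, so that each one receives the [i, c] list directly instead of going through Bidirectional:

from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Concatenate, Dense, GRU
from cond_rnn import ConditionalRecurrent

i = Input(shape=[6, 3], name='input_0')
c = Input(shape=[3], name='input_1')
fwd = ConditionalRecurrent(GRU(units=30, return_sequences=True))([i, c])
# Caveat: go_backwards=True returns the sequence time-reversed; Keras's
# built-in Bidirectional reverses it back, so flip it if step order matters.
bwd = ConditionalRecurrent(GRU(units=30, return_sequences=True, go_backwards=True))([i, c])
x = Concatenate()([fwd, bwd])
x = Dense(units=2, activation='linear')(x)
model = Model(inputs=[i, c], outputs=[x])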

ykocoglu commented 1 year ago

This is great @philipperemy.

I have actually been preoccupied with other work and could not get around to testing it further. I would eventually have reached the point where I needed to test it, and to be honest, I thought the functional API with a conditional Bidirectional layer was not going to work, given the errors I was receiving. I'm glad you proved me wrong with your example. The example you are showing is exactly what I need.

I'm sincerely grateful. Thank you for resolving this issue.

philipperemy commented 1 year ago

@ykocoglu you're welcome!