philipperemy / cond_rnn

Conditional RNNs for Tensorflow / Keras.
MIT License

Simple LSTM #38

Closed · Simo-JDev closed this issue 1 year ago

Simo-JDev commented 1 year ago

Hello, since you have been so helpful so far I thought I'd try again 😄

I have this simple LSTM, where the inputs are the time series of shape (None, 10, 500), i.e. 10 time steps and 500 features, and the condition is a single value per input, so (None, 1). My first attempt was to design the network without specifying the input shape and to plug in the data at training time:

from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, Dense

from cond_rnn import ConditionalRecurrent

n_features = 500
model = Sequential()
model.add(ConditionalRecurrent(LSTM(128)))
model.add(Dense(n_features))
model.add(Dense(n_features))
model.compile(loss='mae', optimizer='adam')

# Train the model
history = model.fit(x=[train_x, train_c], y=train_y,
                    validation_data=([test_x, test_c], test_y),
                    epochs=1,
                    batch_size=int(train_x.shape[0] / 2),
                    verbose=1, shuffle=True)

Obviously I get this warning:

WARNING:tensorflow:Layers in a Sequential model should only have a single input tensor. 
Received: inputs=(<tf.Tensor 'IteratorGetNext:0' shape=(3430, 10, 500) dtype=float32>,
 <tf.Tensor 'IteratorGetNext:1' shape=(3430, 1) dtype=float32>).
 Consider rewriting this model with the Functional API.

but the results are surprisingly good, better than before introducing the cond_rnn.

Trying to solve the above issue, I have tried following another one of your examples and ended up with this:

from tensorflow.keras import Input, Model
from tensorflow.keras.layers import LSTM, Dense

from cond_rnn import ConditionalRecurrent

i = Input(shape=[10, 500], name='input_0')
c = Input(shape=[1], name='input_1')

x = ConditionalRecurrent(LSTM(128, return_sequences=True, name='cond_rnn_0'))([i, c])
x = ConditionalRecurrent(LSTM(128, return_sequences=False, name='cond_rnn_1'))([x, c])
x = Dense(units=1, activation='softmax')(x)

model = Model(inputs=[i, c], outputs=[x])
model.compile(optimizer='adam', loss='mae')

In this case I get no error regarding the inputs, but the results are completely useless, always outputting zero for every feature.

Any idea what is wrong in my code? I have followed your examples as closely as possible while trying to apply them to my case, so maybe I have messed something up in the process.

EDIT: Solved part of the issue myself, so I removed that part of the question

philipperemy commented 1 year ago

Hello,

The code I've pasted below works and learns nicely.

The training task is straightforward. I'm feeding the conditions to the input X and to the target Y.

It's a good way to see if the condition is used in the model.

I also give a random Y (an impossible task to learn) and it's clear that the model is not learning when Y is purely random.

I guess this example should help you.

PS: look at my message below. It's closer to what you want to achieve.

import numpy as np
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import LSTM, Bidirectional, Dense

from cond_rnn import ConditionalRecurrent

forward_layer = ConditionalRecurrent(LSTM(units=12, return_sequences=False))
backward_layer = ConditionalRecurrent(LSTM(units=13, return_sequences=False, go_backwards=True))

NUM_SAMPLES = 100
TIME_STEPS = 10
INPUT_DIM = 3
NUM_CLASSES = 2

inputs = (
    Input(shape=(TIME_STEPS, INPUT_DIM)),
    Input(shape=(NUM_CLASSES,))  # conditions.
)

x = Bidirectional(
    layer=forward_layer,
    backward_layer=backward_layer,
)(inputs=inputs)
output = Dense(NUM_CLASSES)(x)

model = Model(inputs=inputs, outputs=output)
model.compile(loss='categorical_crossentropy')

model.summary()

train_inputs = np.random.uniform(size=(NUM_SAMPLES, TIME_STEPS, INPUT_DIM))
train_targets = np.zeros(shape=[NUM_SAMPLES, NUM_CLASSES])
assert model.predict(x=[train_inputs, train_targets]).shape == (NUM_SAMPLES, NUM_CLASSES)

x = [train_inputs, train_targets]
# will not learn.
model.fit(x=x, y=np.random.uniform(size=train_targets.shape), epochs=10)

# will learn.
model.fit(x=x, y=train_targets, epochs=10)

philipperemy commented 1 year ago

Sorry, the previous snippet used Bidirectional. Use this one; it's closer to what you want to achieve.

import numpy as np
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import LSTM, Dense

from cond_rnn import ConditionalRecurrent

NUM_SAMPLES = 100
TIME_STEPS = 10
INPUT_DIM = 3
NUM_CLASSES = 2

inputs = (
    Input(shape=(TIME_STEPS, INPUT_DIM)),
    Input(shape=(NUM_CLASSES,))  # conditions.
)

x = ConditionalRecurrent(LSTM(units=12, return_sequences=True))(inputs)
x = ConditionalRecurrent(LSTM(units=13, return_sequences=False))([x, inputs[1]])
output = Dense(NUM_CLASSES)(x)

model = Model(inputs=inputs, outputs=output)
model.compile(loss='categorical_crossentropy')

model.summary()

train_inputs = np.random.uniform(size=(NUM_SAMPLES, TIME_STEPS, INPUT_DIM))
train_targets = np.zeros(shape=[NUM_SAMPLES, NUM_CLASSES])
assert model.predict(x=[train_inputs, train_targets]).shape == (NUM_SAMPLES, NUM_CLASSES)

x = [train_inputs, train_targets]
# will not learn.
model.fit(x=x, y=np.random.uniform(size=train_targets.shape), epochs=10)

# will learn.
model.fit(x=x, y=train_targets, epochs=10)

Output

Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
==================================================================================================
 input_1 (InputLayer)           [(None, 10, 3)]      0           []                               

 input_2 (InputLayer)           [(None, 2)]          0           []                               

 conditional_recurrent (Conditi  (None, 10, 12)      804         ['input_1[0][0]',                
 onalRecurrent)                                                   'input_2[0][0]']                

 conditional_recurrent_1 (Condi  (None, 13)          1391        ['conditional_recurrent[0][0]',  
 tionalRecurrent)                                                 'input_2[0][0]']                

 dense (Dense)                  (None, 2)            28          ['conditional_recurrent_1[0][0]']

==================================================================================================
Total params: 2,223
Trainable params: 2,223
Non-trainable params: 0
__________________________________________________________________________________________________
2022-12-19 16:49:09.825172: W tensorflow/tsl/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
4/4 [==============================] - 0s 1ms/step
Epoch 1/10
4/4 [==============================] - 2s 7ms/step - loss: 0.9583
Epoch 2/10
4/4 [==============================] - 0s 7ms/step - loss: 0.7269
Epoch 3/10
4/4 [==============================] - 0s 3ms/step - loss: 0.7240
Epoch 4/10
4/4 [==============================] - 0s 3ms/step - loss: 0.7233
Epoch 5/10
4/4 [==============================] - 0s 3ms/step - loss: 0.7251
Epoch 6/10
4/4 [==============================] - 0s 3ms/step - loss: 0.7184
Epoch 7/10
4/4 [==============================] - 0s 3ms/step - loss: 0.7186
Epoch 8/10
4/4 [==============================] - 0s 4ms/step - loss: 0.7176
Epoch 9/10
4/4 [==============================] - 0s 3ms/step - loss: 0.7190
Epoch 10/10
4/4 [==============================] - 0s 3ms/step - loss: 0.7222
Epoch 1/10
4/4 [==============================] - 0s 3ms/step - loss: 0.0000e+00
Epoch 2/10
4/4 [==============================] - 0s 3ms/step - loss: 0.0000e+00
Epoch 3/10
4/4 [==============================] - 0s 3ms/step - loss: 0.0000e+00
Epoch 4/10
4/4 [==============================] - 0s 3ms/step - loss: 0.0000e+00
Epoch 5/10
4/4 [==============================] - 0s 3ms/step - loss: 0.0000e+00
Epoch 6/10
4/4 [==============================] - 0s 3ms/step - loss: 0.0000e+00
Epoch 7/10
4/4 [==============================] - 0s 3ms/step - loss: 0.0000e+00
Epoch 8/10
4/4 [==============================] - 0s 3ms/step - loss: 0.0000e+00
Epoch 9/10
4/4 [==============================] - 0s 3ms/step - loss: 0.0000e+00
Epoch 10/10
4/4 [==============================] - 0s 3ms/step - loss: 0.0000e+00
philipperemy commented 1 year ago

Sorry again, I thought you had a compilation problem; I misread your question. What I can tell you is that:

- 500 features is a lot for a model this size; the number of data points you have must be too small for that many features.
- Dense(units=1, activation='softmax') cannot work: a softmax over a single unit always outputs a constant. For a classification head you need at least two units (or a sigmoid), with binary_crossentropy rather than mae.
- A batch size of half the training set is very large.
- One epoch is far too few to learn anything.

Ref: https://stats.stackexchange.com/questions/207049/neural-network-for-binary-classification-use-1-or-2-output-neurons
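
A minimal check of the single-unit softmax point:

import tensorflow as tf

# A softmax over a single logit always normalizes to exactly 1.0,
# whatever the input, so Dense(units=1, activation='softmax')
# produces a constant output and cannot learn anything.
logits = tf.constant([[-3.0], [0.0], [7.5]])
print(tf.nn.softmax(logits, axis=-1).numpy())  # [[1.], [1.], [1.]]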

Simo-JDev commented 1 year ago

Thanks for your reply. Unfortunately, the number of features corresponds to the number of points on the horizontal axis of a frequency spectrum, so lowering it would mean undersampling the spectrum, which so far hasn't worked very well. Do you suggest increasing the number of units to compensate? I guess having the ConditionalRecurrent could compensate for lower-resolution frequency sampling...

What do you mean by 'the number of data points you have must be too small'?

Regarding the last two points in your reply, I agree: I normally train with smaller batch sizes and for at least 25 epochs. I must have copied the code from the script I use to test the plot of the model.

Simo-JDev commented 1 year ago

Also, I really appreciate all the help, but from the last example you posted I feel I haven't explained myself very well. My aim is to feed the neural network the last 10 steps of a spectral evolution, where every step is a full spectrum like in the figure below:

[Screenshot: one full frequency spectrum from the evolution]

From the last 10 steps, the model should predict the next one: from ten 500-point arrays it should predict the eleventh 500-point array. I have a fully working model, but I was hoping to include your ConditionalRecurrent in order to handle more degrees of freedom in the input parameters. Hope this helps clarify the issue I have.
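
For concreteness, the dataset is built by sliding a window over the evolution, roughly like this (with random stand-ins for the real data):

import numpy as np

n_lookback = 10

# Stand-ins: one 500-point spectrum per time step, plus a single
# conditioning value for the whole evolution.
spectra = np.random.uniform(size=(200, 500))
cond_value = 0.5

# Each sample is a window of 10 consecutive spectra; the target is the
# spectrum immediately after the window.
windows = [spectra[t:t + n_lookback] for t in range(len(spectra) - n_lookback)]
targets = [spectra[t + n_lookback] for t in range(len(spectra) - n_lookback)]

train_x = np.array(windows)                       # (samples, 10, 500)
train_y = np.array(targets)                       # (samples, 500)
train_c = np.full((len(train_x), 1), cond_value)  # (samples, 1)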

Simo-JDev commented 1 year ago

By the way, this is currently the best working model I have. It performs quite well; I am only trying to improve it.

from tensorflow.keras import Input, Model
from tensorflow.keras.layers import LSTM, Dense

from cond_rnn import ConditionalRecurrent

n_lookback = 10
n_features = 250
n_conds = 1

i = Input(shape=[n_lookback, n_features])
c = Input(shape=[n_conds])

comb = ConditionalRecurrent(LSTM(128, name='LSTM'), name='combined')([i, c])
comb = Dense(n_features)(comb)
out = Dense(n_features)(comb)

model = Model(inputs=[i, c], outputs=[out])
model.compile(optimizer='adam', loss='mae')

I have managed to reduce the features to 250 without compromising the quality of the predictions.

I have also tested binary_crossentropy in this model and it is significantly worse, so I don't understand why you say mae is not a good idea.
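
For reference, this is the kind of fit call I normally use with it (batch_size=32 here is just an illustrative value):

history = model.fit(x=[train_x, train_c], y=train_y,
                    validation_data=([test_x, test_c], test_y),
                    epochs=25,
                    batch_size=32,  # illustrative; well below half the dataset
                    verbose=1, shuffle=True)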

philipperemy commented 1 year ago

Okay, I understand better. When I saw the softmax I thought it was a classification problem, hence the binary_crossentropy. So mae makes sense in your case. I think you are doing it the right way; it's just that the problem is not easy to learn. Try to gather as much data as possible and increase your model size bit by bit.
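
A sketch of what "increase your model size bit by bit" could look like here, reusing the stacking pattern from my earlier snippet (the second LSTM and the layer sizes are assumptions, not tuned recommendations):

from tensorflow.keras import Input, Model
from tensorflow.keras.layers import LSTM, Dense

from cond_rnn import ConditionalRecurrent

n_lookback, n_features, n_conds = 10, 250, 1

i = Input(shape=[n_lookback, n_features])
c = Input(shape=[n_conds])

# The first conditioned LSTM returns the full sequence so a second one
# can be stacked on top; both receive the same condition.
x = ConditionalRecurrent(LSTM(128, return_sequences=True))([i, c])
x = ConditionalRecurrent(LSTM(128, return_sequences=False))([x, c])
x = Dense(n_features)(x)
out = Dense(n_features)(x)

model = Model(inputs=[i, c], outputs=[out])
model.compile(optimizer='adam', loss='mae')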