Hello,
The code I've pasted below works and learns nicely.
The training task is straightforward: I feed the conditions both as part of the input X and as the target Y. It's a good way to check that the condition is actually used by the model.
I also train against a random Y (an impossible task to learn), and it's clear that the model does not learn when Y is purely random.
I hope this example helps.
PS: see my message below; it's closer to what you want to achieve.
import numpy as np
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import LSTM, Bidirectional, Dense
from cond_rnn import ConditionalRecurrent

# Wrap each direction's LSTM with ConditionalRecurrent.
forward_layer = ConditionalRecurrent(LSTM(units=12, return_sequences=False))
backward_layer = ConditionalRecurrent(LSTM(units=13, return_sequences=False, go_backwards=True))

NUM_SAMPLES = 100
TIME_STEPS = 10
INPUT_DIM = 3
NUM_CLASSES = 2

inputs = (
    Input(shape=(TIME_STEPS, INPUT_DIM)),
    Input(shape=(NUM_CLASSES,)),  # conditions.
)
x = Bidirectional(
    layer=forward_layer,
    backward_layer=backward_layer,
)(inputs=inputs)
output = Dense(NUM_CLASSES)(x)
model = Model(inputs=inputs, outputs=output)
model.compile(loss='categorical_crossentropy')
model.summary()

train_inputs = np.random.uniform(size=(NUM_SAMPLES, TIME_STEPS, INPUT_DIM))
train_targets = np.zeros(shape=[NUM_SAMPLES, NUM_CLASSES])
assert model.predict(x=[train_inputs, train_targets]).shape == (NUM_SAMPLES, NUM_CLASSES)

x = [train_inputs, train_targets]
# will not learn.
model.fit(x=x, y=np.random.uniform(size=train_targets.shape), epochs=10)
# will learn.
model.fit(x=x, y=train_targets, epochs=10)
Sorry, the previous snippet was with Bidirectional. Use this one; it's closer to what you want to achieve.
import numpy as np
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import LSTM, Dense
from cond_rnn import ConditionalRecurrent

NUM_SAMPLES = 100
TIME_STEPS = 10
INPUT_DIM = 3
NUM_CLASSES = 2

inputs = (
    Input(shape=(TIME_STEPS, INPUT_DIM)),
    Input(shape=(NUM_CLASSES,)),  # conditions.
)
# Stack two ConditionalRecurrent LSTMs; the condition is passed again to the second layer.
x = ConditionalRecurrent(LSTM(units=12, return_sequences=True))(inputs)
x = ConditionalRecurrent(LSTM(units=13, return_sequences=False))([x, inputs[1]])
output = Dense(NUM_CLASSES)(x)
model = Model(inputs=inputs, outputs=output)
model.compile(loss='categorical_crossentropy')
model.summary()

train_inputs = np.random.uniform(size=(NUM_SAMPLES, TIME_STEPS, INPUT_DIM))
train_targets = np.zeros(shape=[NUM_SAMPLES, NUM_CLASSES])
assert model.predict(x=[train_inputs, train_targets]).shape == (NUM_SAMPLES, NUM_CLASSES)

x = [train_inputs, train_targets]
# will not learn.
model.fit(x=x, y=np.random.uniform(size=train_targets.shape), epochs=10)
# will learn.
model.fit(x=x, y=train_targets, epochs=10)
Output
Model: "model"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) [(None, 10, 3)] 0 []
input_2 (InputLayer) [(None, 2)] 0 []
conditional_recurrent (Conditi (None, 10, 12) 804 ['input_1[0][0]',
onalRecurrent) 'input_2[0][0]']
conditional_recurrent_1 (Condi (None, 13) 1391 ['conditional_recurrent[0][0]',
tionalRecurrent) 'input_2[0][0]']
dense (Dense) (None, 2) 28 ['conditional_recurrent_1[0][0]']
==================================================================================================
Total params: 2,223
Trainable params: 2,223
Non-trainable params: 0
__________________________________________________________________________________________________
2022-12-19 16:49:09.825172: W tensorflow/tsl/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
4/4 [==============================] - 0s 1ms/step
Epoch 1/10
4/4 [==============================] - 2s 7ms/step - loss: 0.9583
Epoch 2/10
4/4 [==============================] - 0s 7ms/step - loss: 0.7269
Epoch 3/10
4/4 [==============================] - 0s 3ms/step - loss: 0.7240
Epoch 4/10
4/4 [==============================] - 0s 3ms/step - loss: 0.7233
Epoch 5/10
4/4 [==============================] - 0s 3ms/step - loss: 0.7251
Epoch 6/10
4/4 [==============================] - 0s 3ms/step - loss: 0.7184
Epoch 7/10
4/4 [==============================] - 0s 3ms/step - loss: 0.7186
Epoch 8/10
4/4 [==============================] - 0s 4ms/step - loss: 0.7176
Epoch 9/10
4/4 [==============================] - 0s 3ms/step - loss: 0.7190
Epoch 10/10
4/4 [==============================] - 0s 3ms/step - loss: 0.7222
Epoch 1/10
4/4 [==============================] - 0s 3ms/step - loss: 0.0000e+00
Epoch 2/10
4/4 [==============================] - 0s 3ms/step - loss: 0.0000e+00
Epoch 3/10
4/4 [==============================] - 0s 3ms/step - loss: 0.0000e+00
Epoch 4/10
4/4 [==============================] - 0s 3ms/step - loss: 0.0000e+00
Epoch 5/10
4/4 [==============================] - 0s 3ms/step - loss: 0.0000e+00
Epoch 6/10
4/4 [==============================] - 0s 3ms/step - loss: 0.0000e+00
Epoch 7/10
4/4 [==============================] - 0s 3ms/step - loss: 0.0000e+00
Epoch 8/10
4/4 [==============================] - 0s 3ms/step - loss: 0.0000e+00
Epoch 9/10
4/4 [==============================] - 0s 3ms/step - loss: 0.0000e+00
Epoch 10/10
4/4 [==============================] - 0s 3ms/step - loss: 0.0000e+00
Sorry again, I thought you had a compilation problem; I misread your question. What I can tell you is that:
- Try removing the ConditionalRecurrent wrapper and see if the model can learn a bit better (I guess it should be exactly the same, but who knows!).
- Try a binary_crossentropy loss with a sigmoid activation instead of softmax. Don't use a mae loss; it's a bad idea.
- Check your batch_size=. You can't learn if you have only 2 batches; make smaller ones, or try to find another way to look at your problem.
- epochs=1: same, it's too small.
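For example, applied to the toy model above, those suggestions could look roughly like this (just a sketch: the sigmoid/binary_crossentropy pairing comes from the list above, while batch_size=16, epochs=25 and the random data are illustrative placeholders, not values from your script):
import numpy as np
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import LSTM, Dense
from cond_rnn import ConditionalRecurrent

TIME_STEPS, INPUT_DIM, NUM_CLASSES = 10, 3, 2

seq_in = Input(shape=(TIME_STEPS, INPUT_DIM))
cond_in = Input(shape=(NUM_CLASSES,))  # conditions.
h = ConditionalRecurrent(LSTM(units=12))([seq_in, cond_in])
# sigmoid + binary_crossentropy instead of softmax/categorical_crossentropy.
out = Dense(NUM_CLASSES, activation='sigmoid')(h)
model = Model(inputs=[seq_in, cond_in], outputs=out)
model.compile(optimizer='adam', loss='binary_crossentropy')

# Placeholder data, only to show the shapes and the fit settings.
x_seq = np.random.uniform(size=(100, TIME_STEPS, INPUT_DIM))
x_cond = np.random.uniform(size=(100, NUM_CLASSES))
y = np.random.randint(0, 2, size=(100, NUM_CLASSES)).astype('float32')
# Many small batches and more than one epoch, rather than 2 huge batches and epochs=1.
model.fit(x=[x_seq, x_cond], y=y, batch_size=16, epochs=25)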
Thanks for your reply. Unfortunately, the number of features corresponds to the number of points on the horizontal axis of a frequency spectrum, so lowering it would mean under-sampling the frequency, which so far hasn't worked very well. Do you suggest increasing the number of units to compensate? I guess having the ConditionalRecurrent could compensate for lower-resolution frequency sampling...
What do you mean by 'the number of data points you have must be too small'?
Regarding the last 2 points in your last reply, I agree; I normally train with smaller batch sizes and for at least 25 epochs. I must have copied the code from the script I use only to test the plot of the model.
Also, I really appreciate all the help, but from the last example you posted I feel like I haven't explained myself very well. My aim is to feed the neural network the last 10 steps of a spectral evolution, so every step is a full spectrum like in the figure below:
From the last 10 steps, the model should predict the next one: from ten 500-point arrays it should predict the eleventh 500-point array. I have a fully working model, but I was hoping to include your ConditionalRecurrent in order to handle more degrees of freedom in the input parameters. Hope this helps you understand the issue I have.
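Concretely, the shaping I have in mind is something like the sketch below (spectra is a hypothetical placeholder for my data, with one condition value per window; names and sizes are only illustrative):
import numpy as np

N_LOOKBACK = 10   # last 10 spectra as input.
N_FEATURES = 500  # points per spectrum.

# Placeholder for the spectral evolution: one row per time step.
spectra = np.random.uniform(size=(1000, N_FEATURES))
cond_value = 0.5  # the single condition attached to each window (placeholder).

X, C, Y = [], [], []
for t in range(len(spectra) - N_LOOKBACK):
    X.append(spectra[t:t + N_LOOKBACK])  # 10 consecutive spectra -> (10, 500).
    Y.append(spectra[t + N_LOOKBACK])    # the 11th spectrum -> (500,).
    C.append([cond_value])               # single condition -> (1,).

X, C, Y = np.array(X), np.array(C), np.array(Y)
print(X.shape, C.shape, Y.shape)  # (990, 10, 500) (990, 1) (990, 500)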
By the way, this is currently the best-working model I have. It performs quite well; I am only trying to improve it.
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import LSTM, Dense
from cond_rnn import ConditionalRecurrent

n_lookback = 10   # time steps fed to the model.
n_features = 250  # points per spectrum.
n_conds = 1       # single condition value per sample.

i = Input(shape=[n_lookback, n_features])
c = Input(shape=[n_conds])
comb = ConditionalRecurrent(LSTM(128, name='LSTM'), name='combined')([i, c])
comb = Dense(n_features)(comb)
out = Dense(n_features)(comb)
model = Model(inputs=[i, c], outputs=[out])
model.compile(optimizer='adam', loss='mae')
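I then fit it along these lines (the arrays here are random stand-ins with the right shapes, and batch_size=16 is just an example; I normally use small batches and at least 25 epochs):
import numpy as np

# Random stand-ins with the same shapes as my real windows.
X = np.random.uniform(size=(500, n_lookback, n_features))  # last 10 spectra per sample.
C = np.random.uniform(size=(500, n_conds))                 # one condition per sample.
Y = np.random.uniform(size=(500, n_features))              # the next spectrum.

model.fit(x=[X, C], y=Y, batch_size=16, epochs=25)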
I have managed to reduce the features to 250 without compromising the quality of the predictions.
I have also tested binary_crossentropy in this model and it is significantly worse, so I don't understand why you say mae is not a good idea.
Okay, I understand better. When I saw softmax I thought that it was a classification problem, hence the binary_crossentropy. So mae in your case makes sense. I think you are doing it the right way. It's just that the problem is not easy to learn. Try to gather as much data as possible and increase your model size bit by bit.
Hello, since you have been so helpful so far I thought I'd try again 😄
I have this simple LSTM, where the inputs are the time series, of shape (None, 10, 500), i.e. 10 time steps and 500 features, and the condition is a single value per input, so (None, 1). My first attempt was designing the network without specifying the input shape and plugging in the data at training time:
Obviously I get the error:
but the results are surprisingly good, better than before introducing the cond_rnn.
Trying to solve the above issue, I have tried following another one of your examples and ended up with this:
In this case I get no error regarding the inputs, but the results are completely useless, always outputting zero for every feature.
Any idea on what is wrong in my code? I have followed your examples as much as possible while trying to apply it to my case, so maybe I have messed up something in the process.
EDIT: Solved part of the issue myself, so I removed that part of the question.