keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

LSTM/RNN input and output #2892

Closed hrsmanian closed 3 years ago

hrsmanian commented 8 years ago

Hi, this is a general question about using LSTMs/RNNs. What do the input and output look like? Are there any concrete examples? Kindly let me know.

Say I have 5-dimensional features per frame and want to use them for binary classification. Normally, for a feed-forward network, the data and labels would look like this:

[1,2,3,4,5] [0]
[2,3,4,5,6] [0]
[3,4,5,6,7] [0]
[4,5,6,7,8] [1] -> different class
[5,6,7,8,9] [1] -> different class

If I want to use an RNN/LSTM instead, what would my inputs and outputs look like? For timesteps = 2 and batch_size = 3, will I have one label for each timestep?

Any pointers will be appreciated

Thanks

mbollmann commented 8 years ago

For a feed-forward network, your input has the shape (number of samples, number of features). With an LSTM/RNN, you add a time dimension, and your input shape becomes (number of samples, number of timesteps, number of features). This is in the documentation.

So if your feature dimension is 5, and you have 2 timesteps, your input could look like

[ [[1,2,3,4,5], [2,3,4,5,6]], [[2,4,6,8,0], [9,8,7,6,5]] ]

Your output shape depends on how you configure the net. If your LSTM/RNN has return_sequences=False, you'll have one label per sequence; if you set return_sequences=True, you'll have one label per timestep.
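A minimal sketch of the two configurations, assuming a recent Keras (`import keras`) and an arbitrary 8 LSTM units (the unit count is illustrative, not from the thread). It feeds the (2, 2, 5) example above through each variant and checks the output shapes:

```python
import numpy as np
import keras

# The example batch: 2 sequences, 2 timesteps each, 5 features per timestep.
x = np.array([[[1, 2, 3, 4, 5], [2, 3, 4, 5, 6]],
              [[2, 4, 6, 8, 0], [9, 8, 7, 6, 5]]], dtype="float32")

def build(return_sequences):
    # The declared input shape excludes the sample dimension: (timesteps, features).
    return keras.Sequential([
        keras.Input(shape=(2, 5)),
        keras.layers.LSTM(8, return_sequences=return_sequences),
    ])

per_sequence = build(False).predict(x, verbose=0)
per_timestep = build(True).predict(x, verbose=0)
print(per_sequence.shape)  # (2, 8): one output vector per sequence
print(per_timestep.shape)  # (2, 2, 8): one output vector per timestep
```

For the binary case in the question, a `Dense(1, activation='sigmoid')` on top would give shapes (2, 1) and (2, 2, 1) respectively, since Dense acts on the last axis of a 3-D input.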

hrsmanian commented 8 years ago

Thanks for the reply. Really appreciate it

So in the example [ [[1,2,3,4,5], [2,3,4,5,6]], [[2,4,6,8,0], [9,8,7,6,5]] ]

the input shape is (2, 2, 5). Is that correct?

And I assume a 'sequence' is [[1,2,3,4,5], [2,3,4,5,6]], which has 2 timesteps. Is that correct?

mbollmann commented 8 years ago

That should be correct, yes.
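To tie this back to the per-frame data in the original question, here is a numpy-only sketch that slides a window of 2 timesteps over the five frames. It assumes overlapping windows with stride 1 and, for the one-label-per-sequence case, labels each window by its last frame; both choices are illustrative, not something prescribed in the thread.

```python
import numpy as np

# Per-frame features and labels from the question: 5 frames, 5 features each.
frames = np.array([[1, 2, 3, 4, 5],
                   [2, 3, 4, 5, 6],
                   [3, 4, 5, 6, 7],
                   [4, 5, 6, 7, 8],
                   [5, 6, 7, 8, 9]], dtype="float32")
labels = np.array([0, 0, 0, 1, 1])

timesteps = 2
starts = range(len(frames) - timesteps + 1)  # overlapping windows, stride 1

# Input: (samples, timesteps, features)
X = np.stack([frames[s:s + timesteps] for s in starts])

# return_sequences=False: one label per sequence (here, the label of its last frame)
y_per_sequence = np.array([labels[s + timesteps - 1] for s in starts])

# return_sequences=True: one label per timestep, shape (samples, timesteps, 1)
y_per_timestep = np.stack([labels[s:s + timesteps] for s in starts])[..., np.newaxis]

print(X.shape, y_per_sequence.shape, y_per_timestep.shape)  # (4, 2, 5) (4,) (4, 2, 1)
```

The batch_size = 3 from the question is a training setting (how many of these samples go into one gradient step) and does not change these shapes.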

anirudhgupta22 commented 7 years ago

Hi,

I am trying to work with CNN+LSTM and am having a problem using an LSTM after the CNN. The last conv layer of my network outputs shape 32x8x26; how can I use an LSTM after this?

Thanks.

Kevinpsk commented 7 years ago

@anirudhgupta22
Hi, I have to say that I am also new to CNN+LSTM, but my feeling is that you probably need to reshape your data after the CNN using a Flatten layer. In your case, if the CNN output has shape 32x8x26 at each time step, a Flatten layer turns it into 32x8x26 = 6656 values, so you will have a 6656-dimensional feature vector at each time step. I'm not sure how to then connect it to the LSTM either; I'm still studying it. Hope this helps.

Cheers
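A minimal sketch of the Flatten approach described above, assuming each time step is one image frame and the CNN is wrapped in `TimeDistributed` so the same CNN runs on every frame. The frame size (8 x 26 x 3), the 4 timesteps, the single Conv2D layer and the 64 LSTM units are all hypothetical, chosen so the per-frame CNN output is 8 x 26 x 32 as in the question (`padding="same"` keeps the spatial size).

```python
import numpy as np
import keras

timesteps, height, width, channels = 4, 8, 26, 3  # hypothetical sequence of frames

model = keras.Sequential([
    keras.Input(shape=(timesteps, height, width, channels)),
    # Same CNN applied to each frame: (None, 4, 8, 26, 32)
    keras.layers.TimeDistributed(
        keras.layers.Conv2D(32, 3, padding="same", activation="relu")),
    # Flatten each frame's feature map: (None, 4, 8 * 26 * 32) = (None, 4, 6656)
    keras.layers.TimeDistributed(keras.layers.Flatten()),
    # The LSTM now sees a sequence of 6656-dimensional feature vectors
    keras.layers.LSTM(64),
])

batch = np.zeros((2, timesteps, height, width, channels), dtype="float32")
out = model.predict(batch, verbose=0)
print(out.shape)  # (2, 64)
```

If the 32x8x26 output belongs to a single image rather than one per time step, another common option is to treat one spatial axis as time, e.g. a `Reshape` to (8, 26 * 32) before the LSTM; which axis is meaningful as "time" depends on your data.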

wengchen1993 commented 7 years ago

Hi all,

A similar question: I have a time series dataset, and I want to use a (14 x 1) input vector consisting of the data from the last 14 days, i.e. [t-13, t-12, t-11 ... t-1, t], to predict a (3 x 1) output vector for 10, 11 and 12 days later, i.e. [t+10, t+11, t+12].

The final input data will therefore have shape (34536, 14, 1), and the final output data will have shape (34536, 3, 1).

Now I want to build a very simple model: LSTM -> TimeDistributed. Is it correct if I build my model as follows:

from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed

xshape = (34536, 14, 1)
model = Sequential()
model.add(LSTM(units=3, input_shape=(xshape[1], 1), return_sequences=True))
model.add(TimeDistributed(Dense(3)))

Thanks in advance for any advice!

mbollmann commented 7 years ago

@wengchen1993 You can use from keras.utils.layer_utils import print_summary; print_summary(model) to check the shapes of your layers. From there, you can see that the output shape is (None, 14, 3), so your code does not do what you want.

First of all, the number 3 in your code, both in LSTM(units=3) and in Dense(3), has nothing to do with the number of timesteps, but specifies the dimensionality of the output space. LSTM(3) would therefore produce an LSTM that outputs 3-dimensional vectors (which is tiny for an LSTM, really). Similarly, Dense(3) also outputs 3-dimensional vectors, which would be useful if you wanted to predict one of three classes, for example.

What you could do - but note, I have no idea if that is actually a good choice for your particular problem domain - is have an LSTM that outputs an n-dimensional vector, then split up that vector into three parts (the three timesteps you want in your output). For example:

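from keras.models import Sequential
from keras.layers import LSTM, Reshape, TimeDistributed, Dense

xshape = (34536, 14, 1)
n = 300
model = Sequential()
model.add(LSTM(n, input_shape=(xshape[1], 1)))  # (None, 300): whole input sequence in one vector
model.add(Reshape((3, -1)))                      # (None, 3, 100): one chunk per output timestep
model.add(TimeDistributed(Dense(1)))             # (None, 3, 1): one value per output timestep
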
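One way to confirm what this model does is to run random data through it and check the shapes. The sketch rebuilds the same three layers with `keras.Input` in place of `input_shape` and uses random placeholder inputs; the expected parameter count follows from the LSTM parameter formula 4 * units * (input_dim + units + 1), plus 100 weights and 1 bias for the shared Dense(1).

```python
import numpy as np
import keras

timesteps, n = 14, 300
model = keras.Sequential([
    keras.Input(shape=(timesteps, 1)),
    keras.layers.LSTM(n),                                 # (None, 300)
    keras.layers.Reshape((3, -1)),                        # (None, 3, 100)
    keras.layers.TimeDistributed(keras.layers.Dense(1)),  # (None, 3, 1)
])

x = np.random.rand(5, timesteps, 1).astype("float32")  # placeholder data, shape check only
y_pred = model.predict(x, verbose=0)
print(y_pred.shape)          # (5, 3, 1)
print(model.count_params())  # 4*300*(1+300+1) + (100+1) = 362501
```

Targets for this model therefore need shape (samples, 3, 1), which matches the (34536, 3, 1) output data described above.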
wengchen1993 commented 7 years ago

Hi thanks @mbollmann! The format you suggested seems to be correct as I observe from the model summary:


Layer (type)                 Output Shape          Param #
lstm_1 (LSTM)                (None, 14, 3)         60
reshape_1 (Reshape)          (None, 3, 14)         0
time_distributed_1 (TimeDist (None, 3, 1)          15

Total params: 75
Trainable params: 75
Non-trainable params: 0


However, I notice that the outputs I get are all the same:

[[[ 19.89518166] [ 17.22050476] [ 20.95066833]]
 [[ 19.89518166] [ 17.22050476] [ 20.95066833]]
 [[ 19.89518166] [ 17.22050476] [ 20.95066833]]
 ...,
 [[ 19.89518166] [ 17.22050476] [ 20.95066833]]
 [[ 19.89518166] [ 17.22050476] [ 20.95066833]]
 [[ 19.89518166] [ 17.22050476] [ 20.95066833]]] (8628, 3, 1)

Are there any reasons you can think of why this happens? I have inspected the inputs and they are quite different.

mbollmann commented 7 years ago

From the model summary, I guess you're still using return_sequences=True and units=3 with your LSTM. You should have return_sequences=False and a much higher number for the units to start with.

Apart from that, it's really hard to say anything else without knowing what you're trying to do, how you're trying to train the model, etc.

wengchen1993 commented 7 years ago

@mbollmann Sorry if I didn't make it clear; let me try to explain again:

What I have done (many-to-one): I have a single time series, for example [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15], and I split it into (m x 1) input vectors, each used to predict a scalar value t timesteps ahead. If m is 5 and t is 4, I get [1,2,3,4,5] -> 9, [2,3,4,5,6] -> 10, [3,4,5,6,7] -> 11 ... [6,7,8,9,10] -> 14 and [7,8,9,10,11] -> 15. This is done with LSTMs and a Dense(1) at the end, because this is a regression problem.

What I want now (many-to-many): basically the same thing, but instead of predicting a scalar value, I predict a vector of n values. If m is 5, t is 4 and n is 3, I get [1,2,3,4,5] -> [9,10,11], [2,3,4,5,6] -> [10,11,12], [3,4,5,6,7] -> [11,12,13], [4,5,6,7,8] -> [12,13,14], [5,6,7,8,9] -> [13,14,15].

The parameters above are just examples; I use much larger values in the actual code. Is a TimeDistributed layer suitable for a regression problem like this?
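The windowing described in this post can be sketched in numpy. `make_windows` is a hypothetical helper (not a Keras function); m, t and n mean the same as in the post, and n=1 reproduces the many-to-one case.

```python
import numpy as np

def make_windows(series, m, t, n):
    """Split a 1-D series into (input, target) pairs.

    Each input is m consecutive values; its target is the n values starting
    t steps after the last input value (n=1 gives the many-to-one case).
    """
    series = np.asarray(series, dtype="float32")
    first_target = m - 1 + t                  # offset of the first target from the window start
    count = len(series) - first_target - n + 1
    X = np.stack([series[i:i + m] for i in range(count)])
    Y = np.stack([series[i + first_target:i + first_target + n] for i in range(count)])
    # LSTMs expect (samples, timesteps, features); here there is 1 feature per step.
    return X[..., np.newaxis], Y

X, Y = make_windows(range(1, 16), m=5, t=4, n=3)
print(X.shape, Y.shape)                # (5, 5, 1) (5, 3)
print(Y[0].tolist(), Y[-1].tolist())   # [9.0, 10.0, 11.0] [13.0, 14.0, 15.0]

X1, Y1 = make_windows(range(1, 16), m=5, t=4, n=1)
print(X1.shape, Y1[:, 0].tolist())     # (7, 5, 1) [9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
```

Y then has shape (samples, n), which matches a plain Dense(n) output; a Reshape/TimeDistributed model would instead need it reshaped to (samples, n, 1).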

mbollmann commented 7 years ago

I've never worked with predicting scalars, and I don't know if my sample code is the best model for this problem, but it should roughly be able to do what you want.

The LSTM would encode all of the input timesteps into a single vector (so return_sequences=False) which is then split up into three parts, to produce the three separate output timesteps you want. (Alternatively, you could probably skip the Reshape and TimeDistributed parts and just predict a 3-dimensional vector via Dense(3), where each dimension is one of the values you want to predict... that might also work. With neural networks, there is rarely a single correct answer. :))
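A sketch of the simpler alternative mentioned above: skip Reshape/TimeDistributed and let Dense(3) emit all three predictions at once, so the targets are plain (samples, 3) arrays. The random data, 64 units, MSE loss and 2 epochs are placeholders for a shape check, not tuned choices.

```python
import numpy as np
import keras

m, n_out = 14, 3
X = np.random.rand(64, m, 1).astype("float32")   # placeholder inputs: (samples, timesteps, features)
Y = np.random.rand(64, n_out).astype("float32")  # placeholder targets: (samples, 3)

model = keras.Sequential([
    keras.Input(shape=(m, 1)),
    keras.layers.LSTM(64),       # return_sequences=False: one vector per sample
    keras.layers.Dense(n_out),   # linear output, one unit per predicted day (regression)
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, Y, epochs=2, batch_size=16, verbose=0)

pred = model.predict(X[:4], verbose=0)
print(pred.shape)  # (4, 3)
```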

naisanza commented 7 years ago

@mbollmann

Going off the example you gave: [ [[1,2,3,4,5], [2,3,4,5,6]], [[2,4,6,8,0], [9,8,7,6,5]] ]

If input_shape=(2, 2, 5) then,

Would the keras line be: model.add(LSTM(hidden_units, input_shape=(2, 2, 5)))?

And if that's true, how come in this example tutorial it's using: model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2]))), translated to: model.add(LSTM(256, input_shape=(sequence_length, features))), translated to: model.add(LSTM(256, input_shape=(100, 1)))

Instead of: model.add(LSTM(256, input_shape=(147574, 100, 1)))

Which results in: ValueError: Input 0 is incompatible with layer lstm_1: expected ndim=3, found ndim=4

mbollmann commented 7 years ago

@naisanza

The input shape is (2, 2, 5), yes, but the batch dimension (= number of samples) is never specified when giving an input_shape argument to Keras. See this part of the documentation.

So the Keras line would have input_shape=(2, 5).
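A quick way to see that the sample (batch) dimension is left out of the declared shape: the sketch declares only (timesteps, features) = (2, 5) and then feeds arrays with different numbers of samples. It uses `keras.Input(shape=(2, 5))`, which plays the same role as `input_shape=(2, 5)` on the first layer; the 16 units are arbitrary.

```python
import numpy as np
import keras

model = keras.Sequential([
    keras.Input(shape=(2, 5)),  # (timesteps, features): no sample dimension here
    keras.layers.LSTM(16),
])

shapes = []
for samples in (2, 7):
    x = np.random.rand(samples, 2, 5).astype("float32")  # the data itself does carry it
    shapes.append(model.predict(x, verbose=0).shape)
print(shapes)  # [(2, 16), (7, 16)]
```

Declaring input_shape=(147574, 100, 1) instead makes the model input 4-D, (None, 147574, 100, 1), while an LSTM expects 3-D input; that is what "expected ndim=3, found ndim=4" refers to.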

naisanza commented 7 years ago

@mbollmann oh I didn't know batch dimension meant samples, I thought it was referring to batch_size

So many words

stale[bot] commented 6 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

mazharaliabro commented 6 years ago

Hello friends, I could not understand the term 'None' in an output shape like this:

Layer (type)                   Output Shape        Param #
embedding_2 (Embedding)        (None, 28, 128)     256000
spatial_dropout1d_2 (Spatial)  (None, 28, 128)     0

Please let me know why None appears in the output shape. Waiting for your response.

xyzxinyizhang commented 6 years ago

Hi,

I have a question for the RNN output dimension.

I use (simplified here):

model.add(SimpleRNN(2, input_shape=(None, 1), return_sequences=True, name="1"))
model.add(SimpleRNN(4, return_sequences=False, name="2"))

Each of my input sequences has length 9, so the output shape of layer "1" is (2297, 9, 2), where 2297 is my total number of samples and 2 is the output dimension. The model works well, but I'm wondering why the RNN output dimension is tied to the number of hidden units. Referring to $S_t = f(U X_t + W S_{t-1})$ and $O_t = \mathrm{softmax}(V S_t)$:

For layer 1: U (hidden × input_dim) is a 2×1 matrix, X_t is a 1×1 matrix, W (hidden × hidden) is a 2×2 matrix, and V (input_dim × hidden) is 1×2. Then O_t should be 1×1 here. Why is my output dimension 2?

Below is my input example: [[4.00298875] [4.00298875] [4.00298875] [4.00298875] [4.00298875] [4.00298875] [4.00298875] [4.00298875] [4.00298875]]

Can anyone help with this?
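A likely source of the mismatch: Keras's SimpleRNN has no output matrix V. It returns the hidden state itself, h_t = tanh(x_t · kernel + h_{t-1} · recurrent_kernel + bias), so its output dimension equals the number of units (2 here); an O_t = softmax(V S_t) step would be a separate Dense layer. Keras stores the matrices transposed relative to the notation above (kernel is input_dim × units). The numpy sketch below recomputes the layer's output from its own weights to check this, using one random sequence of the question's shape (9 timesteps, 1 feature).

```python
import numpy as np
import keras

def simple_rnn_forward(x, kernel, recurrent_kernel, bias):
    """Recompute a tanh SimpleRNN by hand. x: (timesteps, input_dim) -> (timesteps, units)."""
    h = np.zeros(recurrent_kernel.shape[0], dtype=np.float32)  # h_0 = 0
    states = []
    for x_t in x:
        h = np.tanh(x_t @ kernel + h @ recurrent_kernel + bias)  # no V / softmax step
        states.append(h)
    return np.stack(states)

rnn = keras.layers.SimpleRNN(2, return_sequences=True)  # 2 units, default tanh activation
model = keras.Sequential([keras.Input(shape=(9, 1)), rnn])

x = np.random.rand(1, 9, 1).astype("float32")        # one sequence: 9 timesteps, 1 feature
keras_out = model.predict(x, verbose=0)[0]           # (9, 2)
kernel, recurrent_kernel, bias = rnn.get_weights()   # shapes (1, 2), (2, 2), (2,)
manual_out = simple_rnn_forward(x[0], kernel, recurrent_kernel, bias)

print(manual_out.shape)                               # (9, 2)
print(np.allclose(manual_out, keras_out, atol=1e-5))  # True
```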

surajp92 commented 5 years ago

If I have only one sample, do I need to reshape it into three dimensions? For example, my input data has shape [m, n] and the output also has shape [m, n]. Do I need to change the input into shape [m, 1, n]? Also, can I keep the output shape [m, n] if I am using return_sequences=False? I am dealing with time series data. Thank you. Below are example inputs and outputs at different time steps:

Input   Output
[1,2]   [3,4]
[2,3]   [4,5]
[4,5]   [6,7]
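A sketch of the first option in this question, assuming each row of the [m, n] data is treated as an independent sample with a single timestep: reshape the input to [m, 1, n], and with return_sequences=False plus a Dense(n) head the model output keeps the [m, n] shape, so the targets need no reshaping. The data reuses the three example rows; the 8 units and 1 epoch are arbitrary. With one timestep per sample the LSTM sees no history across rows; feeding the whole series as one sample of shape [1, m, n] with return_sequences=True is the other common layout.

```python
import numpy as np
import keras

inputs = np.array([[1, 2], [2, 3], [4, 5]], dtype="float32")   # shape [m, n] = [3, 2]
targets = np.array([[3, 4], [4, 5], [6, 7]], dtype="float32")  # shape [m, n] = [3, 2]

m, n = inputs.shape
x = inputs.reshape(m, 1, n)  # [m, 1, n]: m samples, 1 timestep, n features

model = keras.Sequential([
    keras.Input(shape=(1, n)),
    keras.layers.LSTM(8),   # return_sequences=False -> (None, 8)
    keras.layers.Dense(n),  # (None, n): same shape as the targets
])
model.compile(optimizer="adam", loss="mse")
model.fit(x, targets, epochs=1, verbose=0)  # targets stay [m, n]

pred = model.predict(x, verbose=0)
print(x.shape, pred.shape)  # (3, 1, 2) (3, 2)
```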

Chrisinger commented 4 years ago

@mazharaliabro 'None' mostly refers to the batch size. In the standard way of defining the input size of a model, the first dimension is the batch size, but it is not fixed at the time the model is created. This is why it is unknown and thus shown as 'None'. If you want it fixed, you need to pass batch_input_shape=(batch_size, time_steps, no_features) instead of input_shape=(time_steps, no_features).
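A small sketch of the difference, assuming a vocabulary of 2000 tokens (what the 256000 embedding parameters in the summary above imply for 128-dimensional embeddings: 256000 / 128 = 2000). It fixes the batch size through `keras.Input(..., batch_size=...)`, which is one way to do this in current Keras; the batch size 32 is arbitrary.

```python
import keras

def embedded_shape(batch_size=None):
    # 28 tokens per sample; batch_size=None leaves the batch dimension open
    tokens = keras.Input(shape=(28,), batch_size=batch_size, dtype="int32")
    x = keras.layers.Embedding(2000, 128)(tokens)  # vocabulary size 2000 is an assumption
    x = keras.layers.SpatialDropout1D(0.2)(x)
    return tuple(x.shape)

print(embedded_shape())               # (None, 28, 128): batch size not fixed
print(embedded_shape(batch_size=32))  # (32, 28, 128): batch size fixed when the model is built
```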

lmvanpoppel commented 4 years ago

Hello, a quick question; could someone please help? It would be greatly appreciated! I'm struggling to understand the LSTM input, and I hope someone here can confirm whether what I am doing is right or wrong. To be clear, I do not get any errors; I just want to know if my reasoning is correct :) I am using Keras; the code is below.

As the input to an LSTM should be (batch_size, time_steps, no_features), I thought the input_shape would just be input_shape=(30, 15), corresponding to my number of timesteps per patient and features per timestep. When I use model.fit, I pass my X (200, 30, 15) and Y (200,). My question is: am I doing this right? Is my model actually connecting, per patient, the [30 timesteps x 15 features] to the binary classification in Y? I also don't really understand whether I need to specify a batch size in model.fit.

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense, Activation

# Build model
model = Sequential()
model.add(LSTM(128, input_shape=(30, 15), activation='relu', return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(128, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(1))
model.add(Activation('sigmoid'))

# Compile model
opt = tf.keras.optimizers.Adam(lr=0.001, decay=1e-6)
model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])

# Fit model (x: shape (200, 30, 15), y: shape (200,))
history = model.fit(x, y, epochs=20, validation_split=0.1)

Chrisinger commented 4 years ago

@lmvanpoppel It seems like you're doing this right. Since the shapes don't raise an error, the model is processing each patient as 30 timesteps of 15 features. If you are not sure, you can use the model.summary() method to check whether the network structure is actually the way you want it. https://keras.io/api/models/model/#summary-method

You do not need to specify a batch_size. The API states: "If unspecified, batch_size will default to 32." https://keras.io/api/models/model_training_apis/

I'm not sure whether you want the dropout inside the LSTM layer or outside of it. In my understanding, you would rather have it inside. Especially since training LSTM units takes very long, this outer dropout might slow down the training of the second layer.

Binary classification can be done that way. An alternative would be a Dense layer with softmax activation as the last layer, with your classes recoded using one-hot encoding instead of the encoding you use now.

I'm not sure about validation_split; I've always done the splitting manually. To see if it worked as expected, you can plot the training history. I suspect that with only 20 epochs you are underfitting the model.

lmvanpoppel commented 4 years ago

@Chrisinger

Thank you so much for the fast reply! Would you be willing to share some of your knowledge on the following? (I'm sorry it's a lot; I really hope you can help me, thanks so much in advance.) Again, it would be greatly appreciated: I need to do this for my thesis, but I don't know anyone who could help me (it's outside the scope of my studies).

I chose the structure somewhat randomly. I don't have a good idea yet of what it should be, and after reading/watching many tutorials, it's all still a bit difficult for me.

I'm not sure if I will need 1 or 2 layers, and how many units a layer should have. I get that I should try different architectures and check the performance, but do you have any idea what a good range or starting point for the number of units is? I read multiple "rules" that should give me an idea, e.g. the number of hidden neurons should be below

$N_h = \frac{N_s}{\alpha \, (N_i + N_o)}$

where $N_i$ = number of input neurons, $N_o$ = number of output neurons, $N_s$ = number of samples in the training data set, and $\alpha$ = an arbitrary scaling factor, usually 2 to 10. But for me this would give $200 / (\alpha \, (30 + 2))$, right? That is a really low number.
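Plugging the numbers from this post into that rule of thumb makes the point concrete. The sketch takes N_s = 200, N_o = 2 and α at both ends of the quoted 2 to 10 range, with N_i = 30 as in the post; N_i = 15 (the features per timestep) is included as an assumption, since it is not obvious which count the rule intends for an LSTM.

```python
def max_hidden_units(n_samples, n_inputs, n_outputs, alpha):
    """Rule of thumb from the post: N_h = N_s / (alpha * (N_i + N_o))."""
    return n_samples / (alpha * (n_inputs + n_outputs))

for n_inputs in (30, 15):  # 30 = timesteps (as in the post), 15 = features per timestep (assumption)
    for alpha in (2, 10):
        print(n_inputs, alpha, round(max_hidden_units(200, n_inputs, 2, alpha), 2))
# 30 2 3.12
# 30 10 0.62
# 15 2 5.88
# 15 10 1.18
```

Either way, the rule suggests only a handful of units for 200 training samples.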

Other rules of thumb on that forum are as follows:

As for the dropout, could you explain what you mean by inside or outside? (Is inside like this: model.add(LSTM(128, dropout=0.5))?) Also, I'm not exactly sure what you suggest: should I move the dropouts inside, or do I need to remove the second dropout? And if I end up using only one LSTM layer, should it still include dropout? (Sorry, I just read that it helps against overfitting, but I lack the knowledge to judge whether I need it and where.)

As for the classification, I tried to use softmax, but I'm not sure how. I thought that it would be like this, with 2 neurons where the output is the probability for each of the classes: model.add(Dense(2,activation='softmax'))

But then I get this error: ValueError: A target array with shape (200, 1) was passed for an output of shape (None, 2) while using as loss binary_crossentropy. This loss expects targets to have the same shape as the output. If I understand correctly, I'm asking for 2 outputs (the probabilities) but I only provide 1 output (1 or 0) per patient, which is why this doesn't work? You say I should use "recoding of your classes to use a one-hot encoding" but I lack the knowledge to understand that.

As for the number of epochs, if I use the model architecture I sent previously with 20 and 100 epochs, I get the following:

20 epochs: [two plots]

100 epochs: [two plots]

Sorry, another question: I get a warning even though the model runs. I read on a forum that I can ignore it, but maybe you know whether it's important:

WARNING:tensorflow:Entity <function Function._initialize_uninitialized_variables..initialize_variables at 0x7fb92ce738c0> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: WARNING: Entity <function Function._initialize_uninitialized_variables..initialize_variables at 0x7fb92ce738c0> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output.

Chrisinger commented 4 years ago

@lmvanpoppel The bad news about the structure of neural networks: no one knows in advance which network structure should be used. This depends on the nature of your data. So in your case: what are the 15 features that you create for your EEG recordings? It may be that a different network structure is better suited; you are the only one who can find that out. My first recommendation is to start with very simple models: they train very fast, and you can get used to how the training works and check the results. Have you tried other, simpler machine learning models? If not, I recommend decision trees for a start. They are much faster than neural networks and easy to interpret, and you do not need to decide on numbers of neurons, activation functions, optimizers, etc. Once they work, you can try to improve your performance by using more complex models, like random forests, support vector machines or neural networks. Again, when starting with neural networks, start with very small models, e.g. 5 or 10 neurons, just to get an intuition for the results. You can increase these parameters and play with them afterwards.

Second, I strongly recommend using cross-validation. In your figures, it seems to me that the outcomes differ significantly between 20 and 100 epochs. I guess this is caused by Keras using different splits, and with the current source code you cannot influence this split. Normally, one is interested in an accuracy that serves as an estimate for new, unknown data. Therefore, you make several runs with different random splits and look at the average outcome afterwards. This should be a better indicator than just one or two random runs.
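A minimal k-fold cross-validation sketch in the spirit of this advice. It assumes the data layout from the question (x of shape (200, 30, 15), binary y) but uses random placeholder data, a hypothetical `build_model` helper with a deliberately small LSTM, and only a few epochs so it runs quickly; none of these settings are recommendations. The Keras documentation for fit() states that validation_split holds out the last fraction of x and y (taken before shuffling); a manual split like this one lets each run hold out a different part of the data.

```python
import numpy as np
import keras

def build_model():
    # Hypothetical small model, rebuilt for every fold so no weights carry over between folds.
    model = keras.Sequential([
        keras.Input(shape=(30, 15)),
        keras.layers.LSTM(8),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model

rng = np.random.default_rng(0)
x = rng.random((200, 30, 15), dtype=np.float32)           # placeholder for the real data
y = rng.integers(0, 2, size=200).astype("float32")        # placeholder binary labels

k = 5
folds = np.array_split(rng.permutation(len(x)), k)  # k disjoint sets of sample indices
scores = []
for i in range(k):
    val_idx = folds[i]
    train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
    model = build_model()
    model.fit(x[train_idx], y[train_idx], epochs=3, batch_size=32, verbose=0)
    pred = model.predict(x[val_idx], verbose=0)[:, 0] > 0.5  # threshold the sigmoid output
    scores.append(float(np.mean(pred == y[val_idx])))         # accuracy on the held-out fold

print("accuracy per fold:", np.round(scores, 3), "mean:", round(float(np.mean(scores)), 3))
```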

Dropout: The computations within an LSTM are not trivial, they are explained nicely in colah's blog https://colah.github.io/posts/2015-08-Understanding-LSTMs/ Again, I would start simple, without dropout, look at diagrams and the overall result. Then try a version with one dropout layer, see how it performs and so on. Yes, in the last message I meant to use dropout within the LSTM as you wrote it: LSTM(128, dropout=0.5). You can see in the diagrams if you should do something against overfitting: Normally at the beginning of the training, the loss on the training data and on the test data is reduced, as the model is being trained. But at some point the test loss starts to increase again. This is the point where your model does not generalize properly - so although it keeps improving on the training data, it is getting worse on the test data. And this is what you want to prevent. This is why I recommend first to make some experiments, and add dropout later.
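To make "inside" vs "outside" concrete, the sketch builds the two-layer model from the question both ways. `dropout` (on the layer's inputs) and `recurrent_dropout` (on the recurrent state) are LSTM arguments; a separate `Dropout` layer instead drops units of an LSTM's output before the next layer. The rates are placeholders, and the question's `activation='relu'` is left out (default tanh). Dropout adds no weights, so both variants have the same parameter count. In TensorFlow, a nonzero `recurrent_dropout` (like a non-tanh activation) means the fast cuDNN kernel can't be used on GPU, which may make training slower.

```python
import keras

def build(inside):
    if inside:
        # Dropout applied inside each LSTM layer
        layers = [keras.layers.LSTM(128, return_sequences=True, dropout=0.2, recurrent_dropout=0.2),
                  keras.layers.LSTM(128, dropout=0.2, recurrent_dropout=0.2)]
    else:
        # Separate Dropout layers on each LSTM's output (as in the question)
        layers = [keras.layers.LSTM(128, return_sequences=True), keras.layers.Dropout(0.2),
                  keras.layers.LSTM(128), keras.layers.Dropout(0.2)]
    return keras.Sequential([keras.Input(shape=(30, 15)), *layers,
                             keras.layers.Dense(1, activation="sigmoid")])

inside, outside = build(True), build(False)
print(inside.count_params(), outside.count_params())  # 205441 205441
```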

There are two ways to build a model for 2-class classification. One is what you applied; the other uses model.add(Dense(2, activation='softmax')). This needs 2 output neurons, and class 0 is no longer represented by a single number "0" but by a binary vector [1, 0]. The second class, "1", would then be [0, 1]. This is why you got an error: you passed single numbers where Keras expected binary vectors.
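A sketch of the two-neuron softmax variant: `keras.utils.to_categorical` turns the 0/1 labels into the one-hot vectors described here, and the loss becomes `categorical_crossentropy`. Alternatively, `sparse_categorical_crossentropy` accepts the original integer labels directly with the same softmax output. The small LSTM, random placeholder data and 1 epoch are only for a shape check.

```python
import numpy as np
import keras

x = np.random.rand(200, 30, 15).astype("float32")  # placeholder data, same layout as the question
y = np.random.randint(0, 2, size=200)              # integer labels 0/1

y_onehot = keras.utils.to_categorical(y, num_classes=2)  # class 0 -> [1, 0], class 1 -> [0, 1]
print(y_onehot.shape)  # (200, 2): now matches the (None, 2) output

model = keras.Sequential([
    keras.Input(shape=(30, 15)),
    keras.layers.LSTM(8),
    keras.layers.Dense(2, activation="softmax"),  # one probability per class
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.fit(x, y_onehot, epochs=1, verbose=0)
# Alternative without one-hot encoding: loss="sparse_categorical_crossentropy" with y as-is.

probs = model.predict(x[:3], verbose=0)
print(probs.shape)  # (3, 2); each row sums to 1
```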

About your warning in TensorFlow: I have no idea, sorry. However, I hope that I could help. There may be some courses on coursera.com on the basics of machine learning that might help you. Good luck!