keras-team / keras

LSTM for System Identification task #8579

Closed cleKarl closed 3 years ago

cleKarl commented 6 years ago

Hello everyone,

For my final thesis I am working with Keras on TensorFlow and have to investigate how LSTM networks can be used to identify dynamical systems. I am given one-dimensional input data X(t) and the corresponding output data Y(t).

First of all, I preprocessed the data by normalizing it to the range (-1, 1). From this I built time-dependent samples, using 10 values of the input data to capture the time dependency in the network. The data x(t) has 86138 datapoints, and each sample is u(t) = (x(t), x(t-1), ..., x(t-9)); this works like shifting a window of length 10 over x(t) and taking a snapshot at every step. My new input data for the NN then has shape (86129, 10, 1), and the output data is reshaped to (86129, 1), so for each output value there is an input vector of size 10.
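For reference, a minimal sketch of this windowing step (assuming x and y are 1-D NumPy arrays of length 86138, already scaled to (-1, 1)):

import numpy as np

def make_windows(x, window=10):
    # slide a window of length `window` over the series;
    # each window is paired with the output at its last time step
    n = len(x) - window + 1
    return np.stack([x[i:i + window] for i in range(n)])[..., np.newaxis]

input = make_windows(x)           # shape (86129, 10, 1)
target = y[9:][:, np.newaxis]     # shape (86129, 1)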

My question about this task is:

As I am relatively new to machine learning and Keras: am I on the right track with this preprocessing so far, and do you think that LSTM networks can lead to better results than plain feedforward ones?

from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
# return_sequences=True is needed so the second LSTM receives a full sequence
model.add(LSTM(units=100, activation='tanh', return_sequences=True, input_shape=(10, 1)))
model.add(LSTM(units=100, activation='tanh'))
model.add(Dense(units=1, activation='linear'))

model.compile(loss='mean_squared_error', optimizer='Adam')
model.fit(input, target, epochs=100, batch_size=1000, verbose=1)

The results are quite bad, even worse than using Dense layers instead of LSTM layers. Is there anything you would recommend changing here? Someone recommended using a sequence-to-sequence model structure, but I can only find examples for translation tasks and the like.

I am really grateful for any help you can give me.

Thanks in advance.

bo5o commented 6 years ago

Hi,

Have you had any success yet? I am also investigating LSTM networks for system identification tasks and indeed I am using sequence to sequence models. What kind of input data are you using? Have you tried it on simple dynamical models first?

shrikanth95 commented 6 years ago

Hi, I need to develop LSTM networks for system identification for my project as well. I would like to know if you have published your results anywhere. If not, what's the status?

bo5o commented 6 years ago

I am still working on the project. The results are not yet published in any way. If you want I can link you to some resources that I found especially helpful coming from a system identification background.

shrikanth95 commented 6 years ago

Thanks, I would like that. I'm trying the same with TensorFlow using the seq2seq approach; I'm still tinkering with TF, as this is my first encounter with it.

Since you asked about input data above, what did you finally use? I ask in case there is something I missed. The paper by Wang uses a sum of sinusoids, which seems to be standard procedure since the paper by Narendra and Parthasarathy from 1990.
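Such an excitation signal is easy to generate; a minimal sketch (the amplitudes and frequencies here are arbitrary placeholders, not taken from the cited papers):

import numpy as np

t = np.arange(0, 100, 0.01)
# excitation: sum of sinusoids with distinct frequencies
u = sum(a * np.sin(w * t) for a, w in [(1.0, 0.5), (0.5, 1.3), (0.25, 3.1)])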

cleKarl commented 6 years ago

Sorry for the late answer. I am still working on the problem and haven't had any success so far. I also found the paper you are talking about, but it is mostly about a convex-based LSTM structure. What I have read so far about sequence-to-sequence models sounds promising, but since I am pretty new to this topic I don't have much expertise. I am using a dataset with about 85000 datapoints. I also tried it with data from a simple dynamic system, without success. What I am struggling with is how to use the internal dynamics of recurrent neural networks (such as LSTMs) for the system identification task. Only feedforward neural networks with external dynamics (such as time-delay networks) have led to reasonably good results.

But maybe the sequence2sequence idea is the key. Hit me up if it works with your projects.

shrikanth95 commented 6 years ago

Is your project academic? I am pretty much stuck on the same problem; I don't understand how TensorFlow and the LSTM community define batches and time instances. This post looks interesting, however. What are your thoughts on it?

bo5o commented 6 years ago

The explanation in the post you've linked seems right and confirms what I have experienced so far. Nevertheless, batches and stateful recurrent nets in Keras are still a source of confusion for me sometimes.

I found it very helpful to first simulate simple dynamic systems that can easily be represented by a state-space model, for example, then to build a neural net, play around with the parameters, and look at the output. You get a feel for what is going on in the RNN/LSTM, and it might help you come up with more complicated models. In my experience it also pays to spend some time figuring out good values for weight initialization and how constraints can limit the kind of local minima the optimization finds. A minimal sketch of such a simulation follows.
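For example, simulating a discrete-time linear state-space model to generate training data (the matrices below are arbitrary placeholders):

import numpy as np

# x[k+1] = A x[k] + B u[k],  y[k] = C x[k]
A = np.array([[0.99, 0.01], [-0.10, 0.98]])
B = np.array([0.0, 0.1])
C = np.array([1.0, 0.0])

def simulate(u, x0=np.zeros(2)):
    x, ys = x0, []
    for uk in u:
        ys.append(C @ x)      # measure the current state
        x = A @ x + B * uk    # advance one time step
    return np.array(ys)

u = np.sin(0.1 * np.arange(2000))   # simple sinusoidal input
y = simulate(u)                     # output sequence to learn from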

cleKarl commented 6 years ago

I feel the same; Keras and TensorFlow confuse me a lot, especially when it comes to setting up the tensors for training. I'm doing this kind of research for an academic project. So far I have only substituted the Dense layers with LSTM layers. This doesn't improve anything, except that the training process takes longer, and I still need to present the data to the network in the form of time vectors (like time-delay networks). I think I somehow need to get used to sequence-to-sequence models, but I have not found any code I can use for that so far.

shrikanth95 commented 6 years ago

I was doing some searching today. It has already been addressed in the form of time series prediction in this link. The implementation can be optimized, though.

AleksandarHaber commented 5 years ago

Hello,

Interesting questions and answers. Have you made some progress? As a side note, system identification is much more complex than simple time series prediction. It would be really interesting to see what the advantages are of using LSTMs and other types of deep NNs over classical system identification techniques (such as prediction error methods and subspace identification methods).

Best Regards, Aleksandar Haber

AleksandarHaber commented 5 years ago

Hello again,

Motivated by your questions and due to my interest in this topic, I started a project on the identification of dynamical systems using various recurrent neural network architectures and the Keras toolbox. The project description can be found here, and the codes can be found here

bo5o commented 5 years ago

Very cool project. I did the exact same thing (only with a different dynamical system) when I started working on my project. However, why do you use non-linear activations in this example? This should work with a 2-unit SimpleRNN without non-linearities.

Also, I recently discovered eigensteve.com and his YouTube channel. He is a professor at the University of Washington and does a lot of research on data-driven dynamics and control using machine learning. I found this to be a very interesting resource.

AleksandarHaber commented 5 years ago

Thank you for the feedback. For brevity, I tried to use the default Keras setting when adding the layers:

model.add(LSTM(32, input_shape=(trainX.shape[1], trainX.shape[2]), return_sequences=True))

I guess that linear activation functions will also work, since the model itself is linear. Now, for the specific problem formulation I used, I am not sure that a 2-unit SimpleRNN will give good prediction results (even if linear activation functions are used). Namely, the dynamics here is seen from the input of the form U = (x_0, u_1, u_2, ..., u_N) to the output of the form Y = (y_0, y_1, ..., y_N). The initial condition x_0 needs to be added to the inputs, since the network needs to know where to start (to predict the future state of a dynamical system, we need to know an initial state and an input sequence).

This is probably not the "best" problem formulation. I think that it is better to start from an ARX model, which does not need initial conditions, and in the ARX case, the input matrix U contains past system outputs and inputs.
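For illustration, a minimal sketch of building such ARX regressors (the lag orders na = nb = 2 are arbitrary assumptions):

import numpy as np

def arx_regressors(u, y, na=2, nb=2):
    # predict y[k] from (y[k-1], ..., y[k-na], u[k-1], ..., u[k-nb]),
    # so no separate initial condition has to be fed to the network
    start = max(na, nb)
    U = np.array([np.concatenate([y[k - na:k][::-1], u[k - nb:k][::-1]])
                  for k in range(start, len(y))])
    return U, y[start:]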

It would be interesting to see your code and the approach you used.

I am now experimenting with different problem formulations and with 1D convolutional neural networks (CNNs) in Keras. I think that CNNs might be more suitable for this application simply because it is very well known that the output of a (linear) system is a convolution of its inputs:

"The fundamental result in LTI system theory is that any LTI system can be characterized entirely by a single function called the system's impulse response. The output of the system is simply the convolution of the input to the system with the system's impulse response." (https://en.wikipedia.org/wiki/Linear_time-invariant_system)

Finally, I am familiar with the work of the above-mentioned professor. They approach the problems from the dynamical system's perspective, which is an interesting approach.

P.S. The identification problem can easily be solved using subspace identification methods, which give the system matrices A, B, C, D up to an unknown similarity transformation. However, these methods work well only for linear systems and for certain classes of nonlinear systems.
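A toy sketch of the idea behind these methods (a Ho-Kalman-style realization from impulse-response data; a simplified illustration, not a full subspace identification algorithm):

import numpy as np

def ho_kalman(h, order):
    # h[i] = Markov parameter C A^i B of a SISO system, i = 0, 1, ...
    m = len(h) // 2
    H = np.array([[h[i + j] for j in range(m)] for i in range(m)])  # Hankel matrix
    U, s, Vt = np.linalg.svd(H)
    sq = np.sqrt(s[:order])
    O = U[:, :order] * sq                # extended observability matrix
    R = (Vt[:order].T * sq).T            # extended controllability matrix
    A = np.linalg.pinv(O[:-1]) @ O[1:]   # shift-invariance of O
    return A, R[:, :1], O[:1, :]         # A, B, C up to similarity transform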

bo5o commented 5 years ago

Hi,

I cloned your repo and tried with a 2-unit SimpleRNN. I used the following model:

model = Sequential()
model.add(
    SimpleRNN(
        2,
        input_shape=(trainX.shape[1], trainX.shape[2]),
        use_bias=False,
        activation="linear",
        return_sequences=True,
    )
)
model.add(Dense(1, activation="linear", use_bias=False))

This model contains more than enough parameters to learn the dynamics. The results are very good.

(figure: network response)

Loss during training:

(figure: training loss)

AleksandarHaber commented 5 years ago

Great insight! So it is really necessary to explicitly tell Keras that we want linear activation functions and no bias... Thank you.

Update: Does this mean that nonlinear activation functions actually add an artificial model order?

AleksandarHaber commented 5 years ago

I was experimenting with other models of linear systems and other network architectures. It turns out that a simple Multilayer Perceptron (MLP) network, with 2 nodes and a single output node, can do the job quite well (in the noiseless scenario). I used an ARX model to predict the system output. I wrote a post explaining how to go from a state-space model to an ARX model, and how to train a network to learn the ARX model.

If someone is interested, the post can be found here and the codes can be found here.

The prediction performance of recurrent and MLP networks will probably deteriorate significantly when the observations are corrupted by measurement noise. I am also not sure whether there are any strong statistical (consistency) guarantees when neural networks are used for model estimation...
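For reference, a minimal sketch of such an MLP (assuming ARX regressors with 4 lagged values per sample, linear activations, and noiseless data):

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
# 2 hidden nodes and a single output node, no bias, linear activations
model.add(Dense(2, activation='linear', use_bias=False, input_shape=(4,)))
model.add(Dense(1, activation='linear', use_bias=False))
model.compile(loss='mean_squared_error', optimizer='adam')
# model.fit(U, Y, epochs=2000, verbose=0)   # U, Y from the ARX construction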

bo5o commented 5 years ago

> Update: Does this mean that nonlinear activation functions actually add an artificial model order?

It depends on the activation function, I think. Imagine a sigmoid, for example. Since the underlying system is linear, the non-linear network has to learn to reproduce that linear behaviour. A single sigmoidal unit, for instance, would have to learn to stay in its linear regime (around zero), thus pushing the bias close to zero. Any inputs further away from the linear regime will result in erroneous outputs, so you would then have to add many other non-linear units to get better results.

That's why I chose "linear" activation and no bias. But of course, this is based on some a priori knowledge of the system and cannot always be assumed.
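A quick numerical check of that shrinking linear regime:

import numpy as np

x = np.array([0.01, 0.1, 1.0, 3.0])
# tanh(x)/x is ~1 in the linear regime and decays as the unit saturates
print(np.tanh(x) / x)   # [1.     0.9967 0.7616 0.3317] (approximately)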

AleksandarHaber commented 4 years ago

For those of you who are still following this post: I have implemented a subspace identification method for estimating models of dynamical systems. The codes can be found here:

https://github.com/AleksandarHaber/Subspace-Identification-State-Space-System-Identification-of-Dynamical-Systems-and-Time-Series-

and a post with detailed explanations can be found here:

https://aleksandarhaber.github.io/machine_learning/2019/11/13/subspace-identification.html

In my opinion, in the case of linear systems, subspace identification algorithms outperform most methods (including neural-network methods) for the estimation of dynamical systems. Unfortunately, subspace identification methods are not widely known to the broader scientific community. In the future, I will compare subspace identification methods with other neural-network-based estimation methods.

Best Regards, Aleksandar Haber

bballamudi commented 4 years ago

@AleksandarHaber I read through the post linked above. Thanks for sharing. I wonder if subspace identification can also solve nonlinear system identification. Say I want to model a half-car dynamic system, which is characterized as a nonlinear problem.

AleksandarHaber commented 4 years ago

Hi bballamudi,

Subspace identification of nonlinear systems is an active research area. To the best of my knowledge, stable and experimentally verified methods have been developed for the following classes of nonlinear systems:

1. Wiener state-space models
2. Hammerstein state-space models
3. Wiener + Hammerstein state-space models
4. Bilinear state-space models
5. Linear parameter-varying systems

The subspace identification methods work surprisingly well and are experimentally verified on many different system types. Also, these methods have great potential for the estimation of time series, such as the stock market...

To start with, you can look at the following (not so recent) paper,

http://people.duke.edu/~hpgavin/SystemID/References/Palanthandalam-Madapusi-ACC-2005.pdf

and the works of Prof. M. Verhaegen from TU Delft, who is one of the founders of the subspace identification method.

Best, Alex

TheRed86 commented 4 years ago

Hi, I also have problems getting my network to work. I really don't understand how to use samples (different input sequences of n time steps each) in Keras to train my model. Actually, I get an error if I train using 2 samples and validate on only 1 input sequence. I started approaching this topic using @AleksandarHaber's code (https://github.com/AleksandarHaber/Machine-Learning-of-Dynamical-Systems-using-Recurrent-Neural-Networks) and then slightly changed it for my purposes.

My system is a non-linear mass-spring system (I added an x^3 term to the equation), forced by a time sequence F(t) = F0*sin(omega*t), where F0 is a constant. I use finite differences to simulate the data. This part is fine; the results are correct.

Then: I want to train the NN using 2 input sequences (2 F(t) sequences obtained with 2 different values of omega), validate on an input sequence generated with a different omega value (only 1 input sequence), and perhaps predict on another one (again, only 1 input sequence). To summarize:

training data = (2, 2000, 1)
validation data = (1, 2000, 1)
prediction data = (1, ?, 1)

(I say ? instead of 2000 because I am wondering whether I can only make predictions on a time sequence with the same length as the training data; I tried 2000 for the moment, as I already have problems with the sample dimension.) 2000 is the time-sequence length I get from my simulation. BTW, I get an error as soon as it encounters the validation data (I also tried validation_split=0.2 instead of providing a separate dataset):

tensorflow.python.framework.errors_impl.InvalidArgumentError:  Specified a list with shape [2,1] from a tensor with shape [1,1]
     [[

I defined input_shape=(2000,1), as I read to do in many posts. I am not using anything complex, just

model.add(LSTM(32, input_shape=input_shape, return_sequences=True, activation='sigmoid', batch_size=2))
model.add(Dense(1))

Another question: how should I set batch_size to tell the NN that I have more than one sample of data? As far as I understand, batch_size is the number of samples used to train the model before an update of the weights. Because I only have 2, I use them both for training in 1 batch. There are 2 ways of defining batch_size: one is the batch_size kwarg in model.add(LSTM(..., batch_size=xx)), the other in model.fit(..., batch_size=2). What is the difference between the two? Where should I specify the number of samples I want to use to train the model? If I don't specify it in the LSTM function, I see the running info "Train on 1 samples, validate on 1 samples" and the results are obviously bad (it knows about only 1 input sequence and the result doesn't match the system response). But if I put batch_size=2 in the model layer definition, then I cannot run with validation, as I said. To clarify: if I remove the validation data, put batch_size=2 back in the LSTM, and make a "prediction" on the same training set, I get good results. But that doesn't make sense. I want to predict on something else (using a different data shape, at least in the sample dimension)!
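My current guess is that fixing batch_size inside the layer bakes the batch dimension into the graph, which is what breaks the 1-sample validation set. An untested sketch of the alternative, with the batch size given only to fit() (trainX, trainY, valX, valY are the arrays described above):

from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
# no batch_size here, so the batch dimension stays flexible
model.add(LSTM(32, input_shape=(2000, 1), return_sequences=True, activation='sigmoid'))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
# model.fit(trainX, trainY, batch_size=2, validation_data=(valX, valY))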

I can share my code if anyone could help me.

Thanks

AleksandarHaber commented 4 years ago

Can you post the codes?

AleksandarHaber commented 4 years ago

> I cloned your repo and tried with a 2-unit SimpleRNN. [...] This model contains more than enough parameters to learn the dynamics. The results are very good.

I tested this, and there are two important things to consider and keep in mind:

1. In my original code I trained for 2000 epochs. To obtain better results for networks with a smaller number of units (that is, a smaller model order), we need to increase the number of epochs. I obtained results similar to bo5o's by increasing the number of epochs to 4000.
2. Since the data-generating model in my original code is a linear system, we need to state explicitly that the activation functions are linear and that we do not use a bias.

I revised my original post, polishing it a bit and including these important insights. The revised post can be found here:

https://aleksandarhaber.com/using-recurrent-neural-networks-and-keras-tensorflow-to-learn-input-output-behaviour-of-dynamical-systems/