keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

Simple RNN for system identification #19018

Open apasagic opened 9 months ago

apasagic commented 9 months ago

As described in the following Stack Exchange question, I fail to get good results using SimpleRNN to capture the dynamics of a very basic 2-state linear system. This is surprising, since the two should essentially be equivalent: a 2-unit RNN with linear outputs is roughly equivalent to a second-order state-space model.

https://datascience.stackexchange.com/questions/126262/using-simple-rnn-to-identify-a-simple-dynamic-linear-system

As I described in the post, I generate the training data with a backward-Euler simulation of the state-space system, using the vector of input force at each timestep as the input vector and the vector of displacement at each timestep as the output vector.
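For readers without access to the post, this kind of data generation can be sketched roughly as follows. This is a minimal sketch and not the code from the post: it assumes a mass-spring-damper system, and the function name `simulate_backward_euler` and the values of `m`, `c`, `k` and `dt` are hypothetical, chosen only for illustration.

```python
import numpy as np

def simulate_backward_euler(u, m=1.0, c=0.5, k=2.0, dt=0.01):
    """Simulate m*x'' + c*x' + k*x = u(t) with backward Euler.

    u is the input force at each timestep; the return value is the
    displacement at each timestep (same length as u).
    All parameter values are illustrative assumptions.
    """
    # Continuous-time state-space model, state = [position, velocity].
    A = np.array([[0.0, 1.0], [-k / m, -c / m]])
    B = np.array([0.0, 1.0 / m])
    C = np.array([1.0, 0.0])

    # Backward Euler: (I - dt*A) x[n+1] = x[n] + dt*B*u[n+1]
    M = np.linalg.inv(np.eye(2) - dt * A)
    Ad = M                # discrete state-transition matrix
    Bd = M @ (dt * B)     # discrete input vector

    x = np.zeros(2)
    y = np.empty(len(u))
    for n, u_n in enumerate(u):
        x = Ad @ x + Bd * u_n
        y[n] = C @ x      # displacement is the first state
    return y

rng = np.random.default_rng(0)
force = rng.standard_normal(2000)              # input sequence
displacement = simulate_backward_euler(force)  # target sequence
```

The resulting `force` and `displacement` arrays would then be reshaped to `(batch, timesteps, 1)` before being passed to the RNN.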

Given my understanding of how RNNs work, I would expect this to suffice: the current state X at time t depends on the previous states of X at times t-1 and t-2, similar to how velocity and position are calculated implicitly (or rather, internally) from the input force as states in the continuous model, so they should not need to be fed as inputs into the RNN.

As soon as I figure out how to get a Jupyter Notebook running on GitHub, I will post the whole code too.

SuryanarayanaY commented 9 months ago

@apasagic, awaiting your code snippet. You can test the code on Google Colab and attach it here as a Colab gist.

apasagic commented 9 months ago

Thank you very much for your reply SuryanarayanaY.

Here is the code: https://colab.research.google.com/drive/14uvYV8CWPT1_9BNqplYcXt5-8XdeU2sH?usp=sharing

SuryanarayanaY commented 9 months ago

Hi @apasagic ,

With a single layer it is difficult to get the model to converge, and you may not be able to reach the global minimum at all, whatever the number of epochs. You may also need to vary parameters: increase the number of layers or neurons, lower the learning rate further, and change the activation to relu so the model can learn non-linear patterns, etc. I made some changes and the performance improved a little, as per the attached gist.

apasagic commented 8 months ago

Thank you very much Suryanarayana.

The interesting thing about this is that

a.) this is a linear system, described by state-space matrices. I come from a control systems background, so this is the classic way we describe linear systems.

b.) Interestingly enough, this way of describing the system, where the states from the previous step are multiplied by a transformation matrix and added to the transformed input (also multiplied by a matrix), is in fact identical to the RNN structure shown below, which is easy to see by comparison:

[image: RNN structure diagram]

Note: the image below shows the state-space description for a continuous system, but the discrete one is expressed as X(t+1) = AX(t) + Bu(t).

[image: continuous-time state-space description]

This will only hold true, however, when the activation function is linear, that is, when there is no activation function, so to speak.

In that sense, the modeled system from which the simulated data was obtained is fully linear, has 2 states, and has a structure that should be fully identical to that of a 2-unit RNN with a linear activation function. This is why I am very surprised that the identification fails to converge.
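The claimed equivalence can be checked numerically. The sketch below writes out the SimpleRNN recurrence by hand in NumPy, h(t) = x(t)·kernel + h(t-1)·recurrent_kernel + bias with a linear activation, followed by a linear Dense readout, and compares it to a discrete state-space model. The matrices `A`, `B`, `C` are hypothetical values chosen only for illustration, and the helper names are not part of Keras.

```python
import numpy as np

def state_space_response(A, B, C, u):
    """Discrete state-space model: x(t+1) = A x(t) + B u(t), y = C x."""
    x = np.zeros(A.shape[0])
    y = np.empty(len(u))
    for t, u_t in enumerate(u):
        x = A @ x + B * u_t
        y[t] = C @ x              # read out the updated state
    return y

def linear_rnn_response(kernel, recurrent_kernel, bias,
                        dense_kernel, dense_bias, u):
    """Hand-written linear SimpleRNN recurrence plus linear Dense readout."""
    h = np.zeros(recurrent_kernel.shape[0])
    y = np.empty(len(u))
    for t, u_t in enumerate(u):
        x_t = np.array([u_t])     # one input feature per timestep
        h = x_t @ kernel + h @ recurrent_kernel + bias
        y[t] = (h @ dense_kernel + dense_bias)[0]
    return y

# Illustrative (hypothetical) stable 2-state system.
A = np.array([[0.99, 0.01], [-0.02, 0.98]])
B = np.array([0.0, 0.01])
C = np.array([1.0, 0.0])

u = np.random.default_rng(0).standard_normal(200)
y_ss = state_space_response(A, B, C, u)
y_rnn = linear_rnn_response(
    kernel=B[None, :],            # shape (1, 2): B transposed
    recurrent_kernel=A.T,         # shape (2, 2): A transposed
    bias=np.zeros(2),
    dense_kernel=C[:, None],      # shape (2, 1): C transposed
    dense_bias=np.zeros(1),
    u=u,
)
print(np.allclose(y_ss, y_rnn))
```

Because the layer multiplies row vectors from the left, the weights correspond to the transposes of A, B and C. So a set of weights that reproduces the system exactly does exist; whether gradient descent finds it is a separate question.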

Could you perhaps help me interpret the data returned by model.get_weights()? That way I may peek into what is happening in the model itself and perhaps better understand where the fault is.
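As a rough guide, and assuming the model is a `SimpleRNN` layer followed by a `Dense` layer (an assumption, since the notebook is not reproduced here), `model.get_weights()` returns five arrays in this order: the RNN input kernel of shape (inputs, units), the RNN recurrent kernel of shape (units, units), the RNN bias, then the Dense kernel and Dense bias. The sketch below maps such a list to state-space matrices. The helper `weights_to_state_space` is hypothetical, and the `weights` list is a hand-built stand-in for a trained model rather than real training output.

```python
import numpy as np

def weights_to_state_space(weights):
    """Map [kernel, recurrent_kernel, bias, dense_kernel, dense_bias]
    to (A, B, C) of h(t) = A h(t-1) + B u(t), y(t) = C h(t)."""
    kernel, recurrent_kernel, bias, dense_kernel, dense_bias = weights
    A = recurrent_kernel.T   # (units, units)
    B = kernel.T             # (units, n_inputs)
    C = dense_kernel.T       # (n_outputs, units)
    return A, B, C

# Hypothetical "true" discrete system, for illustration only.
A_true = np.array([[0.99, 0.01], [-0.02, 0.98]])
B_true = np.array([[0.0], [0.01]])
C_true = np.array([[1.0, 0.0]])

# Stand-in for model.get_weights(): the same system expressed in
# different state coordinates (similarity transform T).
T = np.array([[2.0, 1.0], [0.5, 3.0]])
T_inv = np.linalg.inv(T)
weights = [
    (T @ B_true).T,            # RNN kernel
    (T @ A_true @ T_inv).T,    # RNN recurrent kernel
    np.zeros(2),               # RNN bias
    (C_true @ T_inv).T,        # Dense kernel
    np.zeros(1),               # Dense bias
]

A_hat, B_hat, C_hat = weights_to_state_space(weights)
print("learned A:\n", A_hat)
print("eigenvalues of learned A:", np.linalg.eigvals(A_hat))
print("eigenvalues of true A:   ", np.linalg.eigvals(A_true))
```

Note that the learned A will generally not match the true A entry by entry, because the RNN is free to pick any state coordinates. Quantities that do not depend on the coordinates, such as the eigenvalues of A, are the more meaningful thing to compare.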

Thank you again and wish you a pleasant day.