nengo / keras-lmu

Keras implementation of Legendre Memory Units
https://www.nengo.ai/keras-lmu/

The Mackey Glass experiment seems to be predicting the past #14

Open · creativedutchmen opened this issue 4 years ago

creativedutchmen commented 4 years ago

Could it be that in the Mackey Glass experiment, the network is asked to approximate a 15-step delay, instead of simulating a complex system?

The definition of the X and Y data is as follows:

    Y = X[:, :-predict_length, :]  # targets: steps 0 .. T - predict_length - 1
    X = X[:, predict_length:, :]   # inputs: steps predict_length .. T - 1

This implies the X data starts at timestep predict_length (15) and ends at the end of the series, while Y starts at t=0 and runs until 15 steps before the end. From what I understand, this means the network has already seen every value it is asked to predict. Is this intentional?
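
A minimal numpy sketch (with a hypothetical predict_length of 3 on a toy 10-step series) makes the issue concrete:

    import numpy as np

    predict_length = 3
    X = np.arange(10).reshape(1, 10, 1)  # one toy series: 0, 1, ..., 9

    Y = X[:, :-predict_length, :]  # targets: steps 0..6
    X = X[:, predict_length:, :]   # inputs: steps 3..9

    print(X.squeeze())  # [3 4 5 6 7 8 9]
    print(Y.squeeze())  # [0 1 2 3 4 5 6]
    # The target at step t equals the input the network received
    # predict_length steps earlier, i.e., the task is a pure 3-step delay.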

arvoelke commented 4 years ago

Hi @creativedutchmen. Thanks for reviewing the code and bringing this to our attention. I think what happened is that, at one point, I was testing the network's ability to compute a delay line and forgot to change it back. It's a shame this wasn't noticed until now, especially since last night was the cutoff for making final changes to the paper.

Fortunately this doesn't affect the nature of the results. This isn't too surprising, since the future state of a strange attractor can be predicted from a delay embedding of its past (Takens, 1981). In other words, the history of the time series (i.e., its delays) is useful for predicting its future; in fact, the Mackey-Glass dataset is itself generated by a delay differential equation. This also recapitulates the point that pure delays are difficult for LSTMs to learn.
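
For reference, here is a minimal sketch of Euler-integrating the Mackey-Glass delay differential equation; the parameters shown are commonly used values, and the notebook's own generator and settings may differ:

    import numpy as np

    def mackey_glass(n_steps, tau=17, beta=0.2, gamma=0.1, n=10, dt=1.0, x0=1.2):
        # Euler integration of
        #   dx/dt = beta * x(t - tau) / (1 + x(t - tau)**n) - gamma * x(t)
        # A sketch with standard parameters; the notebook may differ.
        delay = int(tau / dt)
        x = np.full(n_steps + delay, x0)  # constant history for t < 0
        for t in range(delay, n_steps + delay - 1):
            x_tau = x[t - delay]
            x[t + 1] = x[t] + dt * (beta * x_tau / (1 + x_tau ** n) - gamma * x[t])
        return x[delay:]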

Updated results:

| Model  | Test NRMSE |
| ------ | ---------- |
| LSTM   | 0.05872    |
| LMU    | 0.04873    |
| Hybrid | 0.04473    |
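
(NRMSE is root-mean-squared error normalized by the spread of the targets. A minimal sketch of one common convention, normalizing by the standard deviation; the notebook may use a different normalizer:

    import numpy as np

    def nrmse(y_pred, y_true):
        # RMSE normalized by the standard deviation of the targets.
        # One common convention; the notebook may normalize differently.
        return np.sqrt(np.mean((y_pred - y_true) ** 2)) / np.std(y_true)

)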

I've pushed a branch, fix-mackey-glass, with the corrected results and the trained model files. The notebook is here for the time being: https://github.com/abr/neurips2019/blob/fix-mackey-glass/experiments/mackey-glass.ipynb (note: the reported simulation times are from CPU rather than GPU runs). Only two changes were made to the notebook.

The first change is to the data generation:

    Y = X[:, predict_length:, :]   # targets: values predict_length steps ahead
    X = X[:, :-predict_length, :]  # inputs: the corresponding earlier values

The second change is to the LMU layer construction:

    hidden_kernel_initializer='uniform',  # uniform init for the hidden kernel
    include_bias=True,                    # trainable bias in the hidden layer

The default initializers for the LMUCell should be taken with a grain of salt. Using the uniform initializer instead starts the network off with a much more reasonable initial error, and we also found that a trainable bias vector in the hidden layer is needed for this task.
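
For context, a minimal sketch of how these keyword arguments slot into the layer construction. This assumes the LMUCell constructor from the neurips2019 experiment code accepts them as shown; the units/order/theta values below are placeholders, not the notebook's actual hyperparameters:

    # A sketch, not the notebook's exact code.
    from tensorflow import keras
    from lmu import LMUCell  # package used by the neurips2019 experiments

    lmu_layer = keras.layers.RNN(
        LMUCell(
            units=49,  # hidden state size (placeholder value)
            order=4,   # dimensionality of the Legendre memory (placeholder)
            theta=4,   # length of the memory window in timesteps (placeholder)
            hidden_kernel_initializer='uniform',  # change 2a
            include_bias=True,                    # change 2b
        ),
        return_sequences=True,
    )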

Let me know if you have any thoughts. Thanks again. I am currently running a couple more experiments and discussing with my coauthors how to address this mistake and move forward with publishing a correction.