philipperemy / cond_rnn

Conditional RNNs for Tensorflow / Keras.
MIT License

example real data #8

Closed by hstarmans 3 years ago

hstarmans commented 4 years ago

Thanks for sharing your code. Could you provide an example on real data? By real I mean data from real-life measurements, e.g. https://www.kaggle.com/sudalairajkumar/daily-temperature-of-major-cities. The example so far proves that you have working code, not that the model is superior. I can't pass all possible regularizers to the model; I think you could pass them at lines 38 and 44 in your code. It would be nice to be able to use things like kernel regularizers there. Your code is compatible with TensorFlow 2.2.0 RC; this should be changed in the requirements. Note that the only thing I could find with realistic examples is Modern Techniques for Forecasting Time Series. I am not active in the Kaggle competition; I just wanted to give an example.

philipperemy commented 4 years ago

That's a good point. I'll try to provide an example with real data soon.

philipperemy commented 3 years ago

@hstarmans I posted an example here: https://github.com/philipperemy/cond_rnn/blob/master/examples/temp.py. It uses the Kaggle data. Note that the model is quite simple but it can do slightly better in practice. It would be interesting to build better models and see if we can decrease the loss even more. Let me know if you have more questions!

hstarmans commented 3 years ago

Great!! I am still working on similar projects and will have a look at the example you provided.

hstarmans commented 3 years ago

I have looked at your example & code:

I will try to add some more advanced methods like using the signature of the time series and an ARMAX model. They will be included later.

philipperemy commented 3 years ago

@hstarmans that's awesome!

Thanks! Let me know once you're done :)

philipperemy commented 3 years ago

Pushed to PyPI: https://pypi.org/project/cond-rnn/2.3/

hstarmans commented 3 years ago

I have used an ARIMAX model to predict the daily temperature in Amsterdam. As exogenous components, I used the five most strongly correlated cities with a lag of one day. These happen to be neighboring cities like Brussels, Paris and London. I obtain a mean absolute error of 1.25 degrees with this model. I tried adding seasonal components or a small linear trend to account for global warming, but this does not result in better models. I am able to spot a linear trend if I do a linear regression and sample on a daily basis. With ARIMAX it is not common to do a train/test split. The model is very simple and therefore less prone to overfitting. If needed, I might add a split later for a fair comparison, but I hope the LSTM will simply outperform ARIMAX even though ARIMAX has more data. The analysis can be found here. I will try to improve the cond_rnn estimate in the coming week or weeks :-).
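The feature selection described above (keeping the cities whose previous-day temperature correlates most with the target) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the linked analysis: the helper names `pearson` and `top_lagged_cities` are hypothetical, ranking by absolute Pearson correlation is an assumption, and the series are made-up toy values rather than the Kaggle data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def top_lagged_cities(target, others, k=5, lag=1):
    """Rank cities by |corr(target[t], city[t - lag])| and keep the top k.

    Hypothetical helper; `lag` must be at least 1.
    """
    scores = {
        name: abs(pearson(target[lag:], series[:-lag]))
        for name, series in others.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy, made-up series (not the Kaggle data), only to show the call.
amsterdam = [5.0, 6.0, 8.0, 7.0, 9.0, 11.0, 10.0, 12.0]
others = {
    # city_a roughly leads the target by one day
    "city_a": [6.1, 8.2, 6.9, 9.1, 11.0, 10.2, 12.1, 13.0],
    "city_b": [20.0, 19.0, 21.0, 20.0, 19.0, 21.0, 20.0, 19.0],
    "city_c": [3.0, 9.0, 1.0, 7.0, 2.0, 8.0, 4.0, 6.0],
}
selected = top_lagged_cities(amsterdam, others, k=1)
print(selected)
```

The selected series, shifted by one day, would then be passed as the exogenous inputs of whichever ARIMAX implementation is used.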

philipperemy commented 3 years ago

@hstarmans very cool!! Let me know how it goes :-)

hstarmans commented 3 years ago

Added cond_rnn; the results are now as follows:

ARMA model: mean absolute error (MAE) of 1.47 degrees
ARMAX model with 30 cities: MAE of 1.25 degrees
Pure autoregressive LSTM model: MAE of 1.46 degrees
cond_rnn with 30 cities: MAE of 0.87 degrees

For pure autoregressive problems, LSTM networks do not provide an advantage. First results show, however, that they are much better at taking a large set of external factors into account. The ARMAX model has problems, as it is not able to use GPUs and seems to be thrown off course by multicollinearity. A PDF of the results can be found here for LSTM and here for ARMAX. The code has been placed in my forked repo. @orko19 (added you as you might be interested)
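For readers comparing the figures above, here is a minimal sketch of how a mean absolute error comparison between models could be computed. The function name, the model labels and all numbers in it are made up for illustration; they are not the values or the code from the forked repo.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all time steps."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up temperatures and forecasts, purely illustrative.
actual = [10.0, 12.0, 11.0, 9.0]
forecasts = {
    "model_a": [11.0, 12.5, 10.0, 9.5],
    "model_b": [10.5, 12.0, 11.5, 9.0],
}

# Score every model on the same targets and pick the lowest MAE.
scores = {name: mean_absolute_error(actual, preds) for name, preds in forecasts.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

The comparison is only fair if every model is scored on the same held-out days.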

philipperemy commented 3 years ago

@hstarmans awesome I will merge it to the main code base! Thanks for the results :)

philipperemy commented 3 years ago

I opened a PR from your repo here: https://github.com/philipperemy/cond_rnn/pull/19