wassname / attentive-neural-processes

implementing "recurrent attentive neural processes" to forecast power usage (w. LSTM baseline, MCDropout)
Apache License 2.0

Neural Processes for Sequential Data

This repository houses an implementation of the "Recurrent Attentive Neural Process for Sequential Data" (ANP-RNN), tested on a toy regression problem and real smart meter data.

ANP-RNN Diagram

The repository provides options for running as an ANP-RNN, ANP, or NP.

Numerous tweaks have been made for flexibility and stability, including a replication of the DeepMind ANP results in PyTorch. This replication appears to be a better qualitative match than other PyTorch versions of ANP (as of 2019-11-01). See the "See also" section for other code repositories.

The code is not extensively documented due to limited usage. If you are using it and find it confusing, please raise a GitHub issue and we can enhance the documentation together.

Experiment: Comparing models on real world data

Here I compare the models on smartmeter power demand data.

The black dots are the input data, the dotted line is the true data, the blue line is the prediction, and the blue shading is the uncertainty band (one standard deviation).

I chose a difficult example below: a window from the test set that deviates from the preceding pattern. Given three days of inputs, the model must predict the next day, and that day has higher power usage than anything before it. The trained model manages to predict this from the inputs.
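As a concrete sketch of this setup (names, sampling rate, and shapes are illustrative; the repo's actual data pipeline may differ), slicing three days of half-hourly readings as context and the following day as the target could look like:

```python
import numpy as np

def make_window(series, context_days=3, target_days=1, steps_per_day=48):
    """Slice one (context, target) window from a 1-D power series.
    Assumes evenly spaced readings, e.g. 48 half-hourly steps per day."""
    n_ctx = context_days * steps_per_day
    n_tgt = target_days * steps_per_day
    t = np.arange(n_ctx + n_tgt, dtype=np.float32)   # time index as x
    x_context, y_context = t[:n_ctx], series[:n_ctx]
    x_target, y_target = t[n_ctx:], series[n_ctx:n_ctx + n_tgt]
    return (x_context, y_context), (x_target, y_target)
```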

Results

Results on Smartmeter prediction (lower is better)

Model                        val_np_loss   val_mse_loss
ANP-RNN(impr)(MCDropout)     -1.48         n/a
ANP-RNN_imp                  -1.38         0.00423
ANP-RNN                      -1.27         0.0047
ANP                          -1.3          0.0072
NP                           -1.3          0.0040
LSTM                         -0.78         0.0074

Example LSTM baseline

Here is an LSTM with a similar setup: it has access to the y values in the context (the first half). Its output is inferior and its uncertainty estimate is poor. The uncertainty starts off high, since the model hasn't seen much data yet, but it should increase, or at least stay high, in the second half as the predictions move away from the observed data.
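As a rough sketch of what such a baseline can look like (illustrative, not the repo's exact model), an LSTM can emit a mean and a standard deviation per step and be trained with a Gaussian negative log-likelihood:

```python
import torch
import torch.nn as nn

class LSTMBaseline(nn.Module):
    """LSTM that predicts a Gaussian (mean, std) at each time step."""
    def __init__(self, in_dim=2, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)  # -> [mean, log_var] per step

    def forward(self, x):                 # x: (batch, time, in_dim)
        h, _ = self.lstm(x)
        mean, log_var = self.head(h).chunk(2, dim=-1)
        return mean, torch.exp(0.5 * log_var)  # std

# Training minimises the Gaussian NLL, e.g.
# torch.nn.GaussianNLLLoss()(mean, y, std ** 2)
```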

Example NP

Here we see underfitting: the curve doesn't match the data.

Example ANP outputs (sequential)

Here we see overfitting: the uncertainty seems too small, and the fit could still be improved.

Example ANP-RNN outputs

This one has better-calibrated uncertainty and a better fit.

Example of ANP-RNN with MCDropout

Experiment: Comparing models on toy 1d regression

I put some work into replicating the behaviour shown in the original DeepMind TensorFlow notebook. At the same time, I compared multiple models.
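For reference, the DeepMind notebook trains on random smooth functions drawn from a Gaussian process; a minimal sketch of that kind of generator (hyperparameters illustrative) is:

```python
import torch

def sample_gp_curve(n_points=100, length_scale=0.4, sigma=1.0):
    """Sample a random smooth curve from a GP with an RBF kernel,
    then pick a random subset of points as the context set."""
    x = torch.linspace(-2, 2, n_points).unsqueeze(-1)        # (n, 1)
    sq_dist = (x - x.T) ** 2
    K = sigma ** 2 * torch.exp(-sq_dist / (2 * length_scale ** 2))
    K = K + 1e-6 * torch.eye(n_points)                       # jitter
    y = torch.distributions.MultivariateNormal(
        torch.zeros(n_points), covariance_matrix=K).sample()
    n_context = int(torch.randint(5, 50, (1,)))
    context_idx = torch.randperm(n_points)[:n_context]
    return x.squeeze(-1), y, context_idx  # model sees (x, y) at context_idx
```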

Results

Results on toy 1d regression (lower is better)

model             val_loss
ANP-RNN(impr)     -1.3217
ANP-RNN           -0.62
ANP               -0.4228
ANP(impr)         -0.3182
NP                -1.2687

Example outputs

Compare DeepMind:

And this repo with an ANP (anp_1d_regression.ipynb):

And an ANP-RNN:

It's only a qualitative comparison, but we see the same kind of overfitting, with uncertainty tight where lots of data points exist and wide where they do not. However, this repo's model seems to miss points occasionally.

Experiment: Using ANP-RNN + Monte Carlo Dropout

One more experiment is included:

The model tries to estimate how unsure it is, but what about when it is out of sample? What about what it doesn't know that it doesn't know?

Name        val_loss (n=100) [lower is better]
MCDropout   -1.31
Normal      -1.04

We can estimate additional uncertainty by using Monte Carlo Dropout: we observe how much the model's predictions vary in the presence of dropout. This doesn't capture all uncertainty, but I found that it does improve (decrease) the validation loss. The loss is computed from the negative overlap between the predicted output distribution and the target value (i.e. a negative log-likelihood), so this improvement in the loss shows that MCDropout improved the estimation of the uncertainty.
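A minimal sketch of the procedure (the notebook's implementation may differ in details): leave dropout active at test time, draw several stochastic forward passes, and add the spread across passes to the predicted variance.

```python
import torch

def mc_dropout_predict(model, x, n_samples=100):
    """Combine stochastic forward passes made with dropout left on.
    Assumes model(x) returns (mean, std) per point."""
    model.train()  # keeps nn.Dropout sampling; note this also flips
                   # BatchNorm, so a real version would enable dropout only
    means, variances = [], []
    with torch.no_grad():
        for _ in range(n_samples):
            mean, std = model(x)
            means.append(mean)
            variances.append(std ** 2)
    means = torch.stack(means)
    # total variance = mean aleatoric variance + variance across passes
    var = torch.stack(variances).mean(0) + means.var(0)
    return means.mean(0), var.sqrt()
```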

Why didn't the model just learn to be more uncertain? Well, I chose a challenging train/validation/test split in which the validation data was in the future and showed quite different behaviour. That means the validation data contained behaviour the model had never seen before.
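A sketch of this kind of split (fraction illustrative): cut the series at a point in time instead of shuffling, so every validation window lies in the future of the training data.

```python
def chronological_split(df, train_frac=0.8):
    """Split a time-ordered pandas DataFrame at a single cutoff, so the
    validation rows lie strictly after all training rows."""
    cutoff = int(len(df) * train_frac)
    return df.iloc[:cutoff], df.iloc[cutoff:]
```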

With MCDropout:

Without:

For more details see the notebook ./smartmeters-ANP-RNN-mcdropout.ipynb

Usage

  1. Clone this repository and pull the data stored in Git LFS:

    git clone https://github.com/wassname/attentive-neural-processes.git
    cd attentive-neural-processes
    git lfs pull

  2. Refer to requirements.txt for software requirements and versions, and install them (e.g. with pip install -r requirements.txt).

  3. Run the notebook smartmeters.ipynb.

  4. For a toy 1D regression problem, refer to anp-rnn_1d_regression.ipynb.

Smartmeter Data

Code

The code is based on the repositories listed in the "See also" section below, with modifications for stability and to ensure it can handle future predictions. Notable changes include:

Changes for a sequential/predictive use case:

Changes for stability:

ANP-RNN diagram

Tips

See also:

A list of projects I used as reference or modified to make this one:

I'm very grateful to all these authors for sharing their work. It was a pleasure to dive deep into these models and compare the different implementations.

Neural process papers:

Blogposts:

Citing

If you like our work and end up using this code for your research, give us a shout-out by citing or acknowledging it.