zalandoresearch / pytorch-ts

PyTorch based Probabilistic Time Series forecasting framework based on GluonTS backend
MIT License

How to plot forecasts of multivariate time series #25

Closed NielsRogge closed 3 years ago

NielsRogge commented 3 years ago

I'd like to plot the predictions of the TempFlowEstimator on a multivariate time series dataset, similar to what is done in the README of this repository.

When I make the forecasts as follows:

from pts.evaluation import make_evaluation_predictions
from pts.evaluation import MultivariateEvaluator
import numpy as np

evaluator = MultivariateEvaluator(quantiles=(np.arange(20)/20.0)[1:],
                                  target_agg_funcs={'sum': np.sum})

forecast_it, ts_it = make_evaluation_predictions(dataset=dataset_test,
                                             predictor=predictor,
                                             num_samples=100)
forecasts = list(forecast_it)
targets = list(ts_it)

targets[0] is a pandas DataFrame containing the true values of each time series (12 in my case) for all time steps. forecasts[0] is a SampleForecast object whose samples attribute is a NumPy array of shape (100, 365, 12). This means that we have 100 samples for each of the 365 time steps of the test set, for each of the 12 time series.
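To make those shapes concrete, here is a minimal sketch using synthetic data in place of the real forecast. The array `samples` below is a hypothetical stand-in for forecasts[0].samples, not model output. It shows that selecting one series leaves a 2-D array, and that reducing over the sample axis is what yields the 1-D data a pandas Series expects:

```python
import numpy as np

# Synthetic stand-in for forecasts[0].samples (an assumption for illustration).
# Shape convention from the text: (num_samples, prediction_length, target_dim).
rng = np.random.default_rng(0)
samples = rng.normal(size=(100, 365, 12))

# Selecting the first series along the last axis leaves a 2-D array.
first_series = samples[:, :, 0]  # shape (100, 365)

# Reducing over the sample axis gives one value per time step (1-D).
median = np.quantile(first_series, 0.5, axis=0)  # shape (365,)
lower, upper = np.quantile(first_series, [0.05, 0.95], axis=0)  # each (365,)

print(first_series.shape, median.shape, lower.shape, upper.shape)
```

Each of `median`, `lower`, and `upper` is one-dimensional with one entry per forecast time step, so it can be paired with a time index of the same length for plotting.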

However, how can I plot the samples of the first time series, for example? I tried setting the samples of the forecasts[0] object to those of the first series (i.e. forecasts[0].samples = forecasts[0].samples[:,:,0]), but when I call the plot function on the result I get:


Exception                                 Traceback (most recent call last)
<ipython-input-83-b629071ff750> in <module>()
----> 1 samples_first_time_series.plot()

2 frames
/usr/local/lib/python3.6/dist-packages/pts/model/forecast.py in plot(self, prediction_intervals, show_mean, color, label, output_file, *args, **kwargs)
    132 
    133         p50_data = ps_data[i_p50]
--> 134         p50_series = pd.Series(data=p50_data, index=self.index)
    135         p50_series.plot(color=color, ls="-", label=f"{label_prefix}median")
    136 

/usr/local/lib/python3.6/dist-packages/pandas/core/series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    303                     data = data.copy()
    304             else:
--> 305                 data = sanitize_array(data, index, dtype, copy, raise_cast_failure=True)
    306 
    307                 data = SingleBlockManager(data, index, fastpath=True)

/usr/local/lib/python3.6/dist-packages/pandas/core/construction.py in sanitize_array(data, index, dtype, copy, raise_cast_failure)
    480     elif subarr.ndim > 1:
    481         if isinstance(data, np.ndarray):
--> 482             raise Exception("Data must be 1-dimensional")
    483         else:
    484             subarr = com.asarray_tuplesafe(data, dtype=dtype)

Exception: Data must be 1-dimensional

kashif commented 3 years ago

@NielsRogge I plot some dimensions of a multivariate output using this helper function:

import matplotlib.pyplot as plt
import pandas as pd

def plot(target, forecast, prediction_length, prediction_intervals=(50.0, 90.0), color='g', fname=None):
    label_prefix = ""
    rows = 3
    cols = 2
    fig, axs = plt.subplots(rows, cols, figsize=(24, 24))
    axx = axs.ravel()
    seq_len, target_dim = target.shape

    ps = [50.0] + [
            50.0 + f * c / 2.0 for c in prediction_intervals for f in [-1.0, +1.0]
        ]

    percentiles_sorted = sorted(set(ps))

    def alpha_for_percentile(p):
        return (p / 100.0) ** 0.3

    for dim in range(0, min(rows * cols, target_dim)):
        ax = axx[dim]

        target[-2 * prediction_length :][dim].plot(ax=ax)

        ps_data = [forecast.quantile(p / 100.0)[:,dim] for p in percentiles_sorted]
        i_p50 = len(percentiles_sorted) // 2

        p50_data = ps_data[i_p50]
        p50_series = pd.Series(data=p50_data, index=forecast.index)
        p50_series.plot(color=color, ls="-", label=f"{label_prefix}median", ax=ax)

        for i in range(len(percentiles_sorted) // 2):
            ptile = percentiles_sorted[i]
            alpha = alpha_for_percentile(ptile)
            ax.fill_between(
                forecast.index,
                ps_data[i],
                ps_data[-i - 1],
                facecolor=color,
                alpha=alpha,
                interpolate=True,
            )
            # Hack to create labels for the error intervals.
            # Doesn't actually plot anything, because we only pass a single data point
            pd.Series(data=p50_data[:1], index=forecast.index[:1]).plot(
                color=color,
                alpha=alpha,
                linewidth=10,
                label=f"{label_prefix}{100 - ptile  * 2}%",
                ax=ax,
            )

    legend = ["observations", "median prediction"] + [f"{k}% prediction interval" for k in prediction_intervals][::-1]    
    axx[0].legend(legend, loc="upper left")
    axx[0].set_title(forecast.item_id)

    if fname is not None:
        plt.savefig(fname, bbox_inches='tight', pad_inches=0.05)

Hope this helps!
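For readers following the helper above, this small sketch isolates how it turns prediction_intervals into percentiles and shaded bands. It reuses the same expressions as the helper; the `bands` list is a name introduced here only for illustration:

```python
prediction_intervals = (50.0, 90.0)

# Each interval of width c contributes the two percentiles 50 - c/2 and 50 + c/2.
ps = [50.0] + [
    50.0 + f * c / 2.0 for c in prediction_intervals for f in (-1.0, 1.0)
]
percentiles_sorted = sorted(set(ps))

# The median sits in the middle of the sorted list.
i_p50 = len(percentiles_sorted) // 2

# Pair the i-th lowest percentile with the i-th highest to form each band,
# and recover the interval width used for the legend label.
bands = [
    (percentiles_sorted[i], percentiles_sorted[-i - 1], 100 - percentiles_sorted[i] * 2)
    for i in range(i_p50)
]

print(percentiles_sorted)  # [5.0, 25.0, 50.0, 75.0, 95.0]
print(bands)               # [(5.0, 95.0, 90.0), (25.0, 75.0, 50.0)]
```

The widest band is drawn first with the lowest alpha, so the narrower 50% band appears darker on top of the 90% band.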