zalandoresearch / pytorch-ts

PyTorch based Probabilistic Time Series forecasting framework based on GluonTS backend
MIT License

README example is failing with "RuntimeError: input.size(-1) must be equal to input_size" #63

Open vfdev-5 opened 3 years ago

vfdev-5 commented 3 years ago

Hi, I'm executing the following code from the README:

import pandas as pd
import matplotlib.pyplot as plt

import torch
print(torch.__version__)

import gluonts
from gluonts.dataset.common import ListDataset
from gluonts.dataset.util import to_pandas

import pts
from pts.model.deepar import DeepAREstimator
from pts import Trainer

print(pts.__version__, gluonts.__version__)

url = "https://raw.githubusercontent.com/numenta/NAB/master/data/realTweets/Twitter_volume_AMZN.csv"
df = pd.read_csv(url, header=0, index_col=0, parse_dates=True)

df[:100].plot(linewidth=2)
plt.grid(which='both')
plt.show()

training_data = ListDataset(
    [{"start": df.index[0], "target": df.value[:"2015-04-05 00:00:00"]}],
    freq = "5min"
)

device = "cpu"
estimator = DeepAREstimator(freq="5min",
                            prediction_length=12,
                            input_size=43,
                            trainer=Trainer(epochs=10,
                                            device=device))
predictor = estimator.train(training_data=training_data, num_workers=4)

and got the following error:

1.9.0
0.0.0-unknown 0.8.0

    203                     expected_input_dim, input.dim()))
    204         if self.input_size != input.size(-1):
--> 205             raise RuntimeError(
    206                 'input.size(-1) must be equal to input_size. Expected {}, got {}'.format(
    207                     self.input_size, input.size(-1)))

RuntimeError: input.size(-1) must be equal to input_size. Expected 43, got 19

Version:

pip list | grep pytorchts
pytorchts                     0.5.1

Any suggestions ? Thanks

kashif commented 3 years ago

Oops, I forgot to fix the README... yes, just use input_size=19.

vfdev-5 commented 3 years ago

@kashif thanks! However, I do not get why input_size is not a free hyperparameter and is instead constrained by something else?

kashif commented 3 years ago

Good question, @vfdev-5. input_size is the size of the feature vector that is passed to the RNN, for example. That feature size depends on the freq of the time series data (via lag features and time features), on any static or time-dependent covariates in the dataset, and on the user-defined embedding sizes for any categorical covariates. So I found it easiest to just use the error message to get the right input size. Another solution would be to get it from the final batches. I hope that answers your question?

As for the RNN, its hidden size is a hyperparameter and you can choose it as you like, but there is a default value (which I forget).
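
The error-message approach mentioned in this comment can be automated. A minimal sketch, assuming the message format shown in the traceback in this issue; the helper name input_size_from_error is hypothetical and not part of pytorch-ts:

```python
import re

# Matches the size-mismatch message format shown in this issue's traceback.
_MISMATCH = re.compile(r"Expected (\d+), got (\d+)")


def input_size_from_error(message):
    """Return the feature size the data actually had, or None if the
    message is not a size-mismatch message of the expected format."""
    match = _MISMATCH.search(message)
    if match is None:
        return None
    # Group 1 is the configured input_size, group 2 is what the data provided.
    return int(match.group(2))


message = "input.size(-1) must be equal to input_size. Expected 43, got 19"
print(input_size_from_error(message))  # 19
```

In practice one would wrap the first training call in a try/except RuntimeError, pass str(err) to this helper, and rebuild the estimator with the returned value.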

vfdev-5 commented 3 years ago

Thanks for the detailed answer, @kashif! Right, input_size for the LSTM behind DeepAREstimator is the number of features, i.e. the last dimension of an input shaped (batch_size, sequence_length, input_size): https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html
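
To make the shape convention concrete, here is a small sketch that uses torch.nn.LSTM directly rather than DeepAREstimator. The sizes (batch of 4, 24 time steps, hidden size 40) are arbitrary assumptions; batch_first=True is set so that the input is laid out as (batch_size, sequence_length, input_size).

```python
import torch
from torch import nn

# Illustrative sizes only; 19 is the feature count reported in this issue.
batch_size, sequence_length, num_features = 4, 24, 19

# With batch_first=True the layout is (batch_size, sequence_length, input_size).
inputs = torch.randn(batch_size, sequence_length, num_features)

# input_size matches the last dimension of the data: the forward pass works.
matching = nn.LSTM(input_size=num_features, hidden_size=40, batch_first=True)
output, _ = matching(inputs)
output_shape = tuple(output.shape)  # (batch_size, sequence_length, hidden_size)

# input_size=43 does not match the 19 features: PyTorch raises RuntimeError.
mismatched = nn.LSTM(input_size=43, hidden_size=40, batch_first=True)
try:
    mismatched(inputs)
    error_message = None
except RuntimeError as err:
    error_message = str(err)

print(output_shape)
print(error_message)
```

The hidden size can be changed freely without touching the data, whereas input_size has to follow whatever feature width the data pipeline produces.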

> The feature size depends on the freq of the time series data (via lag features and time features), as well as on the possible static or time-dependent covariates of a dataset and finally on the user-defined possible embedding sizes from any possible categorical covariates... thus I found it easiest to just use the error message to get the right input size... another solution would be to get it from the final batches...

It would be nice to have a clear description somewhere of how these parameters are coupled together.