zalandoresearch / pytorch-ts

PyTorch based Probabilistic Time Series forecasting framework based on GluonTS backend
MIT License

Multivariate Target Dim errors #26

Closed: lorrp1 closed this issue 3 years ago

lorrp1 commented 3 years ago

Hello, I'm trying to use the dataset at https://github.com/smallGum/MLCNN-Multivariate-Time-Series/blob/master/data/nasdaq100_padding.csv to train TransformerTempFlowEstimator, but I keep getting errors related to target_dim. Here I'm using only 2 columns:

```python
import pandas as pd
import torch
from gluonts.dataset.common import ListDataset
from pts import Trainer
from pts.model.transformer_tempflow import TransformerTempFlowEstimator

df = pd.read_csv("./data/nasdaq100_padding.csv")

leng = len(df.NDX)

train = leng // 2
test = leng // 2
prediction_length = 15

# Two separate entries, one per stock column
training_data1 = ListDataset(
    [{"start": pd.Timestamp(2017, 1, 1, 12), "target": df.AAPL[:train]},
     {"start": pd.Timestamp(2017, 1, 1, 12), "target": df.AMZN[:train]}
    ],
    one_dim_target=False,
    freq="min"
)
device = torch.device("cuda")
estimator = TransformerTempFlowEstimator(freq="min",
                            prediction_length=prediction_length,
                            input_size=600,
                            target_dim=2,
                            trainer=Trainer(epochs=15,
                                            #learning_rate = 0.00001,
                                            device=device,
                                            num_batches_per_epoch=500,
                                            batch_size=20))
predictor = estimator.train(training_data=training_data1)
```

I'm getting errors like: RuntimeError: Sizes of tensors must match except in dimension 0. Got 20 and 10 (The offending index is 0). Changing target_dim usually gets past that one, but then I get: RuntimeError: shape '[-1, 30, 3]' is invalid for input of size 600

kashif commented 3 years ago

@lorrp1 I think the issue is that the multivariate methods need a multivariate dataset, i.e. the target has to be a 2-dim array of time and variates, with the one-dim flag (one_dim_target) set to False.

Also note that normalizing flows work best on high-dimensional multivariate time series, not on just 2 variates as in your case.
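A minimal sketch of the entry layout described above, assuming the usual GluonTS convention that a multivariate target is one array of shape (num_variates, num_timesteps). The series values and the helper check_multivariate_target are hypothetical stand-ins; in real use the entry would be passed to ListDataset(..., one_dim_target=False, freq="min").

```python
import numpy as np

# Two hypothetical price columns of length T (stand-ins for df.AAPL and df.AMZN).
T = 100
aapl = np.linspace(100.0, 110.0, T)
amzn = np.linspace(1500.0, 1600.0, T)

# ONE entry whose target is 2-D: axis 0 indexes the variates, axis 1 indexes time.
# (Assumed GluonTS layout for one_dim_target=False.)
multivariate_entry = {
    "start": "2017-01-01 12:00",
    "target": np.stack([aapl, amzn]),  # shape (2, T)
}


def check_multivariate_target(entry, target_dim):
    """Hypothetical helper: fail early unless the target has shape (target_dim, T)."""
    target = np.asarray(entry["target"])
    if target.ndim != 2:
        raise ValueError(f"expected a 2-D target, got ndim={target.ndim}")
    if target.shape[0] != target_dim:
        raise ValueError(
            f"expected {target_dim} variates on axis 0, got {target.shape[0]}"
        )
    return target.shape


print(check_multivariate_target(multivariate_entry, target_dim=2))  # (2, 100)
```

Running the same check on one of the original 1-D entries (e.g. a target of just aapl) raises a ValueError, which mirrors why those entries do not count as a multivariate dataset.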

lorrp1 commented 3 years ago

But isn't this already a multivariate dataset? @kashif

```python
training_data1 = ListDataset(
    [{"start": pd.Timestamp(2017, 1, 1, 12), "target": df.AAPL[:train]},
     {"start": pd.Timestamp(2017, 1, 1, 12), "target": df.AMZN[:train]}
    ],
    one_dim_target=False,
    freq="min"
)
```

I can't find an example with this kind of data (a multivariate dataset built from a CSV) either here or in GluonTS; all the examples use pre-made datasets with metadata, unlike what I'm trying here. I have tried with 5 variates, but the result is the same.
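A sketch of building a multivariate training entry straight from a CSV, assuming the (num_variates, num_timesteps) target layout. A tiny in-memory CSV stands in for nasdaq100_padding.csv, and its values are made up. Because DataFrame rows are time steps and columns are variates, the selected block is transposed.

```python
import io

import numpy as np
import pandas as pd

# Small in-memory stand-in for nasdaq100_padding.csv (values are illustrative).
csv_text = """AAPL,AMZN,MSFT,NDX
1.0,10.0,100.0,1000.0
2.0,20.0,200.0,2000.0
3.0,30.0,300.0,3000.0
4.0,40.0,400.0,4000.0
"""
df = pd.read_csv(io.StringIO(csv_text))

columns = ["AAPL", "AMZN", "MSFT"]  # the variates to model jointly
train = len(df) // 2  # first half of the rows for training

# Rows are time steps and columns are variates, so transpose to get a
# (num_variates, num_timesteps) array: one multivariate target, not three.
target = df[columns].iloc[:train].to_numpy().T

train_entries = [{"start": pd.Timestamp(2017, 1, 1, 12), "target": target}]
print(target.shape)  # (3, 2)
```

With GluonTS installed, train_entries would then go into ListDataset(train_entries, freq="min", one_dim_target=False), with the estimator's target_dim set to len(columns). Adding variates only changes the column list, not the number of entries.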

kashif commented 3 years ago

So @lorrp1, in your example above you want a single time series with "target": np.stack([AAPL, AMZN]), if that makes sense. That is what the MultivariateGrouper does.

What you have above is essentially two univariate time series.
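To make the difference concrete, here is a hypothetical, heavily simplified grouper that turns equal-length univariate entries sharing one start time into a single multivariate entry via np.stack. It is not GluonTS's MultivariateGrouper implementation (the real class does more than this); the function name group_univariate is made up for illustration.

```python
import numpy as np


def group_univariate(entries):
    """Hypothetical, simplified grouper: combine equal-length univariate
    entries that share one start time into a single multivariate entry."""
    starts = {entry["start"] for entry in entries}
    if len(starts) != 1:
        raise ValueError("this sketch assumes all series share one start time")
    targets = [np.asarray(entry["target"], dtype=float) for entry in entries]
    # np.stack adds a new leading axis, so the result has shape (n_series, T);
    # it raises ValueError if the series lengths differ.
    return {"start": entries[0]["start"], "target": np.stack(targets)}


# Two univariate entries, like the two in the original ListDataset call.
univariate = [
    {"start": "2017-01-01 12:00", "target": [1.0, 2.0, 3.0]},
    {"start": "2017-01-01 12:00", "target": [10.0, 20.0, 30.0]},
]
grouped = group_univariate(univariate)
print(grouped["target"].shape)  # (2, 3)
```

Before grouping there are two entries, each with a 1-D target; afterwards there is one entry whose 2-D target has 2 rows, which is the shape a target_dim=2 multivariate model would expect under the assumed layout.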