StatMixedML commented 4 years ago

Description

I am currently using the Australian retail trade turnover data set to get familiar with PyTorch-TS in general and with TransformerTempFlowEstimator in particular. The data looks as follows:

[Image: aus_retail_df]

Each series (133 series in total) has 417 months of training observations and is uniquely identified using two keys:

State: The Australian state (or territory)
Industry: The industry of retail trade

All series show fairly strong positive dependencies, as the correlation matrix shows:

[Image: cor_matr]

TransformerTempFlowEstimator therefore seems a good option. I want to use both State and Industry as covariates in the model. For each categorical covariate, a generalized linear mixed model (GLMM) is fit to the outcome, and the estimated coefficients are used as the encodings. The cardinalities of State and Industry are [7, 20]. After bringing the data into the right format, I create the training data as follows:

```python
train_ds = ListDataset([{FieldName.TARGET: target,
                         FieldName.START: start,
                         FieldName.ITEM_ID: item_id,
                         FieldName.FEAT_DYNAMIC_REAL: feat_dynamic_real,
                         FieldName.FEAT_STATIC_REAL: feat_static_real,
                         FieldName.FEAT_TIME: time_feat
                        }
                        for (target,
                             start,
                             item_id,
                             feat_dynamic_real,
                             feat_static_real,
                             time_feat
                            ) in zip(target_train,
                                     start_train,
                                     item_id_train,
                                     feat_dynamic_real_train,
                                     feat_static_real_train,
                                     time_feat_train
                                    )],
                      freq = "1M")
```
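For intuition about the GLMM encoding of State and Industry described above, the closed-form random-intercept estimates (BLUPs) of a Gaussian mixed model can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the encoder actually used here: the variance components `tau2` (between groups) and `sigma2` (within groups) are assumed known, the fixed intercept is approximated by the grand mean, and the function name and group values are hypothetical. A real GLMM encoder would estimate the variance components from the data.

```python
from statistics import mean

def random_intercept_encoding(values_by_group, tau2, sigma2):
    """Shrunken group effects of a Gaussian random-intercept model.

    Assumes known variance components: tau2 (between-group) and
    sigma2 (within-group). The fixed intercept is approximated by the
    grand mean of all observations (a simplification).
    """
    all_values = [v for vals in values_by_group.values() for v in vals]
    grand_mean = mean(all_values)
    encodings = {}
    for group, vals in values_by_group.items():
        n = len(vals)
        # Shrinkage factor in (0, 1): groups with few observations are
        # pulled more strongly towards zero (i.e. towards the grand mean).
        shrinkage = n * tau2 / (n * tau2 + sigma2)
        encodings[group] = shrinkage * (mean(vals) - grand_mean)
    return encodings

# Hypothetical outcome values per State
enc = random_intercept_encoding({"NSW": [1.0, 3.0], "VIC": [5.0]}, tau2=1.0, sigma2=1.0)
```

Here NSW's raw deviation of -1 is shrunk to about -0.667 and VIC's deviation of 2 is halved to 1.0, because VIC has only one observation.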

feat_static_real_train contains the encodings of State and Industry, and time_feat_train contains the month information. To transform the data into a multivariate data set, I use

```python
grouper_train = MultivariateGrouper(max_target_dim = 133) # as there are 133 unique series
train_ds = grouper_train(train_ds)
```
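To see why per-series covariates can disappear at this step, it helps to look at what grouping does to the targets. The following is a simplified sketch, not MultivariateGrouper's actual implementation: the helper `group_targets` is hypothetical, assumes all series are already aligned (same start and length, so no padding is needed), and only stacks the targets into one `(133, T)`-style array, so per-series fields such as `feat_static_real` are not carried over.

```python
import numpy as np

def group_targets(entries, max_target_dim):
    """Stack per-series targets into one (max_target_dim, T) array.

    Hypothetical simplification: all series are assumed to share the same
    start and length. Per-series fields (e.g. feat_static_real) are not
    copied into the grouped entry.
    """
    targets = np.stack([np.asarray(e["target"], dtype=float) for e in entries])
    assert targets.shape[0] == max_target_dim, "expected one row per series"
    return {"start": entries[0]["start"], "target": targets}

# Three toy series of length 4, each with its own static real feature
entries = [
    {"start": "1982-04-01",
     "target": [float(i + t) for t in range(4)],
     "feat_static_real": [0.1 * i]}
    for i in range(3)
]
grouped = group_targets(entries, max_target_dim=3)
```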

However, after applying grouper_train(train_ds), the covariate information is no longer included. To bring it back, I use

```python
train_ds.list_data[0]["feat_dynamic_real"] = feat_dynamic_real_train
train_ds.list_data[0]["feat_static_real"] = feat_static_real_train
```

I then train the model as follows:

```python
np.random.seed(123)
torch.manual_seed(123)
trainer = Trainer(epochs = 40)

estimator = TransformerTempFlowEstimator(input_size = 401,
                                         freq = "1M",
                                         prediction_length = 24,
                                         context_length = 48,
                                         target_dim = 133,
                                         cardinality = [7, 20],
                                         trainer = trainer)
predictor = estimator.train(training_data = train_ds)
```

The model summary is:

```python
predictor.__dict__["prediction_net"]
```

```
pts.model.transformer_tempflow.transformer_tempflow_network.TransformerTempFlowPredictionNetwork(act_type="gelu", cardinality=[7, 20], conditioning_length=200, context_length=48, d_model=32, dequantize=False, dim_feedforward_scale=4, dropout_rate=0.1, embedding_dimension=5, flow_type="RealNVP", hidden_size=100, history_length=60, input_size=401, lags_seq=[1, 12], n_blocks=3, n_hidden=2, num_decoder_layers=3, num_encoder_layers=3, num_heads=8, prediction_length=24, scaling=True, target_dim=133)
```
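The value `input_size = 401` can be reproduced with simple arithmetic, under an assumption about how the network builds its inputs that this thread does not state: that it concatenates the lagged values of all 133 series (one block per entry in `lags_seq = [1, 12]`), a 1-dimensional embedding of each series' index, and 2 Fourier time features for monthly data. The helper below is hypothetical; if this decomposition is right, `input_size` has to change whenever `target_dim`, `lags_seq` or the number of time features changes.

```python
def tempflow_input_size(target_dim, lags_seq, embed_dim=1, num_time_features=2):
    """Assumed decomposition of the per-step input size:
    lagged targets + per-series index embedding + time features."""
    lag_features = target_dim * len(lags_seq)   # every series at every lag
    index_embedding = target_dim * embed_dim    # one embedding per series index
    return lag_features + index_embedding + num_time_features

size = tempflow_input_size(target_dim=133, lags_seq=[1, 12])
# 133 * 2 + 133 * 1 + 2 = 401
```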

I also compared the forecasts with some competing models, although I am not sure that all models are correctly specified (e.g., regarding covariate information; no hyperparameter tuning was done).

[Image: model_comp]

Given the strong dependencies between the series, I would expect TransformerTempFlowEstimator to outperform models that treat the series as independent.

Question

Based on the above summary, I have the following questions concerning the proper use of TransformerTempFlowEstimator:

How can covariates be included, in particular categorical information?
Does the model automatically include, e.g., month and/or age information that it derives from the data itself, or do we need to pass it using time_features in the function call?
Does the model automatically derive holiday information from the data, or do we need to derive it ourselves as described here?
Does the model automatically select an appropriate lag structure from the data, or do we need to derive it ourselves as described here?

Which of the following field names are currently supported?

FieldName.START = 'start'
FieldName.TARGET = 'target'
FieldName.FEAT_STATIC_CAT = 'feat_static_cat'
FieldName.FEAT_STATIC_REAL = 'feat_static_real'
FieldName.FEAT_DYNAMIC_CAT = 'feat_dynamic_cat'
FieldName.FEAT_DYNAMIC_REAL = 'feat_dynamic_real'
FieldName.FEAT_TIME = 'time_feat'
FieldName.FEAT_CONST = 'feat_dynamic_const'
FieldName.FEAT_AGE = 'feat_dynamic_age'
FieldName.OBSERVED_VALUES = 'observed_values'
FieldName.IS_PAD = 'is_pad'
FieldName.FORECAST_START = 'forecast_start'

kashif commented 4 years ago

Thanks for the question.

Categorical information is not used in the TempFlow model at the moment, even though the estimator is initialised to receive it. My reasoning was that the multivariate versions of the open datasets end up being a single time series, so I didn't see a need to distinguish the individual time series. Does that make sense? Also, features like age didn't make sense, since age would be a vector of sorts.
Yes, if you do not include any time features, the estimator creates appropriate ones based on the frequency; see the fourier_time_features_from_frequency_str part of the estimator.
Holiday features are not added automatically.
The lags are also not automatic; you need to provide lags_seq to the estimator.
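To make the time-feature and lag points concrete, here is a small plain-Python sketch of (a) a sine/cosine month-of-year encoding, similar in spirit to Fourier time features derived from a frequency string, and (b) what a lag set such as `lags_seq = [1, 12]` (the value in the model summary above) means for each input step. Both helpers are hypothetical and simplified; the estimator's own transformations differ in details such as scaling, padding and which calendar attribute is encoded.

```python
import math

def fourier_month_features(months):
    """Encode month-of-year (1..12) as a sine/cosine pair.

    Hypothetical stand-in in the spirit of Fourier time features,
    not the library's exact formula. Returns a (2, T) nested list.
    """
    sin_row = [math.sin(2 * math.pi * (m - 1) / 12) for m in months]
    cos_row = [math.cos(2 * math.pi * (m - 1) / 12) for m in months]
    return [sin_row, cos_row]

def lagged_inputs(series, lags_seq):
    """For each t >= max(lags_seq), collect series[t - lag] for every lag."""
    start = max(lags_seq)
    return [[series[t - lag] for lag in lags_seq] for t in range(start, len(series))]

months = [1, 4, 7, 10]
feats = fourier_month_features(months)

series = list(range(14))              # toy target with 14 monthly values
rows = lagged_inputs(series, [1, 12])  # each row: [value 1 month ago, value 12 months ago]
```

With a 12-month lag, the first usable step is the 13th observation, which is one reason the model summary shows a `history_length` (60) larger than the `context_length` (48).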

I do not remember which fields are supported, but I can check. Hope that helps!

StatMixedML commented 4 years ago

Categorical information at the moment is not used in the TempFlow model even though the estimator is initialised to get it. My reasoning was that for the multivariate versions of the open datasets end up being a single time series so I didnt see a need to distinguish the individual time series...

I see your point. I'd assume that incorporating categorical information like State and Industry in the above example adds context for the model, since series within the same State or Industry are likely to be more closely related, and the model may pick this up if it is stated explicitly. Also, imagine we want to forecast a new State/Industry combination: putting it into the right "bucket" might improve accuracy. I am not sure, though, to what extent this is already captured by learning the conditional density with normalizing flows. I am also referring to Section 4.2 of your paper:

We employ embeddings for categorical features (Charrington, 2018), which allows for relationships within a category, or its context, to be captured while training models. Combining these embeddings as features for time series forecasting yields powerful models like the first place winner of the Kaggle Taxi Trajectory Prediction challenge (De Brébisson et al., 2015).

Is there any other multivariate model available in PyTorch-TS that can incorporate categorical information?

Many thanks!

kashif commented 4 years ago

In the paper I was referring to a situation where we have several different multivariate time series, or where the time covariates can be embeddings rather than Fourier features.

Hmm, I think the best option might be to add categorical embeddings to DeepVAR and compare the full multivariate normal output with the low-rank one. You can have a look at DeepAR to see how the embedding layer is added.
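The embedding idea can be sketched as follows: each categorical feature (e.g., State with 7 levels and Industry with 20) gets its own lookup table, and the looked-up vectors are concatenated into one dense covariate vector. This is an illustrative numpy sketch with hypothetical names and random, untrained tables, not DeepAR's actual code; in a PyTorch model each table would be a trainable `nn.Embedding`. The embedding sizes `[4, 10]` are arbitrary choices for the example.

```python
import numpy as np

class StaticCategoricalEmbedder:
    """One embedding table per categorical feature; outputs are concatenated.

    numpy stand-in for a stack of embedding layers; tables are random
    and untrained here.
    """

    def __init__(self, cardinality, embedding_dimension, rng):
        assert len(cardinality) == len(embedding_dimension)
        self.tables = [rng.normal(size=(card, dim))
                       for card, dim in zip(cardinality, embedding_dimension)]

    def __call__(self, codes):
        # codes: integer array of shape (batch, num_categorical_features)
        codes = np.asarray(codes)
        parts = [table[codes[:, i]] for i, table in enumerate(self.tables)]
        return np.concatenate(parts, axis=1)  # (batch, sum(embedding_dimension))

rng = np.random.default_rng(0)
embedder = StaticCategoricalEmbedder(cardinality=[7, 20], embedding_dimension=[4, 10], rng=rng)
out = embedder([[0, 3], [6, 19]])  # two series: (State code, Industry code)
```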

StatMixedML commented 4 years ago

Here in the paper i was referring to a situation where we have different multi-variate timeseries.

I am not sure I fully understand what you are saying. Could you clarify what you mean by "different multivariate time series"?

kashif commented 4 years ago

I meant the situation where you have a multivariate time series from, say, system 1, another one from system 2, etc. (they have to have the same number of dimensions). In that case you could have categorical covariates. Does that make sense?

StatMixedML commented 4 years ago

Let's see if I got it right: in the example above we have 133 different time series, one per State/Industry combination. Would that be a single system?

Can you give an example of two different systems, maybe referring to the data set in your paper?

kashif commented 4 years ago

Ah OK, so you have 133 different multivariate time series. In that case, if all 133 time series have the same dimension, then yes, the categorical covariates will help.

StatMixedML commented 4 years ago

Yes, each series (133 different series in total) has 417 months of training observations (hence all have the same dimension) and is uniquely identified using two keys:

State: The Australian state (or territory)
Industry: The industry of retail trade

How much effort would it be for you to incorporate categorical covariate information into TransformerTempFlowEstimator?

StatMixedML commented 4 years ago

How much of an effort would it be for you to incorporate categorical covariate information into TransformerTempFlowEstimator?

Kindly asking for an update...

kashif commented 4 years ago

I will try to get this working for DeepVAR first, by today, and then for the other models afterwards.

kashif commented 4 years ago

@StatMixedML would you be able to test the issue-3 branch?

StatMixedML commented 4 years ago

@kashif I'd be glad to, but probably not before next week.

StatMixedML commented 4 years ago

@kashif Re-reading our discussion, may I ask you to give a specific example (ideally including a data snippet) of your understanding of what a multivariate time series is? I am not sure we are on the same page :-)

So here is one definition:

A Multivariate time series has more than one time-dependent variable. Each variable depends not only on its past values but also has some dependency on other variables. This dependency is used for forecasting future values.
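In the simplest linear case, this definition corresponds to a vector autoregression of order one, VAR(1), written here for the N = 133 State/Industry series. This is a textbook illustration of "depends on its own past and on other variables", not the model used by DeepVAR or TransformerTempFlowEstimator (both are nonlinear):

```latex
\mathbf{y}_t = \mathbf{c} + A\,\mathbf{y}_{t-1} + \boldsymbol{\varepsilon}_t,
\qquad
y_{i,t} = c_i + \sum_{j=1}^{N} a_{ij}\, y_{j,t-1} + \varepsilon_{i,t},
\qquad
\mathbf{y}_t \in \mathbb{R}^{N},\; N = 133 .
```

An off-diagonal coefficient $a_{ij} \neq 0$ ($i \neq j$) means that series $i$ depends on the past of series $j$, while a non-diagonal covariance of $\boldsymbol{\varepsilon}_t$ captures contemporaneous correlation such as that shown in the correlation matrix above.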

To be more specific, I am currently using the Australian retail trade turnover data set. Each series (133 univariate time-series) has 417 months of training observations and the data have the following columns:

State: The Australian state (or territory)
Industry: The industry of retail trade
Turnover: Retail turnover in $Million AUD
Date: Monthly Data (1982-04-01 to 2018-12-01)

The data look as follows in tabular format:

A subset plotted looks as follows:

So the different univariate time series, one per State/Industry combination, together constitute the multivariate data set. I'd assume that incorporating categorical information like State and Industry adds context for the model, since series within the same State or Industry are likely to be more closely related, and the model may pick this up if it is stated explicitly. Also, imagine we want to forecast a new State/Industry combination: putting it into the right "bucket" might improve accuracy.

As there is likely some interdependence between some of the series, I believe DeepVAR and TransformerTempFlowEstimator are good choices for modelling. I have seen that you've added support for use_feat_dynamic_real/use_feat_dynamic_cat/use_feat_static_cat to DeepVAR. Given that State and Industry don't change over time, I would start off by using use_feat_static_cat for testing.
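Starting from the tabular layout described above (State, Industry, Date, Turnover), per-series entries with integer-coded `feat_static_cat` values and the matching cardinality list could be built roughly like this. `build_entries` is a hypothetical plain-Python helper; the dictionary keys mirror the FieldName strings listed earlier, and in a real pipeline the resulting dicts would be passed to ListDataset. The toy records are made up for illustration.

```python
from collections import defaultdict

def build_entries(records):
    """Turn long-format rows into one entry per State/Industry series.

    records: iterable of (state, industry, date, turnover) tuples, where
    date is an ISO string, so lexical order equals time order (assumption).
    Returns the entries and the cardinality of each categorical feature.
    """
    series = defaultdict(list)
    for state, industry, date, turnover in records:
        series[(state, industry)].append((date, turnover))

    # Integer codes per level, in sorted order
    states = sorted({s for s, _ in series})
    industries = sorted({i for _, i in series})
    state_code = {s: k for k, s in enumerate(states)}
    industry_code = {i: k for k, i in enumerate(industries)}

    entries = []
    for (state, industry), rows in sorted(series.items()):
        rows.sort()  # order observations by date
        entries.append({
            "start": rows[0][0],
            "target": [value for _, value in rows],
            "item_id": f"{state}/{industry}",
            "feat_static_cat": [state_code[state], industry_code[industry]],
        })
    return entries, [len(states), len(industries)]

records = [
    ("Victoria", "Food retailing", "1982-05-01", 2.0),
    ("Victoria", "Food retailing", "1982-04-01", 1.0),
    ("Tasmania", "Takeaway food services", "1982-04-01", 3.0),
]
entries, cardinality = build_entries(records)
```

On the full data set, the returned cardinality list would play the role of the `cardinality = [7, 20]` argument passed to the estimator.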

StatMixedML commented 4 years ago

@kashif Just interested in your thoughts on https://github.com/zalandoresearch/pytorch-ts/issues/3#issuecomment-619511469

lorrp1 commented 3 years ago

@StatMixedML have you managed to get it working?

NielsRogge commented 3 years ago

@kashif could you please clarify how we can add additional covariate information (which is not defined in create_transformation), such as holiday information, and let the model learn embeddings from them?

Does TransformerTempFlowEstimator support that?

Should we just define them as feat_static_cat/feat_static_real/feat_dynamic_cat and add them when creating the ListDataset?

kashif commented 3 years ago

@NielsRogge The normalizing flow models used for the paper were run on the open datasets, which did not have additional holiday or, let's say, dynamic real features, so I never added that to the model. However, if you look at the DeepVAR model, you can see how to add both categorical and dynamic features to a multivariate model. If it's something you want and cannot do without (e.g. by using DeepVAR with some distribution emission), let me know and I'll try to find some time to add that feature. Hope that helps!

NielsRogge commented 3 years ago

Ok so if I understand correctly, the features defined in create_transformation of TransformerTempFlow are used by the model, but if I want to add additional covariate information I should use DeepVAR?

kashif commented 3 years ago

Yes @NielsRogge, try to get it working with DeepVAR, as it supports covariates such as categorical and dynamic real features.

NielsRogge commented 3 years ago

Thank you for the quick reply.

For people wondering (quick summary):

The model itself already creates a number of covariates, which are defined in its create_transformation function. For example, DeepVAREstimator already creates Fourier time features (if you don't provide time_features yourself when initializing the model), age features, and observed values, as seen here. Lagged features are also created (see the lags_seq variable) if you don't provide the lags yourself when initializing the model.
If you want to add further covariates (such as holiday information or other dynamic real features), add them to your dataset objects as shown in section 1.3 of GluonTS' extended tutorial.
If you're using MultivariateGrouper to group the various time series, you have to add the features again after grouping (as shown above).
When initializing DeepVAREstimator, set use_feat_dynamic_real/use_feat_static_cat/use_feat_static_real to True, depending on which features you use. If you use categorical features, also set cardinality, a list containing the number of unique values of each categorical feature, and embedding_dimension, a list with the embedding dimension to use for each categorical feature. For each categorical feature you add, an embedding layer is created, as seen here.
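As a sketch of the last step, the covariate-related configuration for the State/Industry example could look like this. The parameter names come from the summary above; the embedding-size rule (about half the cardinality, capped at 50) is a common heuristic that, as far as I know, GluonTS's DeepAR uses as its default, and it is an assumption here rather than something DeepVAR requires. `default_embedding_dimension` and `deepvar_covariate_kwargs` are hypothetical names, and the estimator call itself is deliberately left out.

```python
def default_embedding_dimension(cardinality, max_dim=50):
    """Heuristic embedding size per categorical feature: roughly half the
    number of categories, capped at max_dim (assumed heuristic)."""
    return [min(max_dim, (card + 1) // 2) for card in cardinality]

cardinality = [7, 20]  # State, Industry
embedding_dimension = default_embedding_dimension(cardinality)

# Keyword arguments named as in the summary above; they would be passed to
# DeepVAREstimator together with its other required arguments (not shown).
deepvar_covariate_kwargs = dict(
    use_feat_static_cat=True,
    use_feat_dynamic_real=False,
    use_feat_static_real=False,
    cardinality=cardinality,
    embedding_dimension=embedding_dimension,
)
```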

@kashif is there a reason use_feat_dynamic_cat is not supported in DeepVAR? Isn't holiday information a dynamic (i.e. time-dependent) categorical feature?

kashif commented 3 years ago

No, the holidays get converted to dynamic real features, and depending on the kernel you use, a particular date gets smoothed out so the model knows when that date is approaching and when it has passed. The dynamic cat feature is not used because I haven't found a need for it yet, but as soon as I do I will add it.
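The smoothing described here can be illustrated with a small sketch: a holiday becomes a real-valued feature that peaks on the holiday and decays with the distance in days. The exponential kernel and the max-over-holidays aggregation are assumptions for this example (GluonTS's holiday module offers kernels of this kind, but this is not its code), and the helper names are hypothetical. The sketch works on daily dates, so for the monthly retail data the feature would still need to be aggregated to months.

```python
import math
from datetime import date

def exponential_kernel(alpha):
    """Weight that decays exponentially with the distance (in days) to a holiday."""
    return lambda distance: math.exp(-alpha * abs(distance))

def holiday_feature(days, holidays, kernel):
    """One dynamic real feature: for each day, the largest kernel weight over
    all holidays, so the value rises before a holiday and falls after it."""
    return [max(kernel((day - h).days) for h in holidays) for day in days]

days = [date(2018, 12, d) for d in range(23, 28)]  # 23 to 27 December
feature = holiday_feature(days, [date(2018, 12, 25)], exponential_kernel(alpha=1.0))
```

Here the feature is 1.0 on 25 December and falls to exp(-1) and exp(-2) one and two days away on either side.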

zalandoresearch / pytorch-ts

How to incorporate covariate information using TransformerTempFlowEstimator #3
