salesforce / Merlion

Merlion: A Machine Learning Framework for Time Series Intelligence
BSD 3-Clause "New" or "Revised" License

Adding Deep Learning Support #134

Closed. yihaocs closed this 1 year ago

yihaocs commented 1 year ago

A major shortcoming of Merlion's forecasting module is its lack of support for the deep learning models that have become popular in the research literature. This PR addresses that gap by adding basic primitives shared by most deep learning forecasters, and by implementing the recently proposed DeepAR, Informer, Autoformer, and ETSFormer models.

salesforce-cla[bot] commented 1 year ago

Thanks for the contribution! Before we can merge this, we need @yihaocs to sign the Salesforce.com Contributor License Agreement.

chenghaoliu89 commented 1 year ago

Thanks for the great contributions, Yihao! I've left a number of comments inline. I haven't looked in detail at the actual model/layer implementations; maybe @chenghaoliu89 would be a better person for that part.

One issue that I did want to discuss independently is that of target_seq_index. It seems quite clear to me that we do want to support multivariate outputs. So instead of overriding train_pre_process() in DeepForecasterBase, can you do the following (a rough sketch follows the list)?

  1. Add a property supports_multivariate_output to ForecasterBase, which is False by default.
  2. Override supports_multivariate_output to True for deep forecasters; I can go back and set it to True for other relevant models as well.
  3. Update the base class train_pre_process to accommodate multivariate outputs.
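
Roughly, something like this; the train_pre_process body below is simplified for illustration and is not the actual Merlion signature:

```python
class ForecasterBase:
    def __init__(self, target_seq_index=None):
        self.target_seq_index = target_seq_index

    @property
    def supports_multivariate_output(self) -> bool:
        # Default: the model forecasts a single target variable.
        return False

    def train_pre_process(self, train_data):
        # Simplified: a univariate-output model must know which column to forecast;
        # a multivariate-output model may leave target_seq_index as None.
        if not self.supports_multivariate_output and self.target_seq_index is None:
            raise ValueError("target_seq_index must be set for univariate-output models")
        return train_data


class DeepForecasterBase(ForecasterBase):
    @property
    def supports_multivariate_output(self) -> bool:
        # Deep forecasters can emit predictions for all input variables at once.
        return True
```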

Hi @aadyotb, @gorold will help us review this PR and further integrate DeepTIME into Merlion. Thanks, Gerald and Yihao!

aadyotb commented 1 year ago

Discussing some of the points @gorold brought up:

Model inputs are very specific to In/Autoformer. In particular, the following code snippet should follow a more general formulation if we want to include more models that do not require decoder inputs. One way to overcome this issue is to defer this portion of the code to the model: each model simply takes as input the lookback window and covariates, and can construct these decoder inputs in its own forward call. (We also do not really need to distinguish between past and future time-varying covariates; the model can handle this.)

I agree that the decoder inputs should be constructed by the individual models in their own forward() calls; the use of a zero vector for inference doesn't seem like universal behavior when future isn't provided. That said, I think that distinguishing between past/future is useful for making the code more modular and understandable at the level of the base class. As an aside, @yihaocs, note that start_token = past[:, -config.start_token_len :] won't work as desired when start_token_len is 0. For the correct behavior, you'd need to (1) check that past is long enough, and (2) slice from an explicit non-negative index, e.g. past[:, len(past) - config.start_token_len :].
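
To make that slicing pitfall concrete, here is a minimal sketch; it assumes past is a (batch, time, dim) tensor, so the explicit index is taken from past.shape[1] (the shapes below are illustrative):

```python
import torch

start_token_len = 0  # the problematic edge case
past = torch.randn(8, 96, 7)  # (batch, lookback window, variables)

# Negative-index slicing: when start_token_len == 0, "-0" is just 0, so this
# returns the entire lookback window instead of an empty start token.
bad = past[:, -start_token_len:]
print(bad.shape)  # torch.Size([8, 96, 7])

# Slicing from an explicit non-negative index handles the zero-length case,
# provided we first check that the window is at least start_token_len long.
seq_len = past.shape[1]
assert seq_len >= start_token_len, "lookback window shorter than start_token_len"
good = past[:, seq_len - start_token_len:]
print(good.shape)  # torch.Size([8, 0, 7])
```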

The current RollingWindowDataset does not support global models (i.e. learning from multiple time-series), which may be something we want to support. This would involve taking as input a List of TimeSeries, and each sample would also (optionally) yield an index/categorical indicating which time-series the window belongs to.

The plan is to add this functionality in a subsequent PR.
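
As a rough sketch of what that follow-up could look like (the class and argument names below are hypothetical, not the existing RollingWindowDataset API):

```python
from typing import List, Tuple
import numpy as np
from torch.utils.data import Dataset


class GlobalRollingWindowDataset(Dataset):
    """Yields (past window, future window, series index) samples from multiple series."""

    def __init__(self, series: List[np.ndarray], n_past: int, n_future: int):
        # series: list of (time, dim) arrays; each window remembers which series it came from
        self.samples: List[Tuple[np.ndarray, np.ndarray, int]] = []
        for idx, ts in enumerate(series):
            for t in range(len(ts) - n_past - n_future + 1):
                past = ts[t : t + n_past]
                future = ts[t + n_past : t + n_past + n_future]
                self.samples.append((past, future, idx))

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, i):
        return self.samples[i]
```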

yihaocs commented 1 year ago

Thanks for the great suggestion. I changed the API of the model forward call: each model's forward() now simply takes (past, past_timestamp, future, future_timestamp). Let me know if you have any further comments on this part, @gorold @aadyotb.
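
In other words, something along these lines; the Informer/Autoformer-style decoder-input construction inside forward() is only an illustration, and each model is free to build whatever decoder inputs it needs:

```python
import torch
import torch.nn as nn


class DeepForecaster(nn.Module):
    """Sketch of a model consuming the shared (past, past_timestamp, future, future_timestamp) API."""

    def __init__(self, start_token_len: int, horizon: int, dim: int):
        super().__init__()
        self.start_token_len = start_token_len
        self.horizon = horizon
        self.proj = nn.Linear(dim, dim)  # stand-in for a real encoder/decoder stack

    def forward(self, past, past_timestamp, future=None, future_timestamp=None):
        # Each model builds its own decoder inputs from the shared signature.
        # Informer/Autoformer-style: a start token taken from the end of the
        # lookback window, followed by zero padding over the forecast horizon.
        seq_len = past.shape[1]
        start_token = past[:, seq_len - self.start_token_len :]
        padding = torch.zeros(past.shape[0], self.horizon, past.shape[2], device=past.device)
        dec_in = torch.cat([start_token, padding], dim=1)
        # A real model would embed past/future timestamps and run its encoder/decoder
        # here; during training, future supplies the targets.
        return self.proj(dec_in)[:, -self.horizon :]


model = DeepForecaster(start_token_len=48, horizon=24, dim=7)
out = model(torch.randn(8, 96, 7), torch.randn(8, 96, 4))
print(out.shape)  # torch.Size([8, 24, 7])
```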

Thanks @aadyotb for letting me know about the bug. I have fixed it.
