[FEATURE REQUEST] Can Merlion handle multi-series datasets?

salesforce / Merlion

Merlion: A Machine Learning Framework for Time Series Intelligence

BSD 3-Clause "New" or "Revised" License

[FEATURE REQUEST] Can Merlion handle multi-series datasets? #52

Open tszumowski opened 2 years ago

tszumowski commented 2 years ago

This is more of a question, but I didn't see a tag for it. Does Merlion support modeling multi-series datasets? I understand from the README that it supports multi-variate models, but I was curious whether it also supports multi-series data.

For example, consider this OJ Sales Dataset. It contains weekly sales of orange juice over 121 weeks. There are 3,991 stores and three brands of orange juice per store, so 11,973 models can be trained. I understand one can train independent models for each of the stores. However, I would like to know whether Merlion can take in data from multiple stores and learn correlations between them.

Another example of a multi-series dataset can be found in this article.

tszumowski commented 2 years ago

I saw in the Merlion paper on page 14 it mentions the Int_MF dataset which has:

21 time series
22 variables

But the dataset is marked internal.

That sounds like an example of multi-series I'm interested in. Was Merlion run on that? If so, how was it configured?

aadyotb commented 2 years ago

Hi, thanks for your question @tszumowski. Merlion is already capable of supporting multivariate time series datasets for forecasting and anomaly detection. For forecasting, I suggest you check out ts_datasets.forecast.SeattleTrail. For anomaly detection, I suggest you check out ts_datasets.anomaly.MSL.

tszumowski commented 2 years ago

@aadyotb thank you for the reply. However, what you referenced is multi-variate, not multi-series. Some also call it multi-instance or multi-segment. Multi-series means the scenario contains multiple time series. You want to forecast every one of them, but a single underlying model may apply to all of the series in question. See my referenced scenario, where one attempts to predict sales for many stores. In that case, there may be insufficient information to forecast sales for a single store in isolation, but by aggregating across all stores (time series), the model can learn to forecast from features applicable to all stores.
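To make the distinction concrete, here is a minimal sketch of the multi-series idea in plain Python. It is not Merlion code: the helper names (`lag_pairs`, `fit_ar1`), the toy store data, and the choice of a one-lag linear model are all assumptions made for illustration. Each store's series is too short to model well alone, so the lagged pairs from every store are pooled and one shared model is fit to all of them.

```python
# Illustrative only: a pooled ("global") one-lag linear model fit across several series.
# Every name and number below is hypothetical; none of this is Merlion's API.

def lag_pairs(series):
    """Return (previous value, next value) pairs for one series."""
    return list(zip(series[:-1], series[1:]))


def fit_ar1(pairs):
    """Ordinary least-squares fit of y = a * x + b over (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


# Three short toy series, one per store.
stores = {
    "store_a": [10.0, 12.0, 14.0, 16.0],
    "store_b": [3.0, 5.0, 7.0, 9.0],
    "store_c": [20.0, 22.0, 24.0, 26.0],
}

# Multi-series: pool the lagged pairs from every store and fit ONE model.
pooled = [pair for series in stores.values() for pair in lag_pairs(series)]
a, b = fit_ar1(pooled)

# The single shared model produces a one-step forecast for each store.
forecasts = {name: a * series[-1] + b for name, series in stores.items()}
```

A multi-variate model, by contrast, would treat the three stores as three columns of one time series observed at the same timestamps, rather than as separate training examples for a shared model.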

aadyotb commented 2 years ago

Ah, thanks for the clarification. In the paper, we actually train a separate model for each time series. In this case, the data loader is iterable, as in `for time_series, metadata in loader: ...`. We currently don't support multi-series data as you describe it, though we may look into it in the future. Re-opening this issue due to earlier misunderstanding.
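The per-series pattern described above might look roughly like the following sketch. The `loader` here is a hypothetical stand-in (a plain list of `(time_series, metadata)` tuples), and `MeanForecaster` is a toy placeholder rather than a Merlion model; with a real dataset loader and a real model, only the loop structure would carry over.

```python
# Illustrative only: one independent model per time series.
# `loader` and `MeanForecaster` are hypothetical stand-ins, not Merlion objects.

loader = [
    ([1.0, 2.0, 3.0], {"name": "series_0"}),
    ([10.0, 10.0, 10.0, 10.0], {"name": "series_1"}),
]


class MeanForecaster:
    """Toy model that always predicts the mean of its training data."""

    def train(self, values):
        self.mean = sum(values) / len(values)
        return self

    def forecast(self, steps):
        return [self.mean] * steps


# Train a separate model for each series yielded by the loader.
models = {}
for time_series, metadata in loader:
    models[metadata["name"]] = MeanForecaster().train(time_series)

# Each model only ever sees, and only forecasts, its own series.
predictions = {name: model.forecast(2) for name, model in models.items()}
```

Because no information is shared between the models, this setup cannot learn patterns that are common across series, which is the gap the multi-series request is about.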

aadyotb commented 2 years ago

@tszumowski, @isenilov has added an initial version of multi-series training for the DAGMM model in #65. Does this roughly match your expectation? As we consider adding a more general version of this feature to our roadmap, your feedback would be welcome. Thanks!