Multivariate time series data

time-series-foundation-models / lag-llama

Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting

Apache License 2.0

Multivariate time series data #65

Open onchiptech opened 5 months ago

onchiptech commented 5 months ago

How to pre-train the lag-llama model with multivariate time series data?

For example:

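import numpy as np
import pandas as pd
from gluonts.dataset.common import ListDataset

num_time_steps = 300
data = [
    {
        # The frequency is passed to ListDataset below; recent pandas
        # versions no longer accept a freq argument on Timestamp.
        "start": pd.Timestamp("2020-01-01"),
        # Two fields: temperature and humidity
        "target": np.random.randn(2, num_time_steps),
    }
]

dataset = ListDataset(data, freq="D", one_dim_target=False)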

RikiSot commented 5 months ago

You can load a dataframe in long format with the column "item_id" as shown in the Colab Demo 1

import pandas as pd
from gluonts.dataset.pandas import PandasDataset

url = (
    "https://gist.githubusercontent.com/rsnirwan/a8b424085c9f44ef2598da74ce43e7a3"
    "/raw/b6fdef21fe1f654787fa0493846c546b7f9c4df2/ts_long.csv"
)
df = pd.read_csv(url, index_col=0, parse_dates=True)
df

                     target item_id
2021-01-01 00:00:00 -1.3378       A
2021-01-01 01:00:00 -1.6111       A
2021-01-01 02:00:00 -1.9259       A
2021-01-01 03:00:00 -1.9184       A
2021-01-01 04:00:00 -1.9168       A
...                     ...     ...
2021-01-10 19:00:00  1.2349       J
2021-01-10 20:00:00  1.1525       J
2021-01-10 21:00:00  1.1485       J
2021-01-10 22:00:00  1.3248       J
2021-01-10 23:00:00  1.1657       J

Check the GluonTS documentation for more info.

CoCoNuTeK commented 5 months ago

Could you explain, please: when you say multivariate, do you still have only one target variable that you are predicting, and just want to add other independent variables (features) that the target depends on, so the model can capture more information than the patterns of the target variable alone?

CoCoNuTeK commented 5 months ago

So the models only appear univariate, but if they are implemented with the GluonTS library, can you always add static or dynamic extra features?

onchiptech commented 5 months ago

Yes. My dataset is stock prices: there are four inputs (features), open, high, low, and close, and the model has to predict only future "low" values (a single target).

ashok-arjun commented 5 months ago

Unfortunately, the current model only accepts as input the same variable that is being predicted. It does not support external covariates (other variables) at the moment.

ashok-arjun commented 5 months ago

You can, however, load multivariate time series data and then do univariate forecasting separately for each variable, precisely as described by @RikiSot.
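
As an illustration of the per-variable approach, the sketch below reshapes wide rows (one column per variable) into the long layout of `ts_long.csv`, with one record per timestamp and `item_id`. It is written in plain Python to stay dependency-free; the column names, dates, and prices are hypothetical placeholders, not data from this thread.

```python
from datetime import date, timedelta

# Hypothetical wide-format rows: one row per day, one column per variable.
wide_rows = [
    {
        "date": date(2024, 1, 1) + timedelta(days=i),
        "open": 10.0 + i,
        "high": 11.0 + i,
        "low": 9.0 + i,
        "close": 10.5 + i,
    }
    for i in range(3)
]
VARIABLES = ["open", "high", "low", "close"]


def wide_to_long(rows, variables):
    """Return long-format records: one record per (variable, timestamp)."""
    long_rows = []
    for name in variables:  # each variable becomes its own item_id
        for row in rows:
            long_rows.append(
                {"timestamp": row["date"], "target": row[name], "item_id": name}
            )
    return long_rows


long_rows = wide_to_long(wide_rows, VARIABLES)
for record in long_rows[:4]:
    print(record)
```

Loaded into a DataFrame indexed by `timestamp`, records like these have the same shape as the example data, which GluonTS reads with `PandasDataset.from_long_dataframe(df, target="target", item_id="item_id")`. Each `item_id` is then treated as its own univariate series.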

CoCoNuTeK commented 5 months ago

I thought that, since it is based on the GluonTS library, it might also work with covariates. Thanks for the info; I was about to start preprocessing for covariate forecasting, so this saved me time. Is there any intention to build a newer model/paper that allows covariates and is trained on more time series? Transformers are appealing here because you can also use the missing-values layer, which makes preprocessing much easier since there is no need to impute values.

ashok-arjun commented 5 months ago

So you can pretrain with covariates if you have a large pretraining set, and then fine-tune with the same covariates. If you have a large enough pretraining set where the covariates are consistent (or at least finite), you can modify the code to add covariates and then pretrain the model.

But it is not possible to use the released pretrained model downstream with covariates, as we did not consider covariates in our formulation. Building a foundation model that allows the use of arbitrary new covariates for zero-shot inference is a difficult problem, as we would have to come up with a design that makes this possible.

onchiptech commented 5 months ago

Thank you @ashok-arjun. Forecasting each variable separately works, but it might miss valuable information about the relationships between features. Instead of treating each variable independently, I want to create a unified time series by adjusting the frequency. Here's how I want to proceed:

I have daily data with 4 variables (let's call them A, B, C, and D). I will hack this data by converting it to a 6-hour frequency, creating timestamps at 6-hour intervals (e.g., 00:00, 06:00, 12:00, 18:00). This produces a new univariate time series that contains the values of A, B, C, and D at 6-hour intervals. But will it work seamlessly with lags?
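
A minimal sketch of this interleaving idea, assuming each day's four values are placed in a fixed slot order (A at 00:00, B at 06:00, C at 12:00, D at 18:00). The helper names `interleave` and `extract` and the sample values are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical daily observations of four variables (A, B, C, D).
daily = [
    {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0},
    {"A": 1.1, "B": 2.1, "C": 3.1, "D": 4.1},
    {"A": 1.2, "B": 2.2, "C": 3.2, "D": 4.2},
]
ORDER = ["A", "B", "C", "D"]  # slot order within each day


def interleave(daily_rows, order, start):
    """Flatten daily multivariate rows into one evenly spaced series."""
    step = timedelta(hours=24) / len(order)  # 6 hours for four variables
    timestamps, values = [], []
    for day_index, row in enumerate(daily_rows):
        day_start = start + timedelta(days=day_index)
        for slot, name in enumerate(order):
            timestamps.append(day_start + slot * step)
            values.append(row[name])
    return timestamps, values


def extract(values, order, name):
    """Recover one variable from an interleaved series that starts at slot 0."""
    return values[order.index(name)::len(order)]


timestamps, values = interleave(daily, ORDER, datetime(2024, 1, 1))
print(timestamps[:4])
print(extract(values, ORDER, "C"))
```

Two consequences follow directly from the layout: forecasting `n` days ahead needs a prediction length of `4 * n` steps, and `extract` only picks the right slots if the sequence it receives starts at a day boundary.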

CoCoNuTeK commented 5 months ago

Does that mean the benchmarks against the other models were done on univariate predictions, which basically tests how effectively the models can capture the cyclic dependencies of the target variable without any further covariate information?

CoCoNuTeK commented 5 months ago

You can load multivariate time series only in the sense that each series is still a single variable, so there are no covariates. You can put multiple time series into one DataFrame for training, but that is different from being able to predict a target using covariates, as @ashok-arjun explained.

ashok-arjun commented 5 months ago

Yes, forecasting each variable separately would miss out on inter-variable information. But if you ultimately only care about forecasts, you might get great forecasts from univariate models alone (which is what a lot of papers show). I'd recommend trying it out anyway.

Yes, interleaving the variables into a single series is one idea if you really want to consider inter-variable information. And yes, that would work seamlessly with lags, as the "lags" we consider don't rely on the frequency itself; lags of many possible frequencies are considered.
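
To make the lag behaviour in an interleaved series concrete, the sketch below labels each position by (variable, day) and looks up what a few lags point at. The lag indices are illustrative only and are not the set Lag-Llama actually uses; the helper `lagged` is hypothetical.

```python
# Positions of an interleaved series, labelled (variable, day); slot order A, B, C, D.
labels = [(name, day) for day in range(3) for name in "ABCD"]


def lagged(series, index, lags):
    """Return {lag: series[index - lag]} for every lag that stays in range."""
    return {lag: series[index - lag] for lag in lags if index - lag >= 0}


# Illustrative lag indices only; not the set Lag-Llama actually uses.
lags = [1, 2, 3, 4, 8]
target_index = labels.index(("C", 2))  # variable C on day 2
for lag, source in lagged(labels, target_index, lags).items():
    print(f"lag {lag}: {source}")
```

With four interleaved variables, lags that are multiples of 4 land on the same variable on earlier days, while the other lags land on different variables. That arithmetic is how a lag-based model could, in principle, see cross-variable history in this layout.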

Still, I'd recommend first forecasting the variables independently and benchmarking against that, so you can check whether the inter-variable information improves forecast accuracy substantially.
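
One way such a benchmark could be scored is sketched below, using mean absolute error on a held-out window. The metric is an assumption of this sketch rather than something the thread prescribes, and every number is a synthetic placeholder, not a result from any model. Since Lag-Llama produces probabilistic forecasts, a point summary such as the median of the samples would be needed before applying it.

```python
def mean_absolute_error(actual, forecast):
    """Average absolute difference between two equal-length sequences."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


# Synthetic placeholder numbers, not results from any model.
held_out_low = [9.0, 9.5, 9.2, 9.8]
forecast_independent = [9.1, 9.4, 9.4, 9.6]  # "low" forecast as its own series
forecast_interleaved = [9.0, 9.7, 9.1, 9.9]  # "low" slots from an interleaved forecast

scores = {
    "independent": mean_absolute_error(held_out_low, forecast_independent),
    "interleaved": mean_absolute_error(held_out_low, forecast_interleaved),
}
best = min(scores, key=scores.get)
print(scores, best)
```

Scoring both approaches on the same held-out window keeps the comparison fair; a probabilistic metric could be swapped in for `mean_absolute_error` without changing the structure.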

ashok-arjun commented 5 months ago

Yes, that is correct. We limited the scope of this paper to univariate forecasting. But I agree that there is much more that can be done, which I expect we'll see in future work :)