timeseriesAI / tsai

Time series Timeseries Deep Learning Machine Learning Python Pytorch fastai | State-of-the-art Deep Learning library for Time Series and Sequences in Pytorch / fastai
https://timeseriesai.github.io/tsai/
Apache License 2.0

Is using continuous and categorical features with ts_learner possible? If not, how can window_len in get_tabular_dls be set? #766

Open CMobley7 opened 1 year ago

CMobley7 commented 1 year ago

My apologies for the dumb question. I have a dataframe with a target variable plus continuous and categorical features. The categorical features are dynamic, meaning they change over time. I'd like to train a time series model, such as TSTPlus, on a sliding window of these features that doesn't include the target. I plan to test this with both a categorical and a continuous target, but the examples below assume a categorical target. Unfortunately, I'm struggling to work out how to do this.

splits = TimeSplitter(valid_size=0.15)(df_classification.index)
procs = [Categorify, FillMissing, Normalize]

to = get_tabular_ds(
    df_classification,
    procs=procs,
    cat_names=cat_names,
    cont_names=cont_names,
    y_names="triggers",
    splits=splits,
)

dls = to.dataloaders(bs=64, seq_len=20, seq_first=True)

class_weights = compute_class_weight("balanced", classes=[-1, 0, 1], y=to.train.y)
class_weights = torch.tensor(class_weights, dtype=torch.float32)

learn = ts_learner(
    dls,
    TSTPlus,
    metrics=[F1Score(average="macro")],
    loss_func=CrossEntropyLossFlat(weight=class_weights),
    lr=1e-4,
)

X, y = apply_sliding_window(df_classification, window_len=20, horizon=0, x_vars=slice(None, -1), y_vars=-1)

class_weights = compute_class_weight("balanced", classes=[-1, 0, 1], y=y)
class_weights = torch.tensor(class_weights, dtype=torch.float32)

splits = TimeSplitter(valid_size=0.15)(y)
tfms  = [None, [TSClassification()]]
batch_tfms = [TSStandardize(by_sample=False, by_var=True)]
dls = get_ts_dls(X, y, splits=splits, tfms=tfms, batch_tfms=batch_tfms, inplace=False)

learn = ts_learner(
    dls,
    TSTPlus,
    metrics=[F1Score(average="macro")],
    loss_func=CrossEntropyLossFlat(weight=class_weights),
    lr=1e-4,
)

Using apply_sliding_window with window_len=20 and get_ts_dls appears to produce a data loader with the right window size, but it assumes continuous variables, while get_tabular_ds allows categorical variables but doesn't have a window_len parameter. I tried converting it to a dataloader with dls = to.dataloaders(bs=64, seq_len=20, seq_first=True), but I couldn't tell whether this applied the desired window. When I then run

learn = ts_learner(
    dls,
    TSTPlus,
    metrics=[F1Score(average="macro")],
    loss_func=CrossEntropyLossFlat(weight=class_weights),
    lr=1e-4,
)

I get the following error:

AttributeError                            Traceback (most recent call last)
Cell In[43], line 1
----> 1 learn = ts_learner(
      2     dls,
      3     TSTPlus,
      4     metrics=[F1Score(average="macro")],
      5     loss_func=CrossEntropyLossFlat(weight=class_weights),
      6     lr=1e-4,
      7 )

File ~/.../site-packages/tsai/learner.py:549, in ts_learner(dls, arch, c_in, c_out, seq_len, d, splitter, loss_func, opt_func, lr, cbs, metrics, path, model_dir, wd, wd_bn_bias, train_bn, moms, train_metrics, valid_metrics, **kwargs)
    547     if arch is None: arch = InceptionTimePlus
    548     elif isinstance(arch, str): arch = get_arch(arch)
--> 549     model = build_ts_model(arch, dls=dls, c_in=c_in, c_out=c_out, seq_len=seq_len, d=d, **kwargs)
    550 if hasattr(model, "backbone") and hasattr(model, "head"):
    551     splitter = ts_splitter

File ~/.../site-packages/tsai/models/utils.py:147, in build_ts_model(arch, c_in, c_out, seq_len, d, dls, device, verbose, pretrained, weights_path, exclude_head, cut, init, arch_config, **kwargs)
    145 device = ifnone(device, default_device())
    146 if dls is not None:
--> 147     c_in = ifnone(c_in, dls.vars)
    148     c_out = ifnone(c_out, dls.c)
    149     seq_len = ifnone(seq_len, dls.len)
...
    172 res = [t for t in att.attrgot(k) if t is not None]
--> 173 if not res: raise AttributeError(k)
    174 return res[0] if len(res)==1 else L(res)

AttributeError: vars
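
A note on the failing line in the traceback, `c_in = ifnone(c_in, dls.vars)`: Python evaluates both arguments before the call, so `dls.vars` is looked up even when `c_in` is supplied. The sketch below reproduces that behaviour with stand-in objects. `FakeTabularDls` is hypothetical, and `ifnone` is re-implemented here only to mirror the fastai helper of the same name. It suggests, without having been checked against tsai itself, that passing `c_in` to `ts_learner` would not get past this line in the version shown.

```python
def ifnone(a, b):
    # Mirrors the fastai helper: return b when a is None, otherwise a.
    return b if a is None else a


class FakeTabularDls:
    # Hypothetical stand-in for a dataloaders object that has no `vars`.
    def __getattr__(self, name):
        raise AttributeError(name)


dls = FakeTabularDls()
given_c_in = 3

try:
    # given_c_in is not None, yet dls.vars is evaluated before ifnone runs.
    c_in = ifnone(given_c_in, dls.vars)
except AttributeError as err:
    error_name = str(err)

# A lazy lookup only touches the attribute when the fallback is needed.
c_in_lazy = given_c_in if given_c_in is not None else dls.vars
```

Even with a lazy lookup, the tabular dataloaders would presumably still yield batches in a different layout from what a time series model expects, so this explains the error rather than offering a fix.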

How can I accomplish what I described above, if it's possible? I've thought of some inelegant workarounds: (1) using a ts_learner but dropping the categorical features entirely; (2) using a tabular_learner without a window length; or (3) using a tabular_learner with a function that takes window_len and appends lagged features to the original dataframe, from feature_1(ts-1) to feature_X(ts-window_len). None of these is ideal, and I'd rather use a ts_learner. That said, I plan to try tabular learners in the future, so I'd also like to know how to set a window_len there.

https://github.com/timeseriesAI/tsai/issues/231 seems to indicate that this is possible now, but I didn't see any example code.
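
For illustration, here is a minimal sketch of windowing categorical and continuous columns separately, so that the categorical codes stay integers instead of being treated as continuous values. `window_mixed_features` is a hypothetical helper, not a tsai function, and it assumes the categorical features have already been integer-encoded. It uses NumPy's `sliding_window_view` and returns arrays shaped (samples, features, window_len).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def window_mixed_features(cat_codes, cont_values, targets, window_len):
    """Hypothetical helper, not part of tsai.

    cat_codes:   (n_steps, n_cat) integer-encoded categorical features
    cont_values: (n_steps, n_cont) continuous features
    targets:     (n_steps,) one label per time step
    """
    # The window axis is appended last, which gives
    # (n_windows, n_features, window_len) directly.
    x_cat = sliding_window_view(cat_codes, window_len, axis=0).copy()
    x_cont = sliding_window_view(cont_values, window_len, axis=0).astype(np.float32)
    # Label each window with the target at its last time step.
    y = targets[window_len - 1:]
    return x_cat, x_cont, y


n_steps, window_len = 100, 20
rng = np.random.default_rng(0)
cat_codes = rng.integers(0, 5, size=(n_steps, 2))    # 2 categorical features
cont_values = rng.normal(size=(n_steps, 3))          # 3 continuous features
targets = rng.integers(-1, 2, size=n_steps)          # labels in {-1, 0, 1}

x_cat, x_cont, y = window_mixed_features(cat_codes, cont_values, targets, window_len)
```

The integer windows could then feed embedding layers while the float windows are standardized. Whether any tsai model accepts inputs in this form is not confirmed anywhere in this thread.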

Awe42 commented 1 year ago

Hey @CMobley7 did you ever figure this out? I would also love to see an example of how the fix in https://github.com/timeseriesAI/tsai/issues/231 should be used.

Also where is it mentioned that get_ts_dls assumes continuous variables? Do you know if the same assumption is made when using TSDataLoaders.from_dsets?

Awe42 commented 1 year ago

For future reference, I found an example here and here.

CMobley7 commented 1 year ago

Sorry for the delay, @Awe42. I'd previously looked at the links you provided. While MultiInputNet with get_mixed_dls would let you use time series and tabular models together, I still don't see a way to use both continuous and categorical features with a sliding window.

The get_tabular_ds function takes a dataframe, not the X and y arrays generated by apply_sliding_window. You could write a function that applies the sliding window to a dataframe and rebuilds it with the additional lagged features, from feature_1(ts-1) to feature_X(ts-window_len). That would let you use the tabular models in tsai, but I still don't see a way to use the time series models with categorical data, as they appear to work only with continuous data. So you could use either just a tabular model, or a MultiInputNet that pairs a time series model on the continuous data with a tabular model on both, as mentioned before.

However, based on https://github.com/timeseriesAI/tsai/issues/231, it should be possible to use categorical variables with at least a few of the time series models, though I haven't dug deep enough into the source code to see how. Did you figure out a better way, @Awe42?

@oguiza, is there an example or gist of using a time series model with both categorical and continuous features somewhere? And is there already a function in tsai that applies a sliding window but outputs a dataframe instead of X and y arrays, along with lists of the new categorical and continuous column names, for use with tabular models?
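
The dataframe-based workaround described here can be sketched with pandas `shift`. `add_lag_features` is a hypothetical helper, not an existing tsai function. It follows the feature(ts-k) naming used in this thread and returns the expanded dataframe together with the new categorical and continuous column name lists.

```python
import pandas as pd


def add_lag_features(df, cat_names, cont_names, window_len):
    """Hypothetical helper, not part of tsai.

    Adds one lagged copy of every feature for each k in 1..window_len.
    """
    lagged = {}
    new_cat, new_cont = list(cat_names), list(cont_names)
    for k in range(1, window_len + 1):
        for name in list(cat_names) + list(cont_names):
            col = f"{name}(ts-{k})"
            lagged[col] = df[name].shift(k)
            (new_cat if name in cat_names else new_cont).append(col)
    out = pd.concat([df, pd.DataFrame(lagged, index=df.index)], axis=1)
    # The first window_len rows lack a full history, so drop them.
    return out.iloc[window_len:], new_cat, new_cont


df = pd.DataFrame({
    "regime": ["a", "b", "a", "c", "b", "a"],   # categorical feature
    "price": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],    # continuous feature
    "triggers": [0, 1, -1, 0, 1, 0],            # target
})

lagged_df, cat_cols, cont_cols = add_lag_features(df, ["regime"], ["price"], window_len=2)
```

The returned lists could then be passed as cat_names and cont_names to get_tabular_ds, as in the first snippet of this issue. Note that shifting an integer-coded column introduces NaN and turns it into floats, so string or pandas categorical columns are the safer input here.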

oguiza commented 1 year ago

Hi @CMobley7, @Awe42, I'm currently testing some new functionality I've recently added to tsai. It's in a module called tsai.models.multimodal. It will allow you to use:

You may want to test it as well with your own data. I plan to create a tutorial if I find the tests work out well.

cjsombric commented 4 months ago

@oguiza Have you created a tutorial around the new tsai.models.multimodal module? Or is there another update on this thread?