timeseriesAI / tsai

State-of-the-art Deep Learning library for Time Series and Sequences in PyTorch / fastai
https://timeseriesai.github.io/tsai/
Apache License 2.0

TSDatasets does not work with torch.Tensor #655

Closed xcvil closed 1 year ago

xcvil commented 1 year ago

Hi TSAI contributors,

I would like to create my own datasets with torch.Tensor data:

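import torch
from tsai.all import get_splits

# `dataloader` is an existing torch DataLoader yielding (data, label) batches
X_batches, y_batches = [], []
for data, label in dataloader:
    X_batches.append(data)
    y_batches.append(label)

# Concatenate once at the end instead of re-concatenating on every batch
X = torch.cat(X_batches, dim=0)
y = torch.cat(y_batches)

splits = get_splits(y, valid_size=.2, stratify=True, random_state=23, shuffle=True)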

After running check_data(X, y, splits), I got:

X      - shape: [1472 samples x 3 features x 480 timesteps]  type: Tensor  dtype:torch.float64  isnan: 0
y      - shape: torch.Size([1472])  type: Tensor  dtype:torch.int64  isnan: 0
splits - n_splits: 2 shape: [1178, 294]  overlap: False

Then I ran:

tfms = [None, [Categorize()]]
dsets = TSDatasets(X, y, tfms=tfms, splits=splits, inplace=True)

and got the following error:

KeyError                                  Traceback (most recent call last)
File ~/miniconda3/lib/python3.8/site-packages/fastai/data/transforms.py:261, in Categorize.encodes(self, o)
    260 try:
--> 261     return TensorCategory(self.vocab.o2i[o])
    262 except KeyError as e:

KeyError: tensor(0)

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
/Users/xiaochen/Project/icudelir_benchmarks/nbs/Baseline_Methods/Baseline_Data_Preparation.ipynb Cell 28 in <cell line: 2>()
      1 tfms = [None, [Categorize()]]
----> 2 dsets = TSDatasets(X, y, tfms=tfms, splits=splits, inplace=True)

File ~/miniconda3/lib/python3.8/site-packages/tsai/data/core.py:422, in TSDatasets.__init__(self, X, y, items, sel_vars, sel_steps, tfms, tls, n_inp, dl_type, inplace, **kwargs)
    420     self.tls = L(lt(item, t, **kwargs) for lt,item,t in zip(lts, items, self.tfms))
    421     if len(self.tls) > 0 and len(self.tls[0]) > 0:
--> 422         self.typs = [type(tl[0]) if isinstance(tl[0], torch.Tensor) else self.typs[i] for i,tl in enumerate(self.tls)]
    423     self.ptls = L([typ(stack(tl[:]))[...,self.sel_vars, self.sel_steps] if (i==0 and self.multi_index) else typ(stack(tl[:])) \
    424                     for i,(tl,typ) in enumerate(zip(self.tls,self.typs))]) if inplace else self.tls
    425 else:

File ~/miniconda3/lib/python3.8/site-packages/tsai/data/core.py:422, in <listcomp>(.0)
...
    261     return TensorCategory(self.vocab.o2i[o])
    262 except KeyError as e:
--> 263     raise KeyError(f"Label '{o}' was not included in the training dataset") from e

KeyError: "Label '0' was not included in the training dataset"

My y looks like tensor([0, 0, 0, ..., 1, 1, 1]).

If I convert both X and y to NumPy arrays before creating the TSDatasets:

X = X.numpy().astype(np.float64)
y = y.numpy().astype(np.int64)

everything works!

Thanks a lot!

oguiza commented 1 year ago

Hi @xcvil, tsai can work with tensors in some cases, but it's always safer to use NumPy arrays as inputs. So your proposed solution looks fine to me:

X = X.numpy().astype(np.float64)
y = y.numpy().astype(np.int64)
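A likely explanation for the `KeyError: tensor(0)` above (an assumption based on how PyTorch hashes tensors, not something confirmed in this thread): the traceback shows `Categorize` looking each label up in `self.vocab.o2i`, a label-to-index mapping. `torch.Tensor` hashes by object identity, and indexing a tensor returns a new 0-d tensor object every time, so a dict lookup with `y[i]` misses even when the value is `0`. NumPy scalars hash and compare by value, so the same lookup succeeds. The plain dicts below are only an illustration, not fastai's actual `CategoryMap`:

```python
import numpy as np
import torch

# Labels as a PyTorch tensor, like the `y` in this issue
y_t = torch.tensor([0, 0, 1, 1])

# Illustrative label-to-index dict built from tensor elements
o2i_t = {y_t[0]: 0, y_t[2]: 1}

# y_t[0] creates a *new* 0-d tensor; tensors hash by identity, so this misses
print(y_t[0] in o2i_t)  # False

# Same labels as a NumPy array: elements are np.int64 scalars hashed by value
y_np = y_t.numpy().astype(np.int64)
o2i_np = {y_np[0]: 0, y_np[2]: 1}
print(y_np[0] in o2i_np)  # True
print(o2i_np[y_np[3]])    # 1
```

This is consistent with the `.numpy()` conversion making the error go away.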

oguiza commented 1 year ago

Closing this issue due to lack of activity and progress. If necessary, please create a new one.