timeseriesAI / tsai

State-of-the-art Deep Learning library for Time Series and Sequences in Pytorch / fastai
https://timeseriesai.github.io/tsai/
Apache License 2.0

How to directly access the test dl in Learner #724

Closed whatactuallyis closed 1 year ago

whatactuallyis commented 1 year ago

Hello,

I can't access the test dataloader to evaluate the results after training. How can I access the test dl if I split the data into three parts (train, validation, test) at the start with:

splits = get_splits(o=y,
                    valid_size=.2,
                    test_size=.1,
                    stratify=False,
                    random_state=43,
                    shuffle=False)

When I create test_ds = learn.dls.valid.dataset.add_test(splits[2]) and then call learn.get_preds(dl=test_ds), I get the error "Expected 2D (unbatched) or 3D (batched) input to conv1d, but got input of size: []". However, test_ds itself prints as:

(#1742) [(TSTensor([15678], device=cpu, dtype=torch.int64),), (TSTensor([15679], device=cpu, dtype=torch.int64),), (TSTensor([15680], device=cpu, dtype=torch.int64),), (TSTensor([15681], device=cpu, dtype=torch.int64),), (TSTensor([15682], device=cpu, dtype=torch.int64),), (TSTensor([15683], device=cpu, dtype=torch.int64),), (TSTensor([15684], device=cpu, dtype=torch.int64),), (TSTensor([15685], device=cpu, dtype=torch.int64),), (TSTensor([15686], device=cpu, dtype=torch.int64),), (TSTensor([15687], device=cpu, dtype=torch.int64),)] ...]
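For readers following along, the three-way split above can be pictured as three contiguous index ranges. The sketch below is a hypothetical helper, not tsai's `get_splits`. It assumes that with `shuffle=False` the test block is taken from the end of the data, and that the dataset has 17,420 samples. Neither is stated in the thread: the sample count is inferred from the printed output (first test index 15678 plus 1742 test items).

```python
def sequential_three_way_split(n_samples, valid_size=0.2, test_size=0.1):
    """Hypothetical helper (not tsai's get_splits).

    Splits range(n_samples) into contiguous train/valid/test index lists,
    with the test block placed last and no shuffling.
    """
    n_test = round(n_samples * test_size)
    n_valid = round(n_samples * valid_size)
    n_train = n_samples - n_valid - n_test

    indices = list(range(n_samples))
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]
    return train, valid, test


# 17420 is an assumed dataset size inferred from the printed test_ds above.
train_idx, valid_idx, test_idx = sequential_three_way_split(17420)
print(len(train_idx), len(valid_idx), len(test_idx))
print(test_idx[0], test_idx[-1])
```

Under these assumptions the test block has 1742 items starting at index 15678, which matches the printed `test_ds`. That suggests `test_ds` holds the split indices themselves rather than the time series they point to.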

oguiza commented 1 year ago

Hi @whatactuallyis, thanks for sharing this. There was indeed a bug in the code that no one had reported before. I've just fixed it in GitHub. If you install tsai from GitHub, the following code snippet should work correctly:

import numpy as np
from tsai.all import *

X = np.random.rand(100, 2, 50)  # #samples x #features x #time steps
y = np.random.choice(["a", "b", "c", "d", "e"], len(X))
splits = get_splits(y, valid_size=0.1, test_size=0.2)  # train, valid and test indices
tfms = [None, TSClassification()]
batch_tfms = TSStandardize()
dls = get_ts_dls(X, y, splits=splits, tfms=tfms, batch_tfms=batch_tfms)
print(len(dls.train.dataset), len(dls.valid.dataset), len(dls[2].dataset))  # values should be 70, 10, 20
learn = ts_learner(dls, "InceptionTimePlus", metrics=accuracy, cbs=[ShowGraph()])
learn.fit_one_cycle(2, 1e-2)

I will create a new tsai release (0.3.6) within the next few days.

In fastai (and tsai), the first dataloader can be accessed using dls[0] or dls.train, and the second using dls[1] or dls.valid. Any further dataloaders (you can have as many as you need) don't have a specific name and are accessed by index: dls[2], and so on. So to get the predictions for the "test" split, you can use:

probas, targets = learn.get_preds(dl=dls[2])

It'd be good if you can confirm this works for you.

whatactuallyis commented 1 year ago

Thanks for the implementation. It works fine on my side. I assume I can compute any metric directly by passing the results to the relevant function, for example `metric(preds=probas, targs=targets)`.
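As a rough illustration of that last step, here is a minimal sketch of computing accuracy from predicted probabilities and integer targets. It uses plain Python lists so it runs without tsai. The function name `accuracy_from_probas` and the sample values are made up for this example and are not part of tsai or fastai.

```python
def accuracy_from_probas(probas, targets):
    """Fraction of samples whose highest-probability class equals the target.

    probas:  list of per-class probability lists, one per sample
    targets: list of integer class indices, one per sample
    """
    if len(probas) != len(targets):
        raise ValueError("probas and targets must have the same length")
    if not targets:
        raise ValueError("cannot compute accuracy on an empty set")

    correct = 0
    for row, target in zip(probas, targets):
        # Index of the largest probability in this row (argmax).
        predicted = max(range(len(row)), key=row.__getitem__)
        correct += int(predicted == target)
    return correct / len(targets)


# Made-up values standing in for the output of learn.get_preds(dl=dls[2]).
probas = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4], [0.6, 0.3, 0.1]]
targets = [0, 1, 2, 1]
print(accuracy_from_probas(probas, targets))  # 3 of 4 correct: 0.75
```

With the tensors returned by `learn.get_preds`, calling `.tolist()` on each would give this plain-list form. Note that fastai's own `accuracy` is defined with the parameters `(inp, targ)`, so passing the two tensors positionally is likely safer than using the keyword names `preds=` and `targs=`.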