timeseriesAI / tsai

State-of-the-art Deep Learning library for Time Series and Sequences in Pytorch / fastai
https://timeseriesai.github.io/tsai/
Apache License 2.0

Questions about Standardization/Normalization preprocessing with tfms and batch_tfms #898

Open jeffabc1997 opened 2 months ago

jeffabc1997 commented 2 months ago

Hi, I'm new to timeseriesAI. Data should be normalized before fitting the model. For instance, suppose I have the data [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] and the normalization is fitted on [1, ..., 8] only, so that after min-max scaling 8 becomes 1 and 9 becomes 1.143. When I use the code below from the tutorial notebook, does batch_tfms=[TSNormalize()] normalize using all of the training data (1, ..., 8 in my case), or does it normalize each batch separately?

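from tsai.all import *

# Build sliding windows: all columns except the last are features, 'label' is the target
X, y = SlidingWindow(window_length, horizon=horizon, get_x=df.columns[:-1], get_y='label')(df)
# Split without shuffling so the original order is preserved
splits = get_splits(y, n_splits=1, test_size=0.2, shuffle=False, check_splits=True)
tfms = [None, [Categorize()]]
dsets = TSDatasets(X, y, tfms=tfms, splits=splits, inplace=True)
# batch_tfms are applied to each batch as it is drawn from the dataloader
dls = TSDataLoaders.from_dsets(dsets.train, dsets.valid, bs=[2, 4], batch_tfms=[TSNormalize()], num_workers=0)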
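To make the expected numbers concrete, here is a minimal sketch in plain Python (deliberately not tsai code) of min-max scaling fitted on the training portion only. The names `train`, `holdout`, and `scale` are illustrative assumptions, and the sketch says nothing about how TSNormalize behaves internally.

```python
# Min-max scaling with statistics taken from the training portion only.
# Plain Python illustration; this is NOT how tsai is necessarily implemented.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
train, holdout = data[:8], data[8:]  # [1..8] for fitting, [9, 10] held out

# The minimum and maximum come from the training values only.
lo, hi = min(train), max(train)


def scale(x, lo=lo, hi=hi):
    """Map lo -> 0 and hi -> 1; values above hi end up above 1."""
    return (x - lo) / (hi - lo)


scaled_train = [scale(x) for x in train]
scaled_holdout = [scale(x) for x in holdout]

print(scaled_train[-1])             # 8 -> 1.0
print(round(scaled_holdout[0], 3))  # 9 -> 1.143
```

Because the held-out values are scaled with the training minimum and maximum, they can land outside [0, 1], which is the behaviour described in the example above.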

I did some research, and someone said that batch_tfms lets the GPU do the work, but I don't think I really understand the concept.
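In fastai-style dataloaders, a batch transform is a function applied to each collated batch as it is drawn, which is why it can run on the GPU. What this question turns on is where the transform's statistics come from. The sketch below contrasts the two possibilities in plain Python with a batch size of 2. It is an illustration under assumed names (`batches`, `min_max`), not tsai's implementation, and it does not claim which of the two TSNormalize uses.

```python
# Two ways a batch-level min-max transform could obtain its statistics.
# Plain Python illustration; not tsai code.
train = [1, 2, 3, 4, 5, 6, 7, 8]
batch_size = 2
batches = [train[i:i + batch_size] for i in range(0, len(train), batch_size)]


def min_max(values, lo, hi):
    """Scale each value so that lo -> 0 and hi -> 1."""
    return [(v - lo) / (hi - lo) for v in values]


# Option 1: one fixed set of statistics, computed once from all training data
# and reused for every batch.
g_lo, g_hi = min(train), max(train)
global_scaled = [min_max(b, g_lo, g_hi) for b in batches]

# Option 2: fresh statistics computed from each batch on its own.
per_batch_scaled = [min_max(b, min(b), max(b)) for b in batches]

print(global_scaled[-1])     # batch [7, 8] keeps its position within 1..8
print(per_batch_scaled[-1])  # batch [7, 8] becomes [0.0, 1.0]
```

With per-batch statistics every batch here collapses to [0.0, 1.0], so the information that 7 and 8 are the largest training values is lost; with fixed statistics it is preserved.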

My other question is: how can I see the data after transformation? I used show_batch(), but I don't know which batch it shows. Edit: xb, yb = dls.train.one_batch() seems to return only one batch?

Thanks!