Question regarding the TSStandardize(by_sample=True, by_var=True)

timeseriesAI / tsai

Time series Timeseries Deep Learning Machine Learning Python Pytorch fastai | State-of-the-art Deep Learning library for Time Series and Sequences in Pytorch / fastai

https://timeseriesai.github.io/tsai/

Apache License 2.0


Question regarding the TSStandardize(by_sample=True, by_var=True) #603

Closed zmce2018 closed 1 year ago

zmce2018 commented 2 years ago

Hi Team,

I have found that the standardized input data coming out of the data loader is very different from what I get by standardizing the data myself with (x-np.mean(x))/(np.std(x)). Could you please advise why there is a difference?

Also, is there any way to retrieve the data from the dataloader in the same order in which I loaded it?

Many Many Thanks

oguiza commented 1 year ago

Hi @zmce2018, the difference between the options you mention is that if you don't pass a mean and standard deviation, TSStandardize calculates them from the first random batch. If you want to calculate them yourself (for example, using all the training data), you need to pass both a mean and a std. The two sets of values may differ depending on the variability of your dataset and the batch size.

As for ordering: dls.train always returns samples in random order, while dls.valid returns samples in the original order.
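To make the source of the mismatch concrete, here is a minimal NumPy-only sketch. It does not call tsai. The array shape (samples, variables, time steps), the synthetic data, the batch size of 8, and the `eps` term are all assumptions made for illustration. The reading of by_sample=True, by_var=True as "reduce over the time axis separately for each sample and variable" is also an assumption taken from the parameter names, not something confirmed by the library code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 64 samples, 3 variables, 50 time steps.
# Each variable is given a different scale and offset on purpose.
loc = np.array([0.0, 5.0, -2.0]).reshape(1, 3, 1)
scale = np.array([1.0, 3.0, 0.5]).reshape(1, 3, 1)
X = rng.normal(loc=loc, scale=scale, size=(64, 3, 50))

eps = 1e-8  # assumed small constant to avoid division by zero

# 1) Global standardization, as in (x - np.mean(x)) / np.std(x):
#    one mean and one std for the whole array.
X_global = (X - np.mean(X)) / (np.std(X) + eps)

# 2) Per-variable statistics estimated from a single random batch,
#    compared with the same statistics estimated from all the data.
batch = X[rng.choice(len(X), size=8, replace=False)]
batch_mean = batch.mean(axis=(0, 2), keepdims=True)
batch_std = batch.std(axis=(0, 2), keepdims=True)
full_mean = X.mean(axis=(0, 2), keepdims=True)
full_std = X.std(axis=(0, 2), keepdims=True)
X_batch_stats = (X - batch_mean) / (batch_std + eps)
X_full_stats = (X - full_mean) / (full_std + eps)

# 3) Per-sample, per-variable statistics (assumed meaning of
#    by_sample=True, by_var=True): reduce over the time axis only.
ps_mean = X.mean(axis=2, keepdims=True)
ps_std = X.std(axis=2, keepdims=True)
X_per_sample_var = (X - ps_mean) / (ps_std + eps)

print("global vs per-sample/per-var, max abs diff:",
      np.abs(X_global - X_per_sample_var).max())
print("batch stats vs full stats, max abs diff:",
      np.abs(X_batch_stats - X_full_stats).max())
```

All three results are "standardized", but they use different means and standard deviations, so comparing any two of them element by element will show differences. To reproduce the loader's output by hand, the manual calculation has to reduce over the same axes and use the same data as the transform does.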

oguiza commented 1 year ago

Closed due to lack of response.