(automatic) time series data preprocessing

Data quality can significantly impact forecasting results.

Currently we assume that input data is uniformly sampled along the timeline and that all missing values have been filled. However, we have found that many real-world time series are not as clean as assumed. There is still a gap in this part of the processing, e.g. resampling to a uniform interval, filling missing values, and dealing with outliers.
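
As a rough illustration of the uniform-sampling step, the sketch below buckets irregular readings onto a fixed grid. It assumes plain (timestamp-in-seconds, value) pairs and averages values that fall into the same interval; empty intervals become `None`, i.e. missing values that a later fill step must handle. `resample_uniform` is a hypothetical helper, not an analytics-zoo API.

```python
from typing import List, Optional, Tuple


def resample_uniform(
    points: List[Tuple[float, float]],
    start: float,
    step: float,
    n_steps: int,
) -> List[Optional[float]]:
    """Bucket irregular (timestamp, value) pairs onto a uniform grid.

    Bucket i covers [start + i*step, start + (i+1)*step). Values falling in
    the same bucket are averaged; empty buckets become None (missing).
    """
    sums = [0.0] * n_steps
    counts = [0] * n_steps
    for ts, value in points:
        idx = int((ts - start) // step)
        if 0 <= idx < n_steps:  # ignore points outside the requested range
            sums[idx] += value
            counts[idx] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# Irregular readings: two samples land in the first minute, none in the third.
readings = [(0.0, 1.0), (30.0, 3.0), (65.0, 4.0), (190.0, 7.0)]
print(resample_uniform(readings, start=0.0, step=60.0, n_steps=4))
# [2.0, 4.0, None, 7.0]
```

Mean aggregation is only one choice; sums, last value, or max may fit other metrics better. With pandas, `DataFrame.resample` on a datetime index covers the same ground.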

There are several directions for this part of the work:

Outlier removal. For anomaly detection, we sometimes have to detect outliers in the training data so that such noise is not learned as normal behavior and does not distort the forecast results. An autoencoder can be used for this part.
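
The issue only proposes "an autoencoder", so the following is one assumed, minimal instantiation: a *linear* autoencoder over sliding windows, scoring each time step by its reconstruction error. A linear autoencoder trained with squared error learns the same subspace as PCA, so here its weights are computed in closed form with an SVD instead of by training; a deeper, nonlinear autoencoder would replace that step. The window width, code size and the mean + 3σ threshold are illustrative choices, and `autoencoder_outliers` is a hypothetical helper.

```python
import numpy as np


def sliding_windows(series: np.ndarray, width: int) -> np.ndarray:
    """Stack all length-`width` windows of a 1-D series into a 2-D array."""
    n = len(series) - width + 1
    return np.stack([series[i : i + width] for i in range(n)])


def autoencoder_outliers(series, width=20, code_size=2, n_sigma=3.0):
    """Return indices of points whose reconstruction error is unusually high.

    Linear autoencoder: encode = project onto `code_size` directions,
    decode = map back. Its optimal weights span the top principal
    components, so they are taken from an SVD rather than trained.
    """
    series = np.asarray(series, dtype=float)
    windows = sliding_windows(series, width)
    mean = windows.mean(axis=0)
    centered = windows - mean
    # Rows of vt are principal directions; keep the top `code_size` as weights.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    weights = vt[:code_size]              # shape (code_size, width)
    codes = centered @ weights.T          # encoder
    recon = codes @ weights + mean        # decoder
    residual = np.abs(windows - recon)    # error per window and position

    # Average each time step's error over every window that covers it.
    score = np.zeros(len(series))
    cover = np.zeros(len(series))
    for i in range(len(windows)):
        score[i : i + width] += residual[i]
        cover[i : i + width] += 1
    score /= cover

    threshold = score.mean() + n_sigma * score.std()
    return np.flatnonzero(score > threshold)


# A clean periodic signal with one injected spike at index 100.
t = np.arange(240)
series = np.sin(2 * np.pi * t / 24)
series[100] += 5.0
print(autoencoder_outliers(series))
```

Flagged points could then be dropped or set to missing and handed to the imputation step. The mean + n_sigma·std rule is a simple heuristic; on data with no real outliers it may still flag the largest, possibly negligible, errors, so an absolute floor or a threshold tuned on validation data may be preferable.
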
Filling NA & imputation:
- traditional methods, e.g. average (mean) fill, forward/backward fill, interpolation, etc.
- imputation using GANs: https://papers.nips.cc/paper/7432-multivariate-time-series-imputation-with-generative-adversarial-networks
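
The traditional methods listed above can be sketched in a few lines each. This assumes a uniformly sampled series stored as a Python list with `None` marking missing values; the function names are hypothetical. Leading or trailing gaps that a method cannot reach are left as `None`.

```python
from typing import List, Optional

Series = List[Optional[float]]  # None marks a missing value


def forward_fill(values: Series) -> Series:
    """Carry the last observed value forward; leading gaps stay None."""
    out, last = [], None
    for v in values:
        last = v if v is not None else last
        out.append(last)
    return out


def backward_fill(values: Series) -> Series:
    """Carry the next observed value backward; trailing gaps stay None."""
    return forward_fill(values[::-1])[::-1]


def mean_fill(values: Series) -> Series:
    """Replace every gap with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def linear_interpolate(values: Series) -> Series:
    """Fill interior gaps on a straight line between neighbours; edges stay None."""
    out = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            out[i] = values[left] + (values[right] - values[left]) * (i - left) / span
    return out


gappy = [None, 1.0, None, None, 4.0, None]
print(forward_fill(gappy))        # [None, 1.0, 1.0, 1.0, 4.0, 4.0]
print(backward_fill(gappy))       # [1.0, 1.0, 4.0, 4.0, 4.0, None]
print(mean_fill(gappy))           # [2.5, 1.0, 2.5, 2.5, 4.0, 2.5]
print(linear_interpolate(gappy))  # [None, 1.0, 2.0, 3.0, 4.0, None]
```

pandas offers similar built-ins (`ffill`, `bfill`, `fillna`, `interpolate`), though their handling of gaps at the edges differs in details. Forward fill never uses future values, which matters when the filled series feeds a forecaster.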

intel-analytics / analytics-zoo #793