thuml / Time-Series-Library

A Library for Advanced Deep Time Series Models.
MIT License

Practical time series problems in industry #236

Closed jexterliangsufe closed 1 year ago

jexterliangsufe commented 1 year ago

My dataset contains 100,000+ nodes, and each node has 365 time steps. The missing rate of this dataset is > 0.5, and the missing values occur at random. Values are missing mainly because my target is a "rate" calculated as A / B, and sometimes B = 0. My goal is to forecast several future time steps of this "rate". I have tried Temporal Fusion Transformer, DLinear, and GNN-based spatio-temporal models, but the results were mediocre. I would appreciate any suggestions on dataset preprocessing, missing value imputation, and model selection.
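To make the source of the missing values concrete, here is a minimal sketch of how a rate A / B turns into NaN wherever B = 0, and how the missing rate can be measured. The function name `compute_rate` and the toy arrays are assumptions for illustration only; they are not from the actual dataset.

```python
import numpy as np

def compute_rate(a, b):
    """Return a / b elementwise, leaving NaN wherever b == 0 (rate undefined)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    rate = np.full(a.shape, np.nan)
    # Positions where b == 0 are skipped and keep the NaN placed in `rate`.
    np.divide(a, b, out=rate, where=b != 0)
    return rate

# Toy data: 3 nodes x 5 time steps (the real data would be ~100,000 x 365).
a = np.array([[1, 2, 0, 4, 5],
              [0, 0, 3, 1, 2],
              [2, 2, 2, 2, 2]])
b = np.array([[2, 0, 0, 8, 5],
              [0, 1, 0, 2, 0],
              [4, 0, 4, 0, 4]])

rate = compute_rate(a, b)
observed = ~np.isnan(rate)            # True where the rate is defined
missing_rate = 1.0 - observed.mean()  # fraction of undefined entries
```

Keeping the boolean `observed` array alongside the values preserves the information about which points were real, whatever fill strategy is applied later.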

wuhaixu2016 commented 1 year ago

Hi, thanks for introducing us to this challenging problem. It involves both imputation and forecasting tasks.

  1. Imputation:
     (1) I think you can try TimesNet (https://github.com/thuml/Time-Series-Library/blob/main/models/TimesNet.py), which is the current SOTA model for time series imputation.
     (2) Considering that the missing rate of this dataset is relatively high, I think you can first fill the missing points with the mean value and then train TimesNet for imputation with mask training.
     (3) If you want to further enhance the imputation performance, you can apply TimesNet several times, each time training a new model to correct the previous round's imputation results.

  2. Forecasting: I think both TimesNet and iTransformer are promising.
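The data preparation behind the mean-fill and mask-training suggestion can be sketched as follows. This is only an illustration under stated assumptions: the helper names (`mean_fill`, `make_training_mask`, `masked_mse`), the per-node mean, and the 25% hide ratio are hypothetical choices, and no TimesNet code is called here. The idea is to hide some of the observed points, fill every hidden or missing point with a mean, and score the imputation only on the points that were hidden on purpose.

```python
import numpy as np

def mean_fill(x, observed):
    """Fill unobserved entries with each node's mean over its observed steps.

    Nodes with no observed step fall back to the global observed mean.
    """
    counts = observed.sum(axis=1, keepdims=True)
    sums = np.where(observed, x, 0.0).sum(axis=1, keepdims=True)
    global_mean = sums.sum() / max(counts.sum(), 1)
    node_mean = np.where(counts > 0, sums / np.maximum(counts, 1), global_mean)
    return np.where(observed, x, node_mean)

def make_training_mask(observed, hide_ratio, rng):
    """Randomly hide a fraction of the observed points to serve as targets."""
    hide = (rng.random(observed.shape) < hide_ratio) & observed
    visible = observed & ~hide
    return visible, hide

def masked_mse(pred, target, mask):
    """Mean squared error over the positions selected by `mask` only."""
    n = mask.sum()
    if n == 0:
        return 0.0
    return float(np.where(mask, (pred - target) ** 2, 0.0).sum() / n)

rng = np.random.default_rng(0)
x = np.array([[0.2, np.nan, 0.4, np.nan],
              [np.nan, np.nan, np.nan, np.nan],
              [1.0, 0.0, np.nan, 0.5]])
observed = ~np.isnan(x)

visible, hide = make_training_mask(observed, hide_ratio=0.25, rng=rng)
model_input = mean_fill(x, visible)  # what an imputation model would receive
# An imputation model would be trained to reconstruct x at the `hide` positions;
# here the mean-filled input itself is scored as a naive baseline.
baseline_loss = masked_mse(model_input, x, hide)
```

For the repeated-imputation idea in (3), one possible reading is that each round's imputed output replaces `model_input` for the next round, while the loss is still computed only on truly observed points.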

Good luck with your project.

jexterliangsufe commented 1 year ago

Thanks for your kind reply! I have two questions.

  1. What I really want to do is forecasting, not imputation. I have tried some imputation methods such as forward/backward fill and the mean value fill you suggested. Will model-based methods like TimesNet imputation perform better? I am worried that model-based methods will introduce additional uncertainty: if the imputation model is trained poorly, it will harm the downstream forecasting model. Are there any papers or methods that discuss data preprocessing, especially for data with a high missing rate?
  2. My dataset looks different from the common datasets in your papers and other popular TSF papers. So I tried GNN-based spatio-temporal models at the beginning, but the results were poor. I am worried that the dataset has too many nodes but too few time steps. Do you share this concern?
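For reference, the forward/backward fill baseline mentioned in question 1 can be sketched in plain NumPy as below. The function names and the toy array are assumptions for illustration; this is not the preprocessing code actually used in the thread. Note that a node with no observed value at all stays NaN under this scheme and needs a separate fallback such as a mean fill.

```python
import numpy as np

def forward_fill(x):
    """Propagate the last observed value forward along the time axis (axis 1)."""
    observed = ~np.isnan(x)
    # Index of the most recent observed step at each position (0 if none yet).
    idx = np.where(observed, np.arange(x.shape[1]), 0)
    np.maximum.accumulate(idx, axis=1, out=idx)
    return x[np.arange(x.shape[0])[:, None], idx]

def backward_fill(x):
    """Propagate the next observed value backward along the time axis."""
    return forward_fill(x[:, ::-1])[:, ::-1]

def ffill_then_bfill(x):
    """Forward fill, then backward fill the leading gaps that remain."""
    return backward_fill(forward_fill(x))

# Toy data: 2 nodes x 5 time steps; the second node has no observed value.
x = np.array([[np.nan, 1.0, np.nan, 3.0, np.nan],
              [np.nan, np.nan, np.nan, np.nan, np.nan]])

ff = forward_fill(x)       # leading NaN in the first row is left untouched
both = ffill_then_bfill(x)  # first row fully filled, second row still NaN
```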