Closed shane-huang closed 4 years ago
Thanks for the question. For the electricity and traffic datasets, we follow the processed version of the datasets from the TRMF paper: https://www.cs.utexas.edu/~rofuyu/papers/tr-mf-nips.pdf
You can look at Appendix A of the above paper to get an idea of the preprocessing used. In general, if you represent a multivariate time-series dataset as an n*t matrix, with rows as the different time series and columns as time points, you should be able to use the code. Please set the "freq" parameter to reflect whether the data is daily, hourly, etc.
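To illustrate the n*t layout described above, here is a minimal sketch using pandas and NumPy. It is not the TRMF or DeepGLO preprocessing: the helper name `to_series_matrix`, the toy values, the choice of summing within each bin, and the zero fill are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def to_series_matrix(df: pd.DataFrame, rule: str) -> np.ndarray:
    """Turn a timestamp-indexed frame (one column per series) into an n*t matrix.

    Hypothetical helper for illustration only; `rule` is a pandas resample rule.
    """
    # Put all series on a regular time grid, summing the readings in each bin.
    # Summing is an assumption; a mean may be more appropriate for other data.
    regular = df.resample(rule).sum()
    # Fill any remaining gaps with zeros (again an illustrative choice).
    regular = regular.fillna(0.0)
    # Transpose so that rows are series and columns are time points.
    return regular.to_numpy(dtype=float).T


# Toy data: 3 series sampled every 15 minutes for 4 hours (16 time points).
index = pd.date_range("2014-01-01", periods=16, freq="15min")
raw = pd.DataFrame(
    np.arange(48, dtype=float).reshape(16, 3),
    index=index,
    columns=["MT_001", "MT_002", "MT_003"],
)

# Aggregate to hourly resolution: the result has 3 rows and 4 columns.
matrix = to_series_matrix(raw, "60min")
print(matrix.shape)
```

A matrix built this way could then be written to disk with `np.save` and the "freq" parameter set to match the resolution of the columns (hourly in this toy case).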
Thanks so much for the guide. :) Again, it's really a great job.
Hey guys, really impressive work and thanks for sharing the code.
We're trying to use DeepGLO to process datasets other than the four used in the paper, and kind of got stuck at the preprocessing stage. It would be great if you could share any specification or scripts about how to properly preprocess the raw data from the public datasets used in the paper.
There seems to be a large difference between the original data and the processed data (e.g. electricity.npy). For example, I downloaded the raw electricity data from https://archive.ics.uci.edu/ml/datasets/ElectricityLoadDiagrams20112014 and applied resample and fillna as follows.
The last 10 data points of the 1st series, i.e. "MT_001", in the original dataset look like this:
On the other hand, the last 10 data points of the 1st series in "electricity.npy" look like this. The values are clearly very different from the original time series values.
Maybe I've missed something here... It would be really helpful if you could share how this electricity.npy is processed from the raw data as above.