lin-shuyu / VAE-LSTM-for-anomaly-detection

We propose a VAE-LSTM model as an unsupervised learning approach for anomaly detection in time series.

Can you share the code for preprocessing and explain meaning of each index of data? #3

Closed: NoPainNoCode closed this issue 4 years ago

NoPainNoCode commented 4 years ago
  1. Can you share the code used to preprocess the npz files in the dataset folder?
  2. And can you explain in detail the meaning of each index of data [below]?

[below]

    data = np.load('./machine_temp.npz', mmap_mode='r', allow_pickle=True)
    for i, k in enumerate(data.files):
        print("i:{}, k:{}".format(i, k))

    ==========result==========
    i:0, k:t
    i:1, k:t_unit
    i:2, k:readings
    i:3, k:idx_anomaly
    i:4, k:idx_split
    i:5, k:training
    i:6, k:test
    i:7, k:train_m
    i:8, k:train_std
    i:9, k:t_train
    i:10, k:t_test
    i:11, k:idx_anomaly_test

lin-shuyu commented 4 years ago

Hi NoPainNoCode,

Thanks for your question!

I've added a demo IPython notebook to the datasets/ folder. Please have a look there for the detailed pre-processing procedure. In summary, the only pre-processing step is standardisation: we subtract the mean from the time series and divide by the standard deviation of the original time series.
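For readers who want a quick picture of that standardisation step, here is a minimal sketch. It is not the notebook's code: the function name `standardise`, its optional `mean`/`std` arguments, and the synthetic readings are assumptions made for illustration only.

```python
import numpy as np

def standardise(series, mean=None, std=None):
    """Subtract the mean and divide by the standard deviation.

    `mean` and `std` are optional so that statistics computed elsewhere
    can be reused; by default they are taken from `series` itself.
    """
    series = np.asarray(series, dtype=float)
    mean = series.mean() if mean is None else mean
    std = series.std() if std is None else std
    return (series - mean) / std, mean, std

# Synthetic stand-in for raw readings (hypothetical values, not the real dataset).
rng = np.random.default_rng(0)
readings = 70.0 + 5.0 * rng.standard_normal(1000)

normalised, m, s = standardise(readings)
```

After this step the series has zero mean and unit standard deviation, and the original values can be recovered as `normalised * s + m`.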

As for the meaning of the specific features in the loaded data, I will list the explanation below:

  1. t - timestamp for each reading in the time series.
  2. t_unit - unit for the interval between two consecutive timestamps.
  3. readings - the original time series values; same as the time series loaded from the original .csv file.
  4. idx_anomaly - indices where the anomalies occurred; computed from the anomaly timestamps from the original .csv file.
  5. idx_split - indices between which the training set is created. We took a section of the original time series where no anomalies have occurred as the training set.
  6. training - normalised time series for the training set.
  7. test - normalised time series for the test set.
  8. t_train - indices for the training set readings.
  9. t_test - indices for the test set readings.
  10. idx_anomaly_test - indices for the anomalies in the test set.
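To make the relationship between these fields concrete, the sketch below derives them from a synthetic series. It is a guess at how the pieces could fit together, not the actual pre-processing code. Several things are assumptions: the test set is taken to be everything after the training section, `idx_anomaly_test` is taken to be `idx_anomaly` re-based to the start of the test set, and `train_m` / `train_std` (which appear in the file but are not described in the list above) are taken to be the mean and standard deviation of the training section. All numeric values are hypothetical.

```python
import numpy as np

# Synthetic stand-in for the raw series; real values would come from the .npz file.
rng = np.random.default_rng(0)
readings = rng.normal(loc=80.0, scale=10.0, size=1000)
t = np.arange(len(readings))
idx_anomaly = np.array([700, 850])  # hypothetical anomaly positions
idx_split = np.array([0, 600])      # hypothetical anomaly-free section

start, end = idx_split

# Assumption: normalisation statistics come from the training section.
train_m = readings[start:end].mean()
train_std = readings[start:end].std()
normalised = (readings - train_m) / train_std

training = normalised[start:end]
# Assumption: the test set is everything after the training section.
test = normalised[end:]
t_train = t[start:end]
t_test = t[end:]

# Assumption: anomaly indices are re-based so that 0 is the first test reading.
idx_anomaly_test = idx_anomaly[idx_anomaly >= end] - end
```

With this layout, `test[idx_anomaly_test]` picks out the same readings as `normalised[idx_anomaly]`, which is a quick way to check the indices against the real file.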

Hope this explanation is helpful for you!

Best wishes, Lin