zhhlee / InterFusion

KDD 2021: Multivariate Time Series Anomaly Detection and Interpretation using Hierarchical Inter-Metric and Temporal Embedding
MIT License

Need help reproducing the F1 scores on the SWaT dataset #4

yongmei5 closed this issue 2 years ago

yongmei5 commented 2 years ago

Hi @zhhlee, thank you so much for this codebase, and for your earlier help in getting the code working on the SWaT dataset. I am now trying to reproduce the SWaT results from your paper, but I am running into some problems. I hope you can help!

I have run training and prediction twice (with --mcmc_track=False for prediction), using the flags you provided for the SWaT dataset. The best F1 score was 0.844 on the first run and 0.864 on the second, whereas the paper reports a best F1 score of 0.928. I am also trying to run prediction with mcmc_tracker=True, but that takes about two days, compared with about 8 hours when mcmc_tracker=False.
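A "best F1 score" like the ones quoted here is usually obtained by sweeping a threshold over the model's anomaly scores and keeping the threshold with the highest F1. Many papers in this line also apply point adjustment: if any point inside a ground-truth anomaly segment is flagged, the whole segment counts as detected. The sketch below illustrates that procedure under two assumptions: higher scores mean more anomalous (negate log-likelihood-style scores first), and point adjustment is enabled. Whether stack_predict.py computes its F1 exactly this way is an assumption to check in the repository code.

```python
from typing import List, Sequence, Tuple


def point_adjust(pred: Sequence[int], label: Sequence[int]) -> List[int]:
    """Mark a whole ground-truth anomaly segment as detected if any of its
    points was flagged (the usual 'point-adjust' evaluation convention)."""
    adjusted = list(pred)
    n = len(label)
    i = 0
    while i < n:
        if label[i] == 1:
            j = i
            while j < n and label[j] == 1:
                j += 1
            # label[i:j] is one contiguous anomaly segment.
            if any(adjusted[i:j]):
                for k in range(i, j):
                    adjusted[k] = 1
            i = j
        else:
            i += 1
    return adjusted


def f1(pred: Sequence[int], label: Sequence[int]) -> float:
    """Plain point-wise F1 of binary predictions against binary labels."""
    tp = sum(1 for p, l in zip(pred, label) if p and l)
    fp = sum(1 for p, l in zip(pred, label) if p and not l)
    fn = sum(1 for p, l in zip(pred, label) if not p and l)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_f1(scores: Sequence[float], label: Sequence[int],
            adjust: bool = True) -> Tuple[float, float]:
    """Try every distinct score as a threshold (score >= threshold means
    'anomaly') and return (best F1, threshold that achieved it)."""
    best = (0.0, float("inf"))
    for thr in sorted(set(scores)):
        pred = [1 if s >= thr else 0 for s in scores]
        if adjust:
            pred = point_adjust(pred, label)
        score = f1(pred, label)
        if score > best[0]:
            best = (score, thr)
    return best
```

Trying every distinct score is quadratic in the number of points, which is fine for a toy check but slow on the full SWaT test set; sampling a few thousand thresholds keeps it practical.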

Could you please check whether the flags and hyperparameters in the GitHub code are set correctly? Many thanks for your help and awesome work!

I did the following for data processing:

  1. Downloaded the SWaT datasets.
  2. Used xlsx2csv to convert SWaT_Dataset_Attack_v0.xlsx to SWAT_Dataset_Attack_v0.csv, and likewise for SWaT_Dataset_Normal_v0.xlsx.
  3. Used explib/raw_data_converter to convert the resulting CSV files to pkl files.
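Before running explib/raw_data_converter, one quick way to confirm that the xlsx2csv export is clean is to check that every row has as many fields as the header and that the sensor columns parse as numbers. This is a hypothetical standard-library helper, not part of the repository; the timestamp and label column names passed in skip_columns are assumptions and must match the header of your CSV.

```python
import csv
from typing import Dict, Iterable


def inspect_rows(lines: Iterable[str],
                 skip_columns: Iterable[str] = ()) -> Dict[str, object]:
    """Count data rows, rows whose width differs from the header, and
    non-numeric cells per column (ignoring columns in skip_columns)."""
    skip = {c.strip() for c in skip_columns}
    reader = csv.reader(lines)
    header = [h.strip() for h in next(reader)]
    n_rows = 0
    bad_width = 0
    non_numeric: Dict[str, int] = {}
    for row in reader:
        n_rows += 1
        if len(row) != len(header):
            bad_width += 1
            continue
        for name, cell in zip(header, row):
            if name in skip:
                continue
            try:
                float(cell)
            except ValueError:
                non_numeric[name] = non_numeric.get(name, 0) + 1
    return {"columns": len(header), "rows": n_rows,
            "bad_width_rows": bad_width, "non_numeric": non_numeric}


def inspect_csv(path: str, skip_columns: Iterable[str] = ()) -> Dict[str, object]:
    """File-based wrapper around inspect_rows."""
    with open(path, newline="") as f:
        return inspect_rows(f, skip_columns)
```

For example, inspect_csv("SWAT_Dataset_Attack_v0.csv", skip_columns=["Timestamp", "Normal/Attack"]) (column names hypothetical). Many width mismatches or non-numeric cells would suggest the export picked up an extra row above the real header or shifted a column.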

I used the following command for training, as you suggested: python stack_train.py --dataset=SWaT --train.train_start=21600 --train.valid_portion=0.1 --model.window_length=30 '--model.output_shape=[15, 15, 30]' --model.z2_dim=8 --output-dir=/tmp/output/interfusion/SWAT/train_1

I used the following command for prediction, as you suggested: python stack_predict.py --load_model_dir=/tmp/output/interfusion/SWAT/train_1 --output-dir=/tmp/output/interfusion/SWAT/pred_1 --mcmc_track=False
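Since the two runs reported in this issue already land at different F1 scores (0.844 and 0.864), it can help to script several independent runs, each with its own output directory, and compare them. This hypothetical helper only rebuilds the training and prediction commands quoted in this thread as argument lists; the run_<i>/train and run_<i>/pred directory layout is an assumption, and with dry_run=True it returns the commands without executing anything.

```python
import subprocess
from typing import List

# Training flags copied from the command quoted in this thread.
TRAIN_FLAGS = [
    "--dataset=SWaT",
    "--train.train_start=21600",
    "--train.valid_portion=0.1",
    "--model.window_length=30",
    "--model.output_shape=[15, 15, 30]",
    "--model.z2_dim=8",
]


def train_cmd(run_dir: str) -> List[str]:
    """Training command writing to <run_dir>/train (hypothetical layout)."""
    return ["python", "stack_train.py", *TRAIN_FLAGS,
            f"--output-dir={run_dir}/train"]


def predict_cmd(run_dir: str) -> List[str]:
    """Prediction command loading the model trained in <run_dir>/train."""
    return ["python", "stack_predict.py",
            f"--load_model_dir={run_dir}/train",
            f"--output-dir={run_dir}/pred",
            "--mcmc_track=False"]


def run_repeats(base_dir: str, n_runs: int,
                dry_run: bool = True) -> List[List[str]]:
    """Build (and, unless dry_run, execute) train+predict for n_runs runs."""
    cmds: List[List[str]] = []
    for i in range(n_runs):
        run_dir = f"{base_dir}/run_{i}"
        for cmd in (train_cmd(run_dir), predict_cmd(run_dir)):
            cmds.append(cmd)
            if not dry_run:
                subprocess.run(cmd, check=True)  # stop on the first failure
    return cmds
```

Passing arguments as a list to subprocess.run bypasses the shell, so --model.output_shape=[15, 15, 30] needs no extra quoting; the quotes in the original command only stop the shell from splitting that argument on spaces.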

zhhlee commented 2 years ago

@yongmei5

  1. The code seems to be all right. For the SWaT dataset, some hyperparameters are set in ExpConfig (managed through the mltk package). You can check the config.json file generated in the output directory to confirm that they are set correctly.
  2. Model training may not be very stable on some datasets, mainly because of the training of the flexible RNVP posterior. As a result, a run in which the model is not trained well can show slightly degraded performance.
  3. For the SWaT dataset, we use mcmc_tracker=False and use_mcmc=True for testing. To reduce testing time, you can also set plot_recons_results=False and save_results=False.
  4. To help you reproduce the results, we have uploaded our trained SWaT model to https://1drv.ms/f/s!AsTNHlSUTQHXg3s-uVG9HQ23BO28. It achieves a best F1 score of 0.928 (with a fluctuation of around 0.003 due to the stochasticity of the MCMC imputation in the inference phase).
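Following the suggestion to inspect config.json, one way to compare it against the intended hyperparameters is to flatten it into dotted keys and diff it against the values from the training command quoted in this issue. This sketch assumes config.json is a JSON object whose (possibly nested) keys mirror the flag names, for example a "train" section containing "train_start"; the exact key names mltk writes are an assumption to verify by opening the file. Keys that are already dotted pass through unchanged.

```python
import json
from typing import Any, Dict, Tuple


def flatten(cfg: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """{"train": {"train_start": 21600}} -> {"train.train_start": 21600}."""
    flat: Dict[str, Any] = {}
    for key, value in cfg.items():
        name = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat


# Expected values copied from the training command quoted in this thread.
EXPECTED: Dict[str, Any] = {
    "train.train_start": 21600,
    "train.valid_portion": 0.1,
    "model.window_length": 30,
    "model.output_shape": [15, 15, 30],
    "model.z2_dim": 8,
}


def mismatches(cfg: Dict[str, Any],
               expected: Dict[str, Any] = EXPECTED) -> Dict[str, Tuple[Any, Any]]:
    """Return {key: (expected, found)} for keys that differ or are absent
    (an absent key shows up as found=None)."""
    flat = flatten(cfg)
    return {k: (v, flat.get(k)) for k, v in expected.items() if flat.get(k) != v}


def check_config(path: str) -> Dict[str, Tuple[Any, Any]]:
    """Load config.json from an output directory and diff it."""
    with open(path) as f:
        return mismatches(json.load(f))
```

An empty result means every checked value matches; any entry points at a hyperparameter that differs from, or is stored under a different name than, the command-line flag.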