Please refer to the README.md files in each folder for details of the reproduction procedures and the evaluation scripts.
The datasets are provided in a separate repository:
https://github.com/hjhhsy120/DBPA_dataset
The reproduction procedures of single anomalies are listed as follows.
Script: small_shared_buffer.sh
Correctness: modify the knob_values of shared_buffers, with ini_knob_values set to 25~40% of available memory as the normal value.
Reproduction command: bash small_shared_buffer.sh >> small_shared_buffer.txt
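The 25~40% rule for the normal shared_buffers value can be turned into a quick sanity check. The following is a minimal sketch, not part of the repository: the function names are hypothetical, and the amount of available memory is assumed to be supplied by the caller in MB.

```python
from typing import Tuple


def normal_shared_buffers_range(available_memory_mb: float) -> Tuple[int, int]:
    """Return (low, high) bounds in MB for a 'normal' shared_buffers value.

    Assumption: 'normal' means 25% to 40% of the available memory,
    as stated in the correctness note above.
    """
    if available_memory_mb <= 0:
        raise ValueError("available_memory_mb must be positive")
    low = int(available_memory_mb * 0.25)
    high = int(available_memory_mb * 0.40)
    return low, high


def is_normal_shared_buffers(value_mb: float, available_memory_mb: float) -> bool:
    """True if value_mb lies inside the normal range for this machine."""
    low, high = normal_shared_buffers_range(available_memory_mb)
    return low <= value_mb <= high
```

A value well below the lower bound is what the small shared buffer anomaly is expected to use; the exact anomalous values remain whatever is configured in the script.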
Script: io_saturation.sh
Correctness: see the -i option in io_saturation_server.sh
Reproduction command: bash io_saturation.sh >> io_saturation.txt
Script: concurrent_inserts.sh
Correctness: check whether the latency of the injected queries is more than 1.5 times that of single-thread inserts.
Reproduction: bash concurrent_inserts.sh >> concurrent_inserts.txt
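The 1.5x latency criterion can be checked with a few lines of Python once the latencies have been collected. This is a sketch under assumptions: the function names are hypothetical, the latencies are plain lists of milliseconds, and the mean is used as the summary statistic (the source does not say which statistic to compare).

```python
from statistics import mean
from typing import Sequence


def latency_ratio(injected_ms: Sequence[float], baseline_ms: Sequence[float]) -> float:
    """Ratio of mean injected latency to mean baseline latency."""
    if not injected_ms or not baseline_ms:
        raise ValueError("both latency lists must be non-empty")
    baseline = mean(baseline_ms)
    if baseline <= 0:
        raise ValueError("baseline latency must be positive")
    return mean(injected_ms) / baseline


def is_anomalous(injected_ms: Sequence[float],
                 baseline_ms: Sequence[float],
                 threshold: float = 1.5) -> bool:
    """True if the injected latency exceeds threshold times the baseline."""
    return latency_ratio(injected_ms, baseline_ms) > threshold
```

The same comparison applies to any check phrased as "more than 1.5 times the normal latency"; only the source of the two latency lists changes.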
Script: concurrent_commits.sh
Correctness: check whether the result set is not empty: select * from pg_stat_activity where wait_event = 'WALWriteLock' and state <> 'idle';
Reproduction: bash concurrent_commits.sh >> concurrent_commits.txt
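The query above can also be run programmatically while the script is executing. The sketch below assumes a Python DB-API 2.0 connection to the target PostgreSQL instance (for example one opened with psycopg2); the function name is hypothetical and the connection setup is left to the caller.

```python
# The query text is taken from the correctness note above.
WAL_WAIT_QUERY = (
    "select * from pg_stat_activity "
    "where wait_event = 'WALWriteLock' and state <> 'idle';"
)


def has_wal_write_waits(conn) -> bool:
    """Return True if at least one non-idle session waits on WALWriteLock.

    conn is assumed to be any DB-API 2.0 connection object.
    """
    cur = conn.cursor()
    try:
        cur.execute(WAL_WAIT_QUERY)
        # A single row is enough to show the result set is not empty.
        return cur.fetchone() is not None
    finally:
        cur.close()
```

Wait event names can differ between PostgreSQL versions, so the literal in the query may need adjusting for the server in use.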
Script: heavy_workload.sh
Correctness: check whether the average latency of OLTPBench is >1.5 times longer than normal.
Reproduction: bash heavy_workload.sh >> heavy_workload.txt
Script: missing_indexes_and_vacuum.sh
Correctness:
Reproduction: bash missing_indexes_and_vacuum.sh >> missing_indexes_and_vacuum.txt
Script: too_many_indexes.sh
Reproduction: bash too_many_indexes.sh >> too_many_indexes.txt
Script: lock_waits.sh
Correctness: check whether there are lock wait events.
Reproduction: bash lock_waits.sh >> lock_waits.txt
For compound anomalies, see the gen_data function in reproduction/compound/generation/generation.py, where the parameter data is a list of samples. Each sample contains two vectors for the anomalies and one vector for the normal background. Each vector consists of the monitoring metrics of one timestamp. The parameter same_type indicates whether the anomalies are rooted in the same factor, i.e., the environment, the workload amount, or the queries.
dataset.py
norm.py
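The shape of the data parameter described for gen_data (a list of samples, each with two anomaly vectors and one normal background vector) can be illustrated as follows. This is only a sketch of one possible layout: the Sample type, its field names, and the helper functions are hypothetical and are not taken from generation.py.

```python
from typing import List, NamedTuple, Sequence


class Sample(NamedTuple):
    """One sample: two anomaly vectors plus one normal background vector.

    Each vector holds the monitoring metrics of a single timestamp.
    Field names are illustrative assumptions.
    """
    anomaly_a: List[float]
    anomaly_b: List[float]
    normal: List[float]


def make_sample(anomaly_a: Sequence[float],
                anomaly_b: Sequence[float],
                normal: Sequence[float]) -> Sample:
    """Build a sample after checking that all vectors have the same length."""
    if not (len(anomaly_a) == len(anomaly_b) == len(normal)):
        raise ValueError("all three vectors must cover the same metrics")
    return Sample(list(anomaly_a), list(anomaly_b), list(normal))


def validate_data(data: Sequence[Sample]) -> int:
    """Check that every sample uses the same metric count and return it."""
    if not data:
        raise ValueError("data must contain at least one sample")
    width = len(data[0].normal)
    for sample in data:
        if not (len(sample.anomaly_a) == len(sample.anomaly_b)
                == len(sample.normal) == width):
            raise ValueError("inconsistent vector lengths in data")
    return width
```

How the three vectors are combined into a compound anomaly is defined by gen_data itself and is not reproduced here.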
python detect.py --data_file data_use_[4/6/8]_norm --model_name_list IsolationForest,OneClassSVM,LocalOutlierFactor,SVDD
python diagnosis.py --train_file data_use_[4/6/8]_norm --model_name_list Linear,MLP,DecisionTree,RandomForest,XGBoost,LightGBM
python automonitor_weighted.py --test_size [0.2/0.4/0.6]
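The bracketed values in the commands above, such as [4/6/8] and [0.2/0.4/0.6], are read here as alternatives from which one is chosen per run. Under that assumption, the following sketch expands a command template into the concrete command lines; the expand helper is hypothetical and only builds strings, it does not execute anything.

```python
import itertools
import re
from typing import List

# Matches one bracketed group of alternatives, e.g. "[4/6/8]".
_CHOICE = re.compile(r"\[([^\[\]]+)\]")


def expand(template: str) -> List[str]:
    """Expand every [a/b/c] group in template into all concrete variants."""
    # With one capture group, split() alternates literal text (even indexes)
    # and the captured alternatives (odd indexes).
    parts = _CHOICE.split(template)
    literals = parts[0::2]
    choices = [part.split("/") for part in parts[1::2]]

    results = []
    for combo in itertools.product(*choices):
        pieces = [literals[0]]
        for chosen, literal in zip(combo, literals[1:]):
            pieces.append(chosen)
            pieces.append(literal)
        results.append("".join(pieces))
    return results


if __name__ == "__main__":
    for cmd in expand("python automonitor_weighted.py --test_size [0.2/0.4/0.6]"):
        print(cmd)
```

Each resulting line can then be run in turn, for example from a shell loop or with subprocess.run.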
Shiyue Huang, Ziwei Wang, Xinyi Zhang, Yaofeng Tu, Zhongliang Li, and Bin Cui:
"DBPA: A Benchmark for Transactional Database Performance Anomalies"
SIGMOD 2023