Systematic Evaluation of CASH Search Strategies for Unsupervised Anomaly Detection

Repository for the corresponding full paper accepted at the LIDTA 2022 workshop of ECML/PKDD 2022.

Note: This repository initially served as the code repository for my thesis in the Master of Artificial Intelligence programme at KU Leuven. It was later modified and extended to accommodate the content of the LIDTA 2022 full-paper submission.

Description

The code experimentally evaluates how the structure of the validation set, i.e., its size and label bias, affects the performance of different CASH search strategies for anomaly detection.
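To make "size and label bias" concrete, here is a minimal sketch of drawing a labeled validation set of a fixed size under two labeling regimes. It assumes that "stratified" means preserving the dataset's anomaly ratio and that "balanced" means equal numbers of normal and anomalous points; these readings of the strategy names listed in the configuration are assumptions, and the function `sample_validation_indices` is hypothetical, not part of this repository.

```python
import numpy as np


def sample_validation_indices(y, size, strategy, seed=0):
    """Pick `size` indices from labels `y` (0 = normal, 1 = anomaly).

    Hypothetical helper: "stratified" keeps the original anomaly ratio,
    "balanced" draws half anomalies and half normal points (assumed meanings).
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    normal = np.flatnonzero(y == 0)
    anomalous = np.flatnonzero(y == 1)

    if strategy == "stratified":
        # Anomaly count proportional to the contamination of the full set;
        # keeping at least one anomaly is an assumption of this sketch.
        n_anom = max(1, int(round(size * len(anomalous) / len(y))))
    elif strategy == "balanced":
        n_anom = size // 2
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")

    n_norm = size - n_anom
    if n_anom > len(anomalous) or n_norm > len(normal):
        raise ValueError("not enough samples for the requested validation set")

    idx = np.concatenate([
        rng.choice(anomalous, n_anom, replace=False),
        rng.choice(normal, n_norm, replace=False),
    ])
    rng.shuffle(idx)  # avoid an anomalies-first ordering
    return idx
```

With a 10% contamination rate and a validation set of 100 points, the stratified draw would contain about 10 anomalies and the balanced draw 50, which is the kind of label bias the experiments vary.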

Contents

The data directory contains a sub-directory with the original datasets used in the experiments; the processed sub-directory is created by the src/notebooks/dataset_preprocessor.ipynb notebook.
The src directory contains the core implementation, consisting of Python scripts and notebooks. It also contains a copy of the auto-sklearn package, modified to accommodate unsupervised anomaly detection tasks.
The results directory contains the raw results of the paper for the different CASH search spaces.

How to run the code

Provide the experiment parameters in src/config.json:

datasets: list of datasets
iterations: list of iterations, i.e. different versions of the train/test splits (1, 2, ..., 10)
classifiers: list of anomaly detectors
search_space: version of search space (sp1, sp2 or default)
validation_set_split_strategies: list of strategies to split the validation set (stratified, balanced)
validation_set_sizes: list of sizes for the validation set (20, 50, 100, 200)
total_budget: total duration of a single search
per_run_budget: minimum duration of a single run

Then run auto_ad_main.py.
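As an illustration of the parameters above, the following sketch builds a configuration dictionary with the listed keys and writes it to a JSON file. The key names come from the list above; everything else is an assumption: the dataset and detector names are placeholders, the value types (lists of strings and integers) and the budget units are guesses, and `write_config` is a hypothetical helper rather than part of the repository. Check the src/config.json shipped with the code for the actual format.

```python
import json
from pathlib import Path

# Keys as listed in this README; all values are illustrative placeholders.
config = {
    "datasets": ["example_dataset"],            # hypothetical dataset name
    "iterations": [1, 2, 3],                    # train/test split versions
    "classifiers": ["example_detector"],        # hypothetical detector name
    "search_space": "sp1",                      # sp1, sp2 or default
    "validation_set_split_strategies": ["stratified", "balanced"],
    "validation_set_sizes": [20, 50, 100, 200],
    "total_budget": 300,                        # unit is an assumption
    "per_run_budget": 30,                       # unit is an assumption
}

REQUIRED_KEYS = {
    "datasets", "iterations", "classifiers", "search_space",
    "validation_set_split_strategies", "validation_set_sizes",
    "total_budget", "per_run_budget",
}


def write_config(path, cfg):
    """Check that every documented key is present, then write cfg as JSON."""
    missing = REQUIRED_KEYS - cfg.keys()
    if missing:
        raise ValueError(f"missing config keys: {sorted(missing)}")
    path = Path(path)
    path.write_text(json.dumps(cfg, indent=2))
    return path


if __name__ == "__main__":
    # Writes next to the script; copy to src/config.json once the values are real.
    write_config("config_example.json", config)
```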

External links

Name	Description	Link
Auto-Sklearn	Automated machine learning toolkit	:link:
PyOD	Python library for anomaly detection	:link:
Datasets	Anomaly detection datasets	:link:
