
(Py)TOD: GPU-accelerated Outlier Detection via Tensor Operations

Background: Outlier detection (OD) is a key data mining task that identifies abnormal objects among general samples, with numerous high-stakes applications including fraud detection and intrusion detection.

We propose TOD, a system for efficient and scalable outlier detection (OD) on distributed multi-GPU machines. A key idea behind TOD is decomposing OD applications into basic tensor algebra operations for GPU acceleration.
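To make the idea of decomposing a detector into tensor operations concrete, here is a minimal sketch of a kNN outlier score built only from dense array operations: a matrix multiply, broadcasting, and a per-row partial sort. NumPy stands in for GPU tensors here, and the function name ``knn_outlier_scores`` is hypothetical; this is an illustration under those assumptions, not PyTOD's implementation.

```python
import numpy as np

def knn_outlier_scores(X, k=5):
    """Distance to the k-th nearest neighbor, using only dense array operations."""
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b (one matrix multiply).
    sq_norms = (X ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    np.maximum(sq_dists, 0.0, out=sq_dists)  # clip tiny negatives from round-off
    np.fill_diagonal(sq_dists, np.inf)       # a point is not its own neighbor
    # k-th smallest entry in every row (index k - 1 after a partial sort).
    kth_sq = np.partition(sq_dists, k - 1, axis=1)[:, k - 1]
    return np.sqrt(kth_sq)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(200, 2)), [[8.0, 8.0]]])  # one planted outlier
scores = knn_outlier_scores(X, k=5)
print(int(scores.argmax()))  # index of the most outlying sample
```

The dense distance matrix needs memory quadratic in the number of samples, so larger inputs would have to be processed in batches.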

Zhao, Y., Chen, G.H. and Jia, Z., 2021. TOD: GPU-accelerated Outlier Detection via Tensor Operations. arXiv preprint arXiv:2110.14007.

One Reason to Use It: ^^^^^^^^^^^^^^^^^^^^^

On average, TOD is 11 times faster than PyOD on a diverse group of OD algorithms!

If you need another reason: it can handle much larger datasets, running OD on more than a million samples within an hour!

GPU-accelerated Outlier Detection with 5 Lines of Code\ :

.. code-block:: python

    # train the kNN detector
    from pytod.models.knn import KNN
    clf = KNN()  # default GPU device is used
    clf.fit(X_train)  # X_train: training samples, assumed to be loaded already

    # get outlier scores
    y_train_scores = clf.decision_scores_  # raw outlier scores on the train data
    y_test_scores = clf.decision_function(X_test)  # predict raw outlier scores on test

TOD is featured for:


Installation
^^^^^^^^^^^^

It is recommended to use pip for installation. Please make sure the latest version is installed, as PyTOD is updated frequently:

.. code-block:: bash

   pip install pytod            # normal install
   pip install --upgrade pytod  # or update if needed

Alternatively, you can clone the repository and install it from source:

.. code-block:: bash

   git clone https://github.com/yzhao062/pytod.git
   cd pytod
   pip install .

Required Dependencies\ :

Implemented Algorithms ^^^^^^^^^^^^^^^^^^^^^^

PyTOD toolkit consists of three major functional groups (to be cleaned up):

(i) Individual Detection Algorithms :

===================  ==================  ======================================================================================================  =====  ========================================
Type                 Abbr                Algorithm                                                                                               Year   Ref
===================  ==================  ======================================================================================================  =====  ========================================
Linear Model         PCA                 Principal Component Analysis (the sum of weighted projected distances to the eigenvector hyperplanes)   2003   [#Shyu2003A]
Proximity-Based      LOF                 Local Outlier Factor                                                                                    2000   [#Breunig2000LOF]
Proximity-Based      COF                 Connectivity-Based Outlier Factor                                                                       2002   [#Tang2002Enhancing]
Proximity-Based      HBOS                Histogram-based Outlier Score                                                                           2012   [#Goldstein2012Histogram]
Proximity-Based      kNN                 k Nearest Neighbors (use the distance to the kth nearest neighbor as the outlier score)                 2000   [#Ramaswamy2000Efficient]
Proximity-Based      AvgKNN              Average kNN (use the average distance to k nearest neighbors as the outlier score)                      2002   [#Angiulli2002Fast]
Proximity-Based      MedKNN              Median kNN (use the median distance to k nearest neighbors as the outlier score)                        2002   [#Angiulli2002Fast]
Probabilistic        ABOD                Angle-Based Outlier Detection                                                                           2008   [#Kriegel2008Angle]
Probabilistic        COPOD               COPOD: Copula-Based Outlier Detection                                                                   2020   [#Li2020COPOD]
Probabilistic        FastABOD            Fast Angle-Based Outlier Detection using approximation                                                  2008   [#Kriegel2008Angle]
===================  ==================  ======================================================================================================  =====  ========================================

Code is being released. Watch and star for the latest news!

A Motivating Example PyOD vs. PyTOD! ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

`kNN example <https://github.com/yzhao062/pytod/blob/main/examples/knn_example.py>`_ shows how fast and easy PyTOD is. Take the well-known kNN outlier detector as an example:

#. Initialize a kNN detector, fit the model, and make the prediction.

.. code-block:: python

   from pytod.models.knn import KNN   # kNN detector

   # train kNN detector
   clf_name = 'KNN'
   clf = KNN()
   clf.fit(X_train)  # X_train: training samples, assumed to be loaded already

.. code-block:: python

   # if GPU is not available, use CPU instead
   clf = KNN(device='cpu')
   clf.fit(X_train)

#. Get the prediction results.

.. code-block:: python

   # get the prediction label and outlier scores of the training data
   y_train_pred = clf.labels_  # binary labels (0: inliers, 1: outliers)
   y_train_scores = clf.decision_scores_  # raw outlier scores
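The link between the raw scores and the binary labels can be sketched as follows, assuming the PyOD convention of flagging roughly the top ``contamination`` fraction of training scores as outliers. Whether PyTOD computes its threshold exactly this way is an assumption, and ``scores_to_labels`` is a hypothetical helper written for illustration.

```python
import numpy as np

def scores_to_labels(scores, contamination=0.1):
    """Flag roughly the top `contamination` fraction of scores as outliers (1)."""
    # Threshold at the (1 - contamination) percentile of the scores.
    threshold = np.percentile(scores, 100.0 * (1.0 - contamination))
    labels = (scores > threshold).astype(int)  # 0: inlier, 1: outlier
    return labels, threshold

scores = np.array([0.1, 0.2, 0.15, 0.3, 0.25, 0.12, 0.18, 0.22, 0.27, 5.0])
labels, threshold = scores_to_labels(scores, contamination=0.1)
print(labels)
```

Only the last sample, whose score is far above the rest, ends up above the threshold in this toy input.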

#. On a simple laptop, compare its speed with PyOD on 30,000 samples with 20 features:

.. code-block:: text

  KNN-PyOD ROC:1.0, precision @ rank n:1.0
  Execution time 11.26 seconds

.. code-block:: text

  KNN-PyTOD-GPU ROC:1.0, precision @ rank n:1.0
  Execution time 2.82 seconds

.. code-block:: text

  KNN-PyTOD-CPU ROC:1.0, precision @ rank n:1.0
  Execution time 3.36 seconds

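With identical accuracy, PyTOD is clearly more efficient than PyOD in this example: about 4 times faster on the GPU (2.82 vs. 11.26 seconds) and about 3.4 times faster on the CPU (3.36 vs. 11.26 seconds).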
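For readers unfamiliar with the "precision @ rank n" figure in the output above, the sketch below shows one common definition: with n true outliers in the data, it is the fraction of true outliers among the n highest-scoring samples. The helper name is hypothetical, and the example script may handle ties or thresholds differently.

```python
import numpy as np

def precision_at_rank_n(y_true, scores):
    """Fraction of true outliers among the n highest-scoring samples,
    where n is the number of true outliers."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    n = int(y_true.sum())
    top_n = np.argsort(scores)[::-1][:n]  # indices of the n largest scores
    return float(y_true[top_n].sum()) / n

y_true = np.array([0, 0, 1, 0, 1, 0])              # two true outliers
scores = np.array([0.1, 0.4, 0.9, 0.2, 0.3, 0.05])
print(precision_at_rank_n(y_true, scores))          # one of the top two is a true outlier
```

A value of 1.0, as reported for all three runs, means every true outlier was ranked above every inlier.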

Paper Reproducibility ^^^^^^^^^^^^^^^^^^^^^

Datasets: OD benchmark datasets are available in the `datasets folder <https://github.com/yzhao062/pytod/tree/main/reproducibility/datasets/ODDS>`_.

Scripts for reproducibility are available in the `reproducibility folder <https://github.com/yzhao062/pytod/tree/main/reproducibility>`_.

Cleanup is on the way!

Programming Model Interface
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Complex OD algorithms can be abstracted into common tensor operators.

.. image:: https://raw.githubusercontent.com/yzhao062/pytod/master/figs/abstraction.png
   :target: https://raw.githubusercontent.com/yzhao062/pytod/master/figs/abstraction.png

For instance, ABOD and COPOD can be assembled from the basic tensor operators.
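As a rough illustration of how a COPOD-like score reduces to a handful of operators (sort, log, element-wise max, and sum), the sketch below scores each sample by its empirical tail probabilities. It is a simplification written for this README, not PyTOD's implementation: it omits COPOD's skewness correction, uses NumPy in place of GPU tensors, and the names ``ecdf`` and ``tail_scores`` are hypothetical.

```python
import numpy as np

def ecdf(X):
    """Column-wise empirical CDF value of every entry: rank / n_samples.
    Ties are broken arbitrarily by the sort order."""
    n = X.shape[0]
    ranks = X.argsort(axis=0).argsort(axis=0) + 1  # 1-based rank within each column
    return ranks / n

def tail_scores(X):
    """Sum over features of the larger negative log tail probability."""
    left = -np.log(ecdf(X))    # large when a value is unusually small
    right = -np.log(ecdf(-X))  # large when a value is unusually large
    return np.maximum(left, right).sum(axis=1)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(200, 2)), [[8.0, 8.0]]])  # one planted outlier
scores = tail_scores(X)
print(np.isclose(scores[-1], scores.max()))  # does the planted point reach the top score?
```

Every step is a whole-array operation with no per-sample Python loop, which is the property that makes this style of computation a good fit for GPUs.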

.. image:: https://raw.githubusercontent.com/yzhao062/pytod/master/figs/abstraction_example.png
   :target: https://raw.githubusercontent.com/yzhao062/pytod/master/figs/abstraction_example.png

End-to-end Performance Comparison with PyOD ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Overall, PyTOD is much faster than PyOD (11 times on average).

.. image:: https://raw.githubusercontent.com/yzhao062/pytod/master/figs/run_time.png
   :target: https://raw.githubusercontent.com/yzhao062/pytod/master/figs/run_time.png

Reference
^^^^^^^^^

