Imbalanced data stream classification with the changing prior probabilities

Implementation

Setup - experiment with real data

Comparison of the HDWE method with selected state-of-the-art methods, using SVC and HDDT as base classifiers.

Data

Number of data streams: 3

Poker data set
Cover type data set
Insects data set

Methods

HDWE
SEA
AWE
Learn++.CDS
Learn++.NIE
OUSE
REA

Setup - experiment 3a

Comparison of the HDWE method with selected state-of-the-art methods, using SVC as the base classifier.

Data

Total number of data streams: 84

Generated with stream-learn
- Number of classes: 2
- Number of concept drifts: 5
- Types of concept drifts: sudden, incremental
- Stationary imbalance ratio: 1%, 3%, 5%, 10%, 15%, 20%, 25%
- Dynamic imbalance ratio: 1%, 3%, 5%, 10%, 15%, 20%, 25%
- Number of samples: 100000 (200 chunks of 500 samples each)
- Number of features: 20 (where informative: 15 and redundant: 5)
- Random states: 1111, 1234, 1567
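The 84 streams follow from the grid above: 7 imbalance ratios, each used either as a stationary or a dynamic imbalance, times 2 drift types, times 3 random states (7 x 2 x 2 x 3 = 84). The minimal sketch below enumerates this grid. The dictionary keys loosely mirror the arguments of stream-learn's `StreamGenerator`, but the helper `stream_configs` is hypothetical and the exact generator calls used in this repository are not reproduced here.

```python
from itertools import product

# Values copied from the data setup; 0.01 means a 1% minority class.
IMBALANCE_RATIOS = [0.01, 0.03, 0.05, 0.10, 0.15, 0.20, 0.25]
IMBALANCE_TYPES = ["stationary", "dynamic"]
DRIFT_TYPES = ["sudden", "incremental"]
RANDOM_STATES = [1111, 1234, 1567]
N_CHUNKS, CHUNK_SIZE = 200, 500


def stream_configs():
    """Yield one settings dict per generated stream (hypothetical helper)."""
    for ratio, imbalance, drift, seed in product(
        IMBALANCE_RATIOS, IMBALANCE_TYPES, DRIFT_TYPES, RANDOM_STATES
    ):
        yield {
            "minority_ratio": ratio,
            "imbalance": imbalance,
            "drift": drift,
            "random_state": seed,
            "n_chunks": N_CHUNKS,
            "chunk_size": CHUNK_SIZE,
            "n_drifts": 5,
            "n_features": 20,
            "n_informative": 15,
            "n_redundant": 5,
        }


configs = list(stream_configs())
print(len(configs))  # 84
print(N_CHUNKS * CHUNK_SIZE)  # 100000 samples per stream
```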

Methods

HDWE
SEA
AWE
Learn++.CDS
Learn++.NIE
OUSE
REA

Setup - experiment 3b

Comparison of the HDWE method with selected state-of-the-art methods, using HDDT as the base classifier.

Data

Total number of data streams: 84

Generated with stream-learn
- Number of classes: 2
- Number of concept drifts: 5
- Types of concept drifts: sudden, incremental
- Stationary imbalance ratio: 1%, 3%, 5%, 10%, 15%, 20%, 25%
- Dynamic imbalance ratio: 1%, 3%, 5%, 10%, 15%, 20%, 25%
- Number of samples: 100000 (200 chunks of 500 samples each)
- Number of features: 20 (where informative: 15 and redundant: 5)
- Random states: 1111, 1234, 1567

Methods

HDWE
SEA
AWE
Learn++.CDS
Learn++.NIE
OUSE
REA

Setup - experiment 2

Different base classifiers in the HDWE (Hellinger Distance Weighted Ensemble) method, used to check how the type of base classifier influences the ensemble.

Data

Total number of data streams: 84

Generated with stream-learn
- Number of classes: 2
- Number of concept drifts: 5
- Types of concept drifts: sudden, incremental
- Stationary imbalance ratio: 1%, 3%, 5%, 10%, 15%, 20%, 25%
- Dynamic imbalance ratio: 1%, 3%, 5%, 10%, 15%, 20%, 25%
- Number of samples: 100000 (200 chunks of 500 samples each)
- Number of features: 20 (where informative: 15 and redundant: 5)
- Random states: 1111, 1234, 1567

Methods

HDWE(GNB)
HDWE(MLP)
HDWE(CART)
HDWE(HDDT)
HDWE(KNN)
HDWE(SVC)

Evaluation

Test-Then-Train
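Test-Then-Train evaluates each incoming chunk with the current model before that chunk is used for training, so every score is computed on data the model has not yet seen. stream-learn provides this protocol as `strlearn.evaluators.TestThenTrain`; the stdlib sketch below only illustrates the loop. The toy `MajorityClassifier` (a hypothetical stand-in for HDWE) and plain accuracy (a stand-in for the metrics of this experiment) are assumptions, and in this sketch the first chunk is used for training only.

```python
from collections import Counter


class MajorityClassifier:
    """Hypothetical toy model: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def partial_fit(self, X, y):
        self.counts.update(y)  # incremental update with the new chunk
        return self

    def predict(self, X):
        majority = self.counts.most_common(1)[0][0]
        return [majority] * len(X)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def run_test_then_train(chunks, model, metric):
    """Score each chunk with the current model, then train on that chunk."""
    scores = []
    for i, (X, y) in enumerate(chunks):
        if i > 0:  # nothing has been trained before the first chunk
            scores.append(metric(y, model.predict(X)))
        model.partial_fit(X, y)
    return scores


chunks = [
    ([[0.0]] * 4, [0, 0, 0, 1]),
    ([[0.0]] * 4, [0, 0, 1, 1]),
    ([[0.0]] * 4, [1, 1, 1, 0]),
]
print(run_test_then_train(chunks, MajorityClassifier(), accuracy))  # [0.5, 0.25]
```

The score list has one entry fewer than there are chunks, because the first chunk has no trained model to test.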

Metrics

Specificity
Recall
Precision
F1-score
Balanced accuracy score
Geometric mean (G-mean)
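All six metrics can be computed from the binary confusion matrix of a single chunk. The sketch below assumes the minority class is the positive label `1`, uses the common definition G-mean = sqrt(recall x specificity), and returns 0.0 when a denominator is empty. stream-learn ships its own implementations in `strlearn.metrics`; which exact variants the experiments used is not stated here, and the function name `chunk_metrics` is hypothetical.

```python
from math import sqrt


def _div(a, b):
    # A chunk may lack one class entirely; return 0.0 instead of raising.
    return a / b if b else 0.0


def chunk_metrics(y_true, y_pred, positive=1):
    """Binary metrics for one chunk, treating `positive` as the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)

    recall = _div(tp, tp + fn)       # true positive rate (sensitivity)
    specificity = _div(tn, tn + fp)  # true negative rate
    precision = _div(tp, tp + fp)
    return {
        "specificity": specificity,
        "recall": recall,
        "precision": precision,
        "f1": _div(2 * precision * recall, precision + recall),
        "balanced_accuracy": (recall + specificity) / 2,
        "g_mean": sqrt(recall * specificity),
    }


m = chunk_metrics([0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
                  [0, 0, 0, 0, 0, 1, 1, 1, 1, 0])
print({name: round(value, 4) for name, value in m.items()})
```

Balanced accuracy and G-mean both combine recall and specificity, so neither can be high when the minority class is ignored, unlike plain accuracy.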

Setup - experiment 1

Comparison of two methods, HDWE and AWE, across different imbalance ratios.
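For context on the AWE baseline: in the original Accuracy-Weighted Ensemble (Wang et al., 2003), each member i gets the weight w_i = MSE_r - MSE_i. Here MSE_i is the member's mean squared error on the newest chunk (one minus the probability it assigns to the true class, squared, averaged over samples), and MSE_r = sum over classes c of p(c)(1 - p(c))^2 is the error of a classifier that predicts at random according to the class priors. The sketch below computes that weight in plain Python; it is not the AWE code used in this repository, and the function names are hypothetical.

```python
def mse_random(class_priors):
    """MSE of a classifier that predicts randomly according to the class priors."""
    return sum(p * (1 - p) ** 2 for p in class_priors)


def mse_member(true_class_probs):
    """MSE of one member; each entry is the probability it gave the true class."""
    return sum((1 - f) ** 2 for f in true_class_probs) / len(true_class_probs)


def awe_weight(true_class_probs, class_priors):
    # Members that are no better than random guessing get weight <= 0.
    return mse_random(class_priors) - mse_member(true_class_probs)


priors = [0.9, 0.1]         # a 90/10 imbalanced chunk
probs = [0.9, 0.8, 0.6, 0.7]  # member's probability for each sample's true class
print(round(awe_weight(probs, priors), 3))  # 0.015
```

With skewed priors MSE_r is small (0.09 for a 90/10 split, versus 0.25 for 50/50), so a member needs a low MSE on the chunk to receive a positive weight.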

Data

Generated from stream-learn
- Number of classes: 2
- Number of streams: 10 (random state varied in the range 1000-1550)
- Number of concept drifts: 5
- Types of concept drifts: sudden, incremental
- Stationary imbalance ratio: 10%, 20%, 30%, 40%, 50%
- Dynamic imbalance ratio: 10%, 20%, 30%, 40%
- Number of samples: 100000 (200 chunks of 500 samples each)
- Number of features: 20 (where informative: 15 and redundant: 5)

Methods

HDWE (Hellinger Distance Weighted Ensemble) - own contribution
HDDT (Hellinger Distance Decision Tree) - own implementation
AWE (Accuracy-Weighted Ensemble)
Gaussian Naive Bayes - base classifier
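HDDT (Cieslak and Chawla) chooses splits by the Hellinger distance between the class-conditional distributions over a split's branches: d_H = sqrt(sum_j (sqrt(|X+j| / |X+|) - sqrt(|X-j| / |X-|))^2), where |X+j| is the number of positive (minority) samples reaching branch j, |X+| is the total number of positive samples, and likewise for negatives. Because only within-class proportions enter the formula, the value does not depend on the class prior, which is why the criterion suits imbalanced data. The sketch below covers the split criterion only; the repository's own HDDT implementation (for example, how it bins continuous features) may differ, and `hellinger_split` is a hypothetical name.

```python
from math import sqrt


def hellinger_split(pos_counts, neg_counts):
    """Hellinger distance of a candidate split.

    pos_counts[j] / neg_counts[j]: number of positive (minority) / negative
    samples falling into branch j. Assumes both classes are present.
    """
    total_pos, total_neg = sum(pos_counts), sum(neg_counts)
    return sqrt(sum(
        (sqrt(p / total_pos) - sqrt(n / total_neg)) ** 2
        for p, n in zip(pos_counts, neg_counts)
    ))


perfect = hellinger_split([10, 0], [0, 90])      # branches separate the classes
useless = hellinger_split([5, 5], [45, 45])      # same class proportions per branch
partial = hellinger_split([8, 2], [10, 80])
rescaled = hellinger_split([8, 2], [100, 800])   # 10x more negatives, same proportions
print(perfect, useless, partial, rescaled)
```

A perfect split reaches the maximum value sqrt(2), a split that leaves both classes in the same proportions scores 0, and multiplying the negative class by ten leaves the distance unchanged.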

Evaluation

Test-Then-Train

Metrics

Balanced accuracy score
F1 score
Recall
Specificity

joannagrzyb / HDWE
