Imbalanced data stream classification with the changing prior probabilities
Implementation
Setup - experiment with real data
Comparison of the HDWE method with selected state-of-the-art methods, using SVC and HDDT as base classifiers.
Data
Number of data streams: 3
- Poker data set
- Cover type data set
- Insects data set
Methods
- HDWE
- SEA
- AWE
- Learn++.CDS
- Learn++.NIE
- OUSE
- REA
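AWE, one of the reference methods, weights each base classifier by how much its mean squared error on the latest chunk improves on that of a classifier guessing according to the class priors. A minimal pure-Python sketch of that weighting rule, assuming class probabilities are available as one dict per sample; the function name `awe_weight` is hypothetical and not taken from this repository:

```python
from collections import Counter


def awe_weight(probas, labels):
    """AWE weight of one base classifier on the most recent chunk.

    probas: one dict per sample, mapping class label -> predicted probability.
    labels: true class label of each sample in the same chunk.
    """
    n = len(labels)
    # MSE of the classifier: mean squared gap between 1 and the
    # probability it assigned to the true class.
    mse_i = sum((1.0 - p.get(y, 0.0)) ** 2 for p, y in zip(probas, labels)) / n
    # MSE of a classifier that guesses according to the class priors of the chunk.
    priors = [count / n for count in Counter(labels).values()]
    mse_r = sum(prior * (1.0 - prior) ** 2 for prior in priors)
    # Positive weight: better than random guessing; non-positive: no better.
    return mse_r - mse_i


labels = [0, 0, 0, 1]
perfect = [{0: 1.0}, {0: 1.0}, {0: 1.0}, {1: 1.0}]
uniform = [{0: 0.5, 1: 0.5}] * 4
print(awe_weight(perfect, labels))  # 0.1875
print(awe_weight(uniform, labels))  # -0.0625
```

A classifier that outputs uniform probabilities on an imbalanced chunk ends up with a negative weight, i.e. it is judged worse than guessing from the priors.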
Setup - experiment 3a
Comparison of the HDWE method with selected state-of-the-art methods, using SVC as the base classifier.
Data
Total number of data streams: 84 (2 drift types × 14 imbalance settings × 3 random states)
- Generated with stream-learn
- Number of classes: 2
- Number of concept drifts: 5
- Types of concept drifts: sudden, incremental
- Stationary imbalance ratio: 1%, 3%, 5%, 10%, 15%, 20%, 25%
- Dynamic imbalance ratio: 1%, 3%, 5%, 10%, 15%, 20%, 25%
- Number of samples: 100000 (200 chunks of 500 samples each)
- Number of features: 20 (15 informative and 5 redundant)
- Random states: 1111, 1234, 1567
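The stream count follows directly from the parameter grid above. The sketch below enumerates that grid in plain Python; the descriptor keys are hypothetical and only mirror the listed parameters, they are not taken from this repository:

```python
from itertools import product

DRIFT_TYPES = ["sudden", "incremental"]
IMBALANCE_KINDS = ["stationary", "dynamic"]
IMBALANCE_RATIOS = [0.01, 0.03, 0.05, 0.10, 0.15, 0.20, 0.25]
RANDOM_STATES = [1111, 1234, 1567]


def stream_grid():
    """One hypothetical descriptor per synthetic stream."""
    return [
        {
            "drift": drift,
            "imbalance": kind,
            "minority_ratio": ratio,
            "random_state": seed,
            # Parameters shared by every stream in the experiment.
            "n_drifts": 5,
            "n_chunks": 200,
            "chunk_size": 500,
            "n_features": 20,
            "n_informative": 15,
            "n_redundant": 5,
        }
        for drift, kind, ratio, seed in product(
            DRIFT_TYPES, IMBALANCE_KINDS, IMBALANCE_RATIOS, RANDOM_STATES
        )
    ]


grid = stream_grid()
print(len(grid))  # 84
```

For the stationary cases, a descriptor could be passed on to stream-learn's `StreamGenerator` with `weights=[1 - minority_ratio, minority_ratio]`. How the dynamic ratios and the incremental drifts were parameterised is not stated here, so that mapping is left open and should be checked against the stream-learn documentation.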
Methods
- HDWE
- SEA
- AWE
- Learn++.CDS
- Learn++.NIE
- OUSE
- REA
Setup - experiment 3b
Comparison of the HDWE method with selected state-of-the-art methods, using HDDT as the base classifier.
Data
Total number of data streams: 84 (2 drift types × 14 imbalance settings × 3 random states)
- Generated with stream-learn
- Number of classes: 2
- Number of concept drifts: 5
- Types of concept drifts: sudden, incremental
- Stationary imbalance ratio: 1%, 3%, 5%, 10%, 15%, 20%, 25%
- Dynamic imbalance ratio: 1%, 3%, 5%, 10%, 15%, 20%, 25%
- Number of samples: 100000 (200 chunks of 500 samples each)
- Number of features: 20 (15 informative and 5 redundant)
- Random states: 1111, 1234, 1567
Methods
- HDWE
- SEA
- AWE
- Learn++.CDS
- Learn++.NIE
- OUSE
- REA
Setup - experiment 2
Comparison of different base classifiers within the HDWE (Hellinger Distance Weighted Ensemble) method, to check how the type of base classifier influences the ensemble.
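HDWE takes its name from the Hellinger distance. How HDWE turns this distance into classifier weights is not described here, so the sketch below shows only the standard distance between two discrete distributions (for example, two class distributions); it is the textbook definition, not this repository's code, and `hellinger_distance` is a hypothetical name:

```python
import math


def hellinger_distance(p, q):
    """Hellinger distance between two discrete distributions of equal length.

    Returns a value in [0, 1]: 0 for identical distributions,
    1 for distributions with disjoint support.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    s = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    # The 1/sqrt(2) factor normalises the distance to the [0, 1] range.
    return math.sqrt(s) / math.sqrt(2)


print(hellinger_distance([0.5, 0.5], [0.5, 0.5]))  # 0.0
print(hellinger_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0
print(hellinger_distance([0.9, 0.1], [0.5, 0.5]))
```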
Data
Total number of data streams: 84 (2 drift types × 14 imbalance settings × 3 random states)
- Generated with stream-learn
- Number of classes: 2
- Number of concept drifts: 5
- Types of concept drifts: sudden, incremental
- Stationary imbalance ratio: 1%, 3%, 5%, 10%, 15%, 20%, 25%
- Dynamic imbalance ratio: 1%, 3%, 5%, 10%, 15%, 20%, 25%
- Number of samples: 100000 (200 chunks of 500 samples each)
- Number of features: 20 (15 informative and 5 redundant)
- Random states: 1111, 1234, 1567
Methods
- HDWE(GNB)
- HDWE(MLP)
- HDWE(CART)
- HDWE(HDDT)
- HDWE(KNN)
- HDWE(SVC)
Evaluation
- Test Then Train
Metrics
- Specificity
- Recall
- Precision
- F1-score
- Balanced accuracy score
- Geometric mean (G-mean)
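All six metrics can be derived from the confusion matrix of a chunk. A minimal sketch, assuming label 1 marks the minority (positive) class and that G-mean is computed from recall and specificity; the function name `binary_metrics` is hypothetical:

```python
import math


def binary_metrics(y_true, y_pred, positive=1):
    """Metrics listed above for one binary chunk; `positive` is the minority class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    def ratio(num, den):
        # Assumption: undefined ratios (zero denominator) are reported as 0.0.
        return num / den if den else 0.0

    recall = ratio(tp, tp + fn)  # true positive rate
    specificity = ratio(tn, tn + fp)  # true negative rate
    precision = ratio(tp, tp + fp)
    return {
        "specificity": specificity,
        "recall": recall,
        "precision": precision,
        "f1": ratio(2 * precision * recall, precision + recall),
        "balanced_accuracy": (recall + specificity) / 2,
        # Assumption: G-mean as the geometric mean of recall and specificity.
        "g_mean": math.sqrt(recall * specificity),
    }


scores = binary_metrics([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1])
print(scores["balanced_accuracy"])  # 0.625
```

Returning 0.0 for an undefined ratio is one convention; scikit-learn, for instance, exposes this choice through a `zero_division` parameter.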
Setup - experiment 1
Comparison of two methods, HDWE and AWE, across different imbalance ratios.
Data
- Generated with stream-learn
- Number of classes: 2
- Number of streams: 10 (random states chosen from the range 1000-1550)
- Number of concept drifts: 5
- Types of concept drifts: sudden, incremental
- Stationary imbalance ratio: 10%, 20%, 30%, 40%, 50%
- Dynamic imbalance ratio: 10%, 20%, 30%, 40%
- Number of samples: 100000 (200 chunks of 500 samples each)
- Number of features: 20 (15 informative and 5 redundant)
Methods
- HDWE (Hellinger Distance Weighted Ensemble) - own contribution
- HDDT (Hellinger Distance Decision Tree) - own implementation
- AWE (Accuracy-Weighted Ensemble)
- Gaussian Naive Bayes - base classifier
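HDDT replaces the usual impurity measure of a decision tree with a Hellinger distance between the class-conditional distributions over the branches of a split. The function below is a minimal sketch of that criterion for a binary split of a binary problem; it is not the repository's own HDDT implementation, and the name `hddt_split_value` is hypothetical:

```python
import math


def hddt_split_value(y_left, y_right, positive=1):
    """Hellinger distance criterion for scoring a binary split (larger is better).

    Compares how the positive and the negative class are spread across the
    two branches. Only within-class proportions are used, so the class
    priors do not enter the score.
    """
    branches = (y_left, y_right)
    pos = [sum(1 for y in branch if y == positive) for branch in branches]
    neg = [len(branch) - p for branch, p in zip(branches, pos)]
    n_pos, n_neg = sum(pos), sum(neg)
    if n_pos == 0 or n_neg == 0:
        # A pure node cannot be split usefully.
        return 0.0
    return math.sqrt(sum(
        (math.sqrt(p / n_pos) - math.sqrt(n / n_neg)) ** 2
        for p, n in zip(pos, neg)
    ))


print(hddt_split_value([1, 1], [0, 0, 0, 0]))  # 1.4142135623730951
print(hddt_split_value([1, 0, 0], [1, 0, 0]))  # 0.0
```

A split that separates the classes perfectly reaches the maximum value of sqrt(2), while a split that leaves both class distributions unchanged scores 0.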
Evaluation
- Test Then Train
Metrics
- Balanced accuracy score
- F1 score
- Recall
- Specificity