Ardennes is an Estimation of Distribution Algorithm for decision-tree induction, as presented in the paper:
CAGNINI, Henry E. L; BARROS, R. C; BASGALUPP, M. P. Estimation of Distribution Algorithms for Decision-Tree Induction. IEEE Congress on Evolutionary Computation (IEEE CEC 2017), San Sebastián, Spain, June 5-8, 2017.
If you find this code useful in your work, please cite it:
@inproceedings{cagnini2017ardennes,
author = {Henry E. L. Cagnini and
Rodrigo C. Barros and
M\'{a}rcio P. Basgalupp},
title = {{Estimation of Distribution Algorithms for Decision-Tree Induction}},
booktitle = {{IEEE} Congress on Evolutionary Computation, {CEC} 2017, San Sebastián, Spain, June 5-8, 2017},
year = {2017}
}
This algorithm will only work:
Essential:
pip install networkx liac-arff numpy scikit-learn pandas scipy
For plotting trees and interpreting graphical models:
sudo apt-get install graphviz libgraphviz-dev pkg-config
pip install pygraphviz matplotlib plotly
For running J48 inside Python:
sudo apt-get install default-jre default-jdk
pip install additional_packages/python-weka-wrapper-0.3.9.tar.gz
For parallel processing (greatly increases performance):
sudo apt-get install libffi-dev g++
sudo apt-get install ocl-icd-opencl-dev
pip install mako
Then follow the instructions at https://wiki.tiker.net/PyOpenCL/Installation/Linux or, alternatively, install the bundled copy with the commands below.
NOTICE: If you use a virtual environment, activate it before running these commands.
tar xfz additional_packages/pyopencl-2016.2.1.tar.gz
cd pyopencl-2016.2.1
python configure.py
sudo su -c "make install"
And you're done!
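To confirm that the essential packages are importable, a quick check like the one below may help. It is a minimal sketch, not part of the Ardennes code: the helper name `missing_modules` is made up for illustration, and the list only covers the "Essential" pip packages (note that liac-arff is imported as `arff` and scikit-learn as `sklearn`).

```python
import importlib

# Import names for the "Essential" pip packages listed above.
# liac-arff installs the module `arff`; scikit-learn installs `sklearn`.
ESSENTIAL = ["networkx", "arff", "numpy", "sklearn", "pandas", "scipy"]


def missing_modules(names):
    """Return the names in `names` that fail to import (hypothetical helper)."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(ESSENTIAL)
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All essential modules are importable.")
```

If anything is reported as missing, rerun the corresponding pip command from the list above inside the same environment.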
A good starting point is the main.py script. Once you understand what it does (it is fairly simple), you can run it from a terminal:
python main.py
The output should look something like this:
NOTICE: Using single-threaded CPU as device.
training ardennes for dataset liver-disorders
iter: 000 mean: 0.690761 median: 0.688406 max: 0.818841 ET: 22.49sec height: 9 n_nodes: 45 test acc: 0.536232
iter: 001 mean: 0.674928 median: 0.692029 max: 0.818841 ET: 4.09sec height: 9 n_nodes: 45 test acc: 0.536232
...
iter: 099 mean: 0.730978 median: 0.789855 max: 0.818841 ET: 2.68sec height: 9 n_nodes: 27 test acc: 0.637681
Test acc: 0.64 Height: 9 n_nodes: 27 Time: 342.39 secs
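If you want to track convergence, the per-iteration lines in the sample output above can be parsed into numbers. The snippet below is a sketch that assumes the log format is exactly as shown in that sample; the name `parse_iteration` is hypothetical and the pattern may need adjusting if the actual output differs.

```python
import re

# Pattern inferred from the sample output above; this is an assumption
# about the log format, not something taken from the Ardennes code.
ITER_LINE = re.compile(
    r"iter:\s*(?P<iter>\d+)\s+"
    r"mean:\s*(?P<mean>[\d.]+)\s+"
    r"median:\s*(?P<median>[\d.]+)\s+"
    r"max:\s*(?P<max>[\d.]+)\s+"
    r"ET:\s*(?P<et>[\d.]+)sec\s+"
    r"height:\s*(?P<height>\d+)\s+"
    r"n_nodes:\s*(?P<n_nodes>\d+)\s+"
    r"test acc:\s*(?P<test_acc>[\d.]+)"
)

INT_FIELDS = ("iter", "height", "n_nodes")


def parse_iteration(line):
    """Parse one 'iter: ...' log line into a dict, or return None if it does not match."""
    match = ITER_LINE.search(line)
    if match is None:
        return None
    parsed = {}
    for key, value in match.groupdict().items():
        # Counters become ints; fitness values, accuracy and elapsed time become floats.
        parsed[key] = int(value) if key in INT_FIELDS else float(value)
    return parsed
```

Lines that are not per-iteration reports, such as the final summary line, yield None and can simply be skipped.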
config.json: where you set the algorithm parameters, such as the number of individuals, number of iterations, decile and maximum tree height.
main.py: the starting point for running the algorithm.
evaluate.py: the module called from main.py. It has several functions that perform holdout, cross-validation and similar operations.
treelib: the directory containing the main Ardennes code.
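As an illustration of how the parameters in config.json might be read and checked before a run, here is a minimal sketch. The key names (`n_individuals`, `n_iterations`, `decile`, `max_height`) and the function `load_config` are assumptions made for this example; check config.json itself for the real names.

```python
import json

# Hypothetical key names, chosen to mirror the parameters described above.
# The real config.json may use different names.
REQUIRED_KEYS = ("n_individuals", "n_iterations", "decile", "max_height")


def load_config(path):
    """Load a JSON parameter file and fail early if a required key is absent."""
    with open(path) as handle:
        config = json.load(handle)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError("missing parameters: " + ", ".join(missing))
    return config
```

Calling `load_config("config.json")` returns a plain dictionary, so a parameter can be read with, for example, `config["n_iterations"]`, provided the file uses that key name.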