henryzord / ardennes

An Estimation of Distribution Algorithm for Decision-Tree Induction.

Ardennes

Ardennes is an Estimation of Distribution Algorithm for decision-tree induction, as presented in the paper:

CAGNINI, Henry E. L.; BARROS, R. C.; BASGALUPP, M. P. Estimation of Distribution Algorithms for Decision-Tree Induction. IEEE Congress on Evolutionary Computation (IEEE CEC 2017), San Sebastián, Spain, June 5-8, 2017.

Citation

If you find this code useful in your work, please cite it:

@inproceedings{cagnini2017ardennes,
  author    = {Henry E. L. Cagnini and
               Rodrigo C. Barros and
               M\'{a}rcio P. Basgalupp},
  title     = {{Estimation of Distribution Algorithms for Decision-Tree Induction}},
  booktitle = {{IEEE} Congress on Evolutionary Computation, {CEC} 2017, San Sebastián, Spain, June 5-8, 2017},
  year      = {2017}
}

Capabilities

Limitations

This algorithm will only work:

Installation

Essential:

pip install networkx liac-arff numpy scikit-learn pandas scipy

For plotting trees and interpreting graphical models:

sudo apt-get install graphviz libgraphviz-dev pkg-config
pip install pygraphviz matplotlib plotly

For running Weka's J48 inside Python:

sudo apt-get install default-jre default-jdk
pip install additional_packages/python-weka-wrapper-0.3.9.tar.gz

For parallel processing (greatly increases performance):

sudo apt-get install libffi-dev g++
sudo apt-get install ocl-icd-opencl-dev
pip install mako

Then follow the instructions at https://wiki.tiker.net/PyOpenCL/Installation/Linux, or alternatively build PyOpenCL from the bundled archive:

NOTICE: If you use a virtual environment, you must activate it before running the following commands.

tar xfz additional_packages/pyopencl-2016.2.1.tar.gz
cd pyopencl-2016.2.1
python configure.py
sudo su -c "make install"

And you're done!
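To check which of the dependency groups above are actually importable, a small script along these lines may help. It is only a sketch: it assumes Python 3, and the import names for packages whose pip name differs (liac-arff as `arff`, scikit-learn as `sklearn`, python-weka-wrapper as `weka`) are assumptions, as are the group labels and the helper names `missing_modules` and `report`, none of which are part of Ardennes.

```python
import importlib.util

# Top-level import names per dependency group. The names for liac-arff (arff),
# scikit-learn (sklearn) and python-weka-wrapper (weka) are assumptions.
GROUPS = {
    "essential": ["networkx", "arff", "numpy", "sklearn", "pandas", "scipy"],
    "plotting": ["pygraphviz", "matplotlib", "plotly"],
    "j48": ["weka"],
    "parallel": ["mako", "pyopencl"],
}


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def report(groups=GROUPS):
    """Map each group label to the list of its missing modules."""
    return {group: missing_modules(names) for group, names in groups.items()}


if __name__ == "__main__":
    for group, missing in report().items():
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print("{:10s} {}".format(group, status))
```

Only the "essential" group needs to be complete for a basic run; the other groups correspond to the optional installation steps.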

First steps

A good starting point is the main.py script. Once you understand what it does (it is fairly simple), you can run it from a terminal:

python main.py

The output should look something like this:

NOTICE: Using single-threaded CPU as device.
training ardennes for dataset liver-disorders
iter: 000 mean: 0.690761 median: 0.688406 max: 0.818841 ET: 22.49sec  height:  9  n_nodes: 45  test acc: 0.536232
iter: 001 mean: 0.674928 median: 0.692029 max: 0.818841 ET:  4.09sec  height:  9  n_nodes: 45  test acc: 0.536232
...
iter: 099 mean: 0.730978 median: 0.789855 max: 0.818841 ET: 2.68sec  height:  9  n_nodes: 27  test acc: 0.637681
Test acc: 0.64 Height: 9 n_nodes: 27 Time: 342.39 secs
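If you want to track convergence across iterations, the per-iteration lines can be parsed into records. The sketch below is modeled only on the sample output above, so the exact log format is an assumption, and `parse_iteration` is a hypothetical helper rather than part of Ardennes.

```python
import re

# Pattern modeled on the sample "iter: ..." lines; the real format may differ.
ITER_RE = re.compile(
    r"iter:\s*(?P<iter>\d+)\s+mean:\s*(?P<mean>[\d.]+)"
    r"\s+median:\s*(?P<median>[\d.]+)\s+max:\s*(?P<max>[\d.]+)"
    r"\s+ET:\s*(?P<et>[\d.]+)sec\s+height:\s*(?P<height>\d+)"
    r"\s+n_nodes:\s*(?P<n_nodes>\d+)\s+test acc:\s*(?P<test_acc>[\d.]+)"
)

INT_FIELDS = ("iter", "height", "n_nodes")


def parse_iteration(line):
    """Parse one per-iteration log line into a dict, or return None."""
    match = ITER_RE.search(line)
    if match is None:
        return None
    record = {}
    for key, value in match.groupdict().items():
        # Counters become ints; fitness, time and accuracy become floats.
        record[key] = int(value) if key in INT_FIELDS else float(value)
    return record


if __name__ == "__main__":
    sample = (
        "iter: 001 mean: 0.674928 median: 0.692029 max: 0.818841 "
        "ET:  4.09sec  height:  9  n_nodes: 45  test acc: 0.536232"
    )
    print(parse_iteration(sample))
```

Lines that do not match, such as the device notice or the final summary, yield None and can simply be skipped.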

Structure of the code