Python code for tree ensemble interpretation proposed in the following paper.
To use defragTrees:
To run the example code in the example directory:
To replicate the paper results in the paper directory:
Prepare data:
X: feature matrix, numpy array of size (num, dim).
y: output array, numpy array of size (num,).
For regression, y is a real value.
For classification, y is a class index (i.e., 0, 1, 2, ..., C-1, for C classes).
splitter: thresholds of tree ensembles, numpy array of size (# of split rules, 2).
Each row of splitter is (feature index, threshold). For example, if the split rule is second feature < 0.5, the corresponding row of splitter is (1, 0.5).
Import the class:
from defragTrees import DefragModel
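The array layout from the data preparation step can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the package's own loader: the helper collect_splitter and the per-node arrays features and thresholds are hypothetical names, and the convention that a negative feature index marks a leaf is borrowed from scikit-learn's tree_.feature (which uses -2 for leaves). The toy values carry no meaning.

```python
import numpy as np

# Toy data: 6 samples, 3 features (values are illustrative only).
rng = np.random.default_rng(0)
X = rng.random((6, 3))           # feature matrix, shape (num, dim)
y = 2.0 * X[:, 1] + X[:, 0]      # regression target, shape (num,)


def collect_splitter(features, thresholds):
    """Stack (feature index, threshold) pairs taken from internal tree nodes.

    Hypothetical helper: `features` and `thresholds` are per-node arrays,
    and a negative feature index is assumed to mark a leaf.
    """
    features = np.asarray(features)
    thresholds = np.asarray(thresholds, dtype=float)
    internal = features >= 0                       # keep split nodes only
    pairs = np.column_stack([features[internal], thresholds[internal]])
    return np.unique(pairs, axis=0)                # drop duplicate rules


# Node arrays of two small hypothetical trees, concatenated.
features = [1, -2, -2, 0, 1, -2, -2, -2]
thresholds = [0.5, -2.0, -2.0, 0.3, 0.5, -2.0, -2.0, -2.0]
splitter = collect_splitter(features, thresholds)  # shape (# of split rules, 2)

# The row (1, 0.5) encodes the rule "second feature < 0.5".
j, t = int(splitter[1, 0]), splitter[1, 1]
mask = X[:, j] < t               # boolean mask of samples satisfying the rule
```

Removing duplicate rows is a design choice of this sketch; whether duplicates matter to the fitting routine is not stated here. Note also that some libraries define splits as "feature <= threshold" rather than "feature < threshold", so the comparison may need adjusting for the ensemble at hand.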
Fit the simplified model:
Kmax = 10 # upper bound on the number of rules to be fitted
mdl = DefragModel(modeltype='regression') # change to 'classification' if necessary.
mdl.fit(X, y, splitter, Kmax)
#mdl.fit(X, y, splitter, Kmax, fittype='EM') # use this when one wants exactly Kmax rules to be fitted
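When switching to modeltype='classification', y has to hold class indices 0, 1, ..., C-1. If the raw labels are strings or arbitrary integers, one possible way to map them is np.unique with return_inverse=True. The label values below are made up for illustration.

```python
import numpy as np

# Raw labels in arbitrary form (illustrative values only).
labels = np.array(["cat", "dog", "cat", "bird", "dog"])

# `classes` holds the sorted unique labels; `y` holds, for each sample,
# the index of its label in `classes`, i.e. a value in 0..C-1.
classes, y = np.unique(labels, return_inverse=True)

print(classes)  # ['bird' 'cat' 'dog']
print(y)        # [1 2 1 0 2]
```

Keeping classes around allows the indices in a fitted rule to be translated back to the original label names.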
Check the learned rules:
print(mdl)
For further details, see defragTrees.py.
In IPython, one can check:
import defragTrees
defragTrees?
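Outside IPython the ? syntax is not available. A rough equivalent using only the standard library is sketched below; module_doc is a hypothetical helper name, and the standard-library module json stands in for defragTrees so that the snippet runs without the package installed.

```python
import importlib
import inspect


def module_doc(name):
    """Return the docstring of the module called `name`, or '' if it has none."""
    module = importlib.import_module(name)
    return inspect.getdoc(module) or ""


# With defragTrees.py on the import path, one would call:
#     print(module_doc("defragTrees"))
# "json" is used here only so that the sketch runs anywhere.
doc = module_doc("json")
print(doc.splitlines()[0])
```

The built-in help(defragTrees) gives a similar, more detailed view in a plain Python session.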
See the example directory.
See the paper directory.