sato9hara / defragTrees

Python code for tree ensemble interpretation
MIT License

xgboost trees - IndexError #2

Closed by tantrev 6 years ago

tantrev commented 6 years ago

I'm probably doing something stupid, but I was trying to use the xgboost functionality with a toy example of my own, and it resulted in the following error:

IndexError: too many indices for array

An example script and data files may be downloaded from here.

Any idea by chance what's going on?

tantrev commented 6 years ago

Also, as a side note: it doesn't seem that the current parser works with "feature_names" if they are specified in a DMatrix.

tantrev commented 6 years ago

Never mind, I figured out the problem. I had made an elementary mistake with the y array; sorry about that.
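
For anyone who lands here with the same message: the thread does not say what exactly was wrong with the y array, but a common way to get "too many indices for array" is a label array whose number of dimensions differs from what the calling code expects. Below is a minimal NumPy sketch, assuming a 1-D label vector with one entry per row of X is wanted. The helper name `as_label_vector` is hypothetical and is not part of defragTrees.

```python
import numpy as np


def as_label_vector(y, n_rows):
    """Return y as a 1-D array of length n_rows, or raise a clear error.

    Hypothetical helper for illustration; not part of defragTrees.
    """
    y = np.asarray(y)
    # Flatten a column (n, 1) or row (1, n) array into a plain vector.
    if y.ndim == 2 and 1 in y.shape:
        y = y.ravel()
    if y.ndim != 1 or y.shape[0] != n_rows:
        raise ValueError(
            f"expected {n_rows} labels in a 1-D array, got shape {y.shape}"
        )
    return y


X = np.zeros((4, 3))
y_column = np.array([[0], [1], [1], [0]])  # shape (4, 1), e.g. loaded from a CSV column
y = as_label_vector(y_column, X.shape[0])
print(y.shape)  # (4,)

# Indexing a 1-D array with two indices raises the IndexError from the report.
try:
    y[:, 0]
except IndexError as err:
    print("IndexError:", err)
```

Checking the shape of y right after loading it, before calling fit, tends to surface this kind of mistake with a much clearer message.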

sato9hara commented 6 years ago

I also found the mistake in the y array (as already reported). In addition, I found that the problem is numerically unstable; setting the stabilization parameter slightly larger, e.g. eps=1e-5, should work well.

tantrev commented 6 years ago

Sounds great, thank you! I'll try fiddling with the stabilization parameter.

tantrev commented 6 years ago

Sorry to bother you again. I'm probably doing something wrong again, but I changed the numerical stabilization parameter as you suggested and am now getting some strange output with empty rules:

<< defragTrees >>
----- Evaluated Results -----
Test Error = 0.162679
Test Coverage = 1.000000
Overlap = 1.000000

----- Found Rules -----
[Rule  1]
y = 0 when

[Rule  2]
y = 0 when
     x_22 < 4.835220
     x_163 < 48.005800
     x_171 < 30.000200
     x_211 < 571.000000
     x_223 < 581.966000
     x_228 < 14.500000
     x_250 < 73.063500
     x_297 < 517.000000
     x_304 < 1212.910000
     x_336 < 1119.000000
     x_337 < 5167.560000

[Rule  3]
y = 0 when
     x_211 < 571.000000
     x_337 >= 0.500000
     x_342 >= 9.500000
     x_347 >= 0.500000

[Rule  4]
y = 0 when

[Rule  5]
y = 0 when

[Rule  6]
y = 0 when
     x_3 < 2.139350
     x_342 >= 8.500000

[Rule  7]
y = 0 when

[Rule  8]
y = 0 when

[Rule  9]
y = 0 when
     x_6 < 11.500000
     x_96 < 10.385300
     x_297 < 517.000000
     x_342 >= 7.000000

[Rule 10]
y = 0 when

[Otherwise]
y = 0

Are empty rules normally expected? Perhaps I may just need to modify delta or kappa?

The latest example (with some pre-calculated trees) may be found here. Thank you again for all of your generous help.

sato9hara commented 6 years ago

To avoid empty rules, try increasing kappa, e.g. to kappa=1e-3. Empty rules can appear when the optimization has not fully converged. The tolerance parameter kappa is used to check whether the rule statements have converged appropriately; increasing it allows statements that have not fully converged to be taken into account.
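
A toy illustration of the idea above, not defragTrees' actual code: assume each candidate statement in a rule carries a value in [0, 1] that the optimization pushes toward 0 (drop) or 1 (keep), and that a statement is reported only when its value is within kappa of 1. The function `select_statements`, the example values, and this thresholding rule are all assumptions made for illustration.

```python
def select_statements(weights, kappa):
    """Keep statements whose weight is within kappa of 1 (toy criterion).

    Hypothetical helper for illustration; not part of defragTrees.
    """
    return [name for name, w in weights.items() if w >= 1.0 - kappa]


# Hypothetical, not-fully-converged values for the statements of one rule.
weights = {
    "x_211 < 571.0": 0.9995,
    "x_342 >= 9.5": 0.99999,
    "x_3 < 2.13935": 0.4,
}

# A tight tolerance rejects everything, so the rule prints as empty.
print(select_statements(weights, kappa=1e-6))  # []

# A looser tolerance accepts the two nearly converged statements.
print(select_statements(weights, kappa=1e-3))  # ['x_211 < 571.0', 'x_342 >= 9.5']
```

Under this assumption, an empty rule means that no statement got close enough to 1 for the chosen tolerance, which is why loosening kappa makes the statements reappear.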