Closed tantrev closed 6 years ago
Also, as a side note: it doesn't seem that the current parser works with "feature_names" if they are specified in a DMatrix.
Never mind, I figured the problem out. I had made an elementary mistake with the y array. Sorry about that.
I also spotted the mistake in the y array (as already reported). In addition, I found that the problem is numerically unstable; setting the stabilization parameter slightly larger, e.g. eps=1e-5, works well.
Sounds great, thank you! I'll try fiddling with the stabilization parameter.
Sorry to bother you again. I'm probably doing something wrong again, but I changed the numerical stabilization parameter as you suggested and am now getting some strange output with empty rules:
<< defragTrees >>
----- Evaluated Results -----
Test Error = 0.162679
Test Coverage = 1.000000
Overlap = 1.000000
----- Found Rules -----
[Rule 1]
y = 0 when
[Rule 2]
y = 0 when
x_22 < 4.835220
x_163 < 48.005800
x_171 < 30.000200
x_211 < 571.000000
x_223 < 581.966000
x_228 < 14.500000
x_250 < 73.063500
x_297 < 517.000000
x_304 < 1212.910000
x_336 < 1119.000000
x_337 < 5167.560000
[Rule 3]
y = 0 when
x_211 < 571.000000
x_337 >= 0.500000
x_342 >= 9.500000
x_347 >= 0.500000
[Rule 4]
y = 0 when
[Rule 5]
y = 0 when
[Rule 6]
y = 0 when
x_3 < 2.139350
x_342 >= 8.500000
[Rule 7]
y = 0 when
[Rule 8]
y = 0 when
[Rule 9]
y = 0 when
x_6 < 11.500000
x_96 < 10.385300
x_297 < 517.000000
x_342 >= 7.000000
[Rule 10]
y = 0 when
[Otherwise]
y = 0
Are empty rules normally expected? Perhaps I may just need to modify delta or kappa?
The latest example (with some pre-calculated trees) may be found here. Thank you again for all of your generous help.
To avoid empty rules, try increasing kappa, e.g. to kappa=1e-3. Empty rules can appear when the optimization has not fully converged. The tolerance parameter kappa is used to check whether the rule statements have converged; increasing it lets statements that have not fully converged be taken into account.
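To make the effect of the tolerance concrete, here is a small, purely illustrative sketch. It is not the defragTrees implementation: the function `active_statements`, the per-statement "weights", and the rule "keep a statement if its weight is within kappa of 1" are all assumptions made up for this example. It only shows how a tight tolerance can leave a rule with no statements while a looser one keeps the nearly converged ones.

```python
import numpy as np

def active_statements(weights, kappa):
    """Hypothetical check: keep statements whose weight is within kappa of 1."""
    weights = np.asarray(weights, dtype=float)
    return np.flatnonzero(weights >= 1.0 - kappa)

# Made-up weights for three candidate statements of one rule.
# The first two are close to 1 but not fully converged.
weights = [0.9995, 0.9993, 0.20]

# A very tight tolerance (value chosen only for illustration) rejects everything,
# which would leave the rule empty.
strict = active_statements(weights, kappa=1e-6)

# The looser tolerance suggested above keeps the two nearly converged statements.
loose = active_statements(weights, kappa=1e-3)

print("kappa=1e-6 keeps statements:", strict)
print("kappa=1e-3 keeps statements:", loose)
```

The trade-off is that a larger tolerance may also admit statements that the optimization never really settled on, so it is worth checking the test error and coverage again after changing it.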
I'm probably doing something stupid, but I was trying to use the xgboost functionality with a toy example of my own, and it resulted in the following error:
IndexError: too many indices for array
An example script and data files may be downloaded from here.
Any idea by chance what's going on?
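For anyone landing here with the same message: the thread does not say exactly what was wrong with the y array, so the following is only a generic sketch of one common way NumPy raises "IndexError: too many indices for array", namely indexing an array with more indices than it has dimensions. The helper `as_label_vector` is a hypothetical name for this example, not part of defragTrees or xgboost.

```python
import numpy as np

y = np.array([0, 1, 0, 1])  # 1-D label vector, shape (4,)

# Using two indices on a 1-D array triggers the error.
try:
    y[:, 0]
except IndexError as err:
    message = str(err)
    print("IndexError:", message)

def as_label_vector(y):
    """Hypothetical helper: return labels as a flat 1-D array."""
    y = np.asarray(y)
    # Flatten column or row vectors such as shape (n, 1) or (1, n).
    if y.ndim == 2 and 1 in y.shape:
        y = y.ravel()
    if y.ndim != 1:
        raise ValueError(f"expected 1-D labels, got shape {y.shape}")
    return y

# A column vector of labels becomes a flat vector.
column = np.zeros((4, 1))
print(as_label_vector(column).shape)
```

Printing `X.shape` and `y.shape` right before fitting is usually the quickest way to see whether the label array has the dimensionality the downstream code expects.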