Closed: pedroilidio closed this issue 2 years ago
Maybe this is normal behavior. Since features are drawn in random order (lines below), if more than one split yields the same impurity improvement, the one that ends up in the tree is chosen at random.
But should equally-evaluated splits be that common? A simple case I thought of: when the node's data form a square with the positive samples restricted to one corner, the two axes admit equivalent splits.
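The random tie-breaking can be observed directly in scikit-learn: on an exact tie, the split kept is the first one visited, and the order in which features are visited depends on `random_state`. A minimal sketch (the 2x2 data set below is my own illustration, not taken from the test script):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Corner pattern: row split and column split tie exactly
# (each sample's features are its row and column indices).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([1.0, 0.0, 0.0, 0.0])

features = set()
for seed in range(20):
    tree = DecisionTreeRegressor(random_state=seed).fit(X, y)
    features.add(tree.tree_.feature[0])  # feature chosen at the root

# On ties the root feature can vary with the seed.
print(features)
```

If both `0` and `1` show up across seeds, the split placed at the root really is a coin flip between equivalent candidates.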
If the tree is fully grown, the mentioned case will of course be common. Take, for instance, the submatrix:

```
1 0
0 0
```

I rest my case.
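To make the tie explicit, here is a quick check (a sketch with a hypothetical sample encoding: each cell of the submatrix becomes one sample whose features are its row and column indices) showing that splitting this node on the row axis or on the column axis gives exactly the same weighted child impurity:

```python
import numpy as np

# The 2x2 submatrix [[1, 0], [0, 0]] as (row, col) samples with labels.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([1.0, 0.0, 0.0, 0.0])

def weighted_child_mse(y_left, y_right):
    """Weighted average of the children's MSE after a split."""
    n = len(y_left) + len(y_right)
    return (len(y_left) * np.var(y_left) + len(y_right) * np.var(y_right)) / n

# Split on the row axis (feature 0) at 0.5:
row_split = weighted_child_mse(y[X[:, 0] == 0], y[X[:, 0] == 1])
# Split on the column axis (feature 1) at 0.5:
col_split = weighted_child_mse(y[X[:, 1] == 0], y[X[:, 1] == 1])

print(row_split, col_split)  # both 0.125: the two splits tie exactly
```

Since the impurity improvement is identical, whichever axis is drawn first wins, so the left/right placement of the resulting leaves is effectively random.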
Running and inspecting trees for some parameter combinations, such as

```shell
python test_nd_classes.py --seed 23 --noise .1 --nrules 20 --shape 500 600 --nattrs 10 9 --msl 100 --inspect
```
shows that `hypertree.tree.DecisionTreeRegressor2D` yields a slightly different tree than `sklearn.tree.DecisionTreeRegressor`, with some sister leaves appearing to be swapped (left with right), even though the two trees are theoretically expected to be identical. More careful evaluation is needed.