**vruusmann** opened this issue 2 months ago (status: Open)
This is hard to support as of now, because we currently sample oblique splits using a "density" hyperparameter that dictates how many non-zeros there are in the projection matrix. This sometimes leaves certain projection rows either all zeros, or with only a single +1/-1 entry. In order to flip all the -1 rows to +1, we would have to add an extra computation, slowing down the overall training of trees.
I think this would have to be handled downstream if other packages want a "simple" interpretation of those specific splits?
> In order to flip all the -1 rows to +1, we would have to add an extra computation, slowing down the overall training of trees.
You would standardize the projection matrix only once. Basically, you'd iterate over the projection matrix row-wise, and if the "effective row length" is one (i.e. the row contains only one non-zero element), you'd set this one element to +1.0. No need to even check its actual value.
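A minimal sketch of that one-pass standardization, assuming a dense NumPy representation of the projection matrix (the function name and the standalone-matrix setup are illustrative, not sktree internals):

```python
import numpy as np

def standardize_projections(proj_mat):
    """Set the single non-zero entry of every axis-aligned row
    (a row with exactly one non-zero element) to +1.0.
    All-zero rows and truly oblique rows (two or more non-zeros)
    are left untouched."""
    proj_mat = np.array(proj_mat, dtype=float)  # work on a copy
    for row in proj_mat:
        nonzero = np.flatnonzero(row)
        if nonzero.size == 1:  # "effective row length" is one
            row[nonzero[0]] = 1.0  # no need to check the old value
    return proj_mat

print(standardize_projections([[0, 0, 1, 0],
                               [0, -1, 0, 0],
                               [0.5, 0, -0.5, 0]]))
```

If this runs before split thresholds are chosen, flipping the sign has no semantic consequences; applied after training, the matching thresholds and branch directions would also have to be adjusted.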
> I think this would have to be handled downstream if other packages want a "simple" interpretation of those specific splits?
I tried to "invert" these negative splits during PMML conversion. Something like: if the input is `-1 * feature <= -1 * threshold`, then interpret it as `feature > threshold`. But my integration testing showed that this inversion is not sufficient, because the predictions (between original and inverted splits) came out different. It looks like the threshold value itself also needs to be adjusted somehow.
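For what it's worth, pure algebra says `-1 * feature <= -1 * threshold` is equivalent to the non-strict `feature >= threshold`, not the strict `feature > threshold`, so the two forms disagree exactly when the feature value equals the threshold. A tiny check (plain Python, illustrative only; whether this boundary case fully explains the prediction differences is an open question):

```python
def negated_split(feature, threshold):
    # The split as stored in the tree: -1 * feature <= -1 * threshold
    return -1.0 * feature <= -1.0 * threshold

def strict_inversion(feature, threshold):
    # The attempted inversion: feature > threshold
    return feature > threshold

def nonstrict_inversion(feature, threshold):
    # The algebraically exact inversion: feature >= threshold
    return feature >= threshold

# The strict form disagrees on the boundary value:
print(negated_split(6.5, 6.5))        # True
print(strict_inversion(6.5, 6.5))     # False
print(nonstrict_inversion(6.5, 6.5))  # True
```

PMML's `SimplePredicate` supports a `greaterOrEqual` operator, so the non-strict form can be expressed directly, without touching the threshold value at all.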
> This is hard to support as of now because we sample oblique splits currently using a "density" hyper parameter that dictates how many non-zeros there are in the projection matrix.
But would it be possible to add some training parameter, which allows the data scientist to indicate if she's willing to accept a slight performance penalty during model training, in order to get much simplified oblique trees for later prediction and interpretation?
> I think this would have to be handled downstream if other packages want a "simple" interpretation of those specific splits?
Right now, when you train a simplistic oblique decision tree classifier for the iris dataset, you get two types of splits per feature. For example, `Sepal.Length <= 5.3` and `-1 * Sepal.Length <= -6.5`. Essentially, the "effective number" of features is doubled, which makes the interpretation of oblique forests twice as hard as it could be.
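To make the doubling concrete, here is a small sketch that tallies the two axis-aligned flavours in a projection matrix shaped like `ObliqueTree.proj_vecs` (the example matrix is made up for illustration):

```python
import numpy as np

def count_axis_aligned(proj_vecs):
    """Return (positive, negative) counts of axis-aligned rows,
    i.e. rows with exactly one non-zero element."""
    pos = neg = 0
    for row in np.asarray(proj_vecs, dtype=float):
        nonzero = np.flatnonzero(row)
        if nonzero.size == 1:
            if row[nonzero[0]] > 0:
                pos += 1
            else:
                neg += 1
    return pos, neg

# Hypothetical projections over the four iris features:
proj_vecs = [[1, 0, 0, 0],       # Sepal.Length <= threshold
             [-1, 0, 0, 0],      # -1 * Sepal.Length <= -threshold
             [0, 0.7, -0.7, 0]]  # a genuine oblique split
print(count_axis_aligned(proj_vecs))  # (1, 1)
```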
@jovo any thoughts on how to best handle this?
**Is your feature request related to a problem? Please describe.**
While developing a PMML converter for oblique trees (see #255), I noticed that the projection matrix (as retrievable via the `ObliqueTree.proj_vecs` attribute) contains two types of "axis aligned split" definitions (i.e. projection matrix rows where only a single row element is set to a non-zero value). These two types are:

1. The non-zero element is `1.0`. For example, `[0, 0, 1, 0]`.
2. The non-zero element is `-1.0`. For example, `[0, -1, 0, 0]`.

**Describe the solution you'd like**
I would propose that all axis aligned splits should be standardized to the positive/default axis aligned split representation.
Negating a split condition does not add any information to it. But it makes interpreting the resulting oblique tree more complicated, because the associated split threshold value also appears negated. Compare:

- `feature <= threshold`
- `-1 * feature <= -1 * threshold`

In other words, the algorithm should not multiply standalone feature values by `-1` during training. It should keep them as-is.

**Describe alternatives you've considered**
The current behaviour (SkTree 0.7.2) is okay, but the resulting oblique trees are unnecessarily complicated.