neurodata / scikit-tree

Scikit-learn compatible decision trees beyond those offered in scikit-learn
https://docs.neurodata.io/scikit-tree/dev/index.html

Simplifying projection matrix rows #262

Open vruusmann opened 2 months ago

vruusmann commented 2 months ago

Is your feature request related to a problem? Please describe.

While developing a PMML converter for oblique trees (see #255), I noticed that the projection matrix (as retrievable via the ObliqueTree.proj_vecs attribute) contains two types of "axis-aligned split" definitions (i.e., projection matrix rows where only a single element is set to a non-zero value).

These two types are:

- a positive axis-aligned split, where the single non-zero element is +1.0 (i.e. feature <= threshold);
- a negative axis-aligned split, where the single non-zero element is -1.0 (i.e. -1 * feature <= threshold).
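For concreteness, a toy illustration of the two row forms (the feature index and values below are made up):

```python
import numpy as np

# Two projection-matrix rows over 4 features; both encode an
# axis-aligned split on feature 2, but with opposite signs.
pos_row = np.array([0.0, 0.0, 1.0, 0.0])   # reads: x[2] <= threshold
neg_row = np.array([0.0, 0.0, -1.0, 0.0])  # reads: -x[2] <= threshold

x = np.array([5.1, 3.5, 1.4, 0.2])

# Both rows project onto the same feature value, up to sign.
print(pos_row @ x)   #  1.4
print(neg_row @ x)   # -1.4
```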

Describe the solution you'd like

I would propose that all axis-aligned splits be standardized to the positive/default axis-aligned representation.

Negating a split condition does not add any information, but it makes the resulting oblique tree harder to interpret, because the associated split threshold value appears negated as well.

In other words, the algorithm should not multiply standalone feature values by -1 during training. It should keep them as-is.

Describe alternatives you've considered

The current behaviour (SkTree 0.7.2) is okay, but the resulting oblique trees are unnecessarily complicated.

adam2392 commented 2 months ago

This is hard to support as of now, because we currently sample oblique splits using a "density" hyperparameter that dictates how many non-zeros there are in the projection matrix. As a result, some projection rows end up either all zeros, or with only a single +1/-1 entry. In order to flip all the -1 rows to +1, we would have to add an extra computation, slowing down the overall training of trees.
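Roughly, the sampling works like this (a simplified NumPy sketch, not the actual Cython implementation; the sample_projection_matrix helper is hypothetical and only mirrors the description above):

```python
import numpy as np

def sample_projection_matrix(n_rows, n_features, density, rng):
    # Each entry is non-zero with probability `density`; each
    # non-zero entry is +1 or -1 (here: with equal probability).
    mask = rng.random((n_rows, n_features)) < density
    signs = rng.choice([-1.0, 1.0], size=(n_rows, n_features))
    return mask * signs

rng = np.random.default_rng(0)
P = sample_projection_matrix(n_rows=5, n_features=10, density=0.1, rng=rng)
# At low density, rows with zero or exactly one non-zero entry are common.
print(np.count_nonzero(P, axis=1))
```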

I think this would have to be handled downstream if other packages want a "simple" interpretation of those specific splits?

vruusmann commented 2 months ago

> In order to flip all the -1 rows to +1, we would have to add an extra computation, slowing down the overall training of trees.

You would standardize the projection matrix only once. Basically, you'd iterate over the projection matrix row-wise, and if the "effective row length" is one (i.e., the row contains only one non-zero element), you'd set that element to +1.0. There is no need to even check its actual value.
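In code, that one-time pass could look something like the sketch below. It assumes the pass runs right after the projection matrix is sampled, before any split thresholds are computed, so the thresholds never see the negated values:

```python
import numpy as np

def standardize_axis_aligned_rows(proj_mat):
    # If a row's "effective length" is one (a single non-zero
    # element), force that element to +1.0. Genuinely oblique
    # rows (two or more non-zeros) are left untouched.
    for row in proj_mat:
        nonzero = np.flatnonzero(row)
        if nonzero.size == 1:
            row[nonzero[0]] = 1.0  # no need to check the old value
    return proj_mat
```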

> I think this would have to be handled downstream if other packages want a "simple" interpretation of those specific splits?

I tried to "invert" these negative splits during PMML conversion.

Something like: if the input is -1 * feature <= -1 * threshold, then interpret it as feature > threshold. But my integration testing showed that this inversion is not sufficient, because the predictions of the original and inverted splits came out different. It looks like the threshold value itself also needs to be adjusted somehow.
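For reference, plain algebra says that negating both sides of a comparison flips its direction but keeps it non-strict: -1 * feature <= -1 * threshold is equivalent to feature >= threshold, not feature > threshold; the two disagree exactly when feature == threshold. Whether that boundary case explains the mismatch here is unclear:

```python
# Sanity check: -f <= -t is equivalent to f >= t (non-strict),
# while f > t disagrees exactly at the boundary f == t.
for f, t in [(5.0, 6.5), (6.5, 6.5), (7.0, 6.5)]:
    print(f, t, (-1 * f) <= (-1 * t), f >= t, f > t)
# The middle case prints: 6.5 6.5 True True False
```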

vruusmann commented 2 months ago

> This is hard to support as of now, because we currently sample oblique splits using a "density" hyperparameter that dictates how many non-zeros there are in the projection matrix.

But would it be possible to add a training parameter that lets the data scientist indicate that she is willing to accept a slight performance penalty during model training, in order to get much simpler oblique trees for later prediction and interpretation?

> I think this would have to be handled downstream if other packages want a "simple" interpretation of those specific splits?

Right now, when you train a simple oblique decision tree classifier on the iris dataset, you get two types of splits per feature. For example, Sepal.Length <= 5.3 and -1 * Sepal.Length <= -6.5. Essentially, the "effective number" of features is doubled.

This makes the interpretation of oblique forests twice as hard as it could be.
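One way to see the doubling on iris (a sketch, assuming sktree 0.7.x exposes ObliqueDecisionTreeClassifier under sktree.tree, and that the fitted tree's projection matrix is reachable as tree_.proj_vecs, per the attribute mentioned above):

```python
from collections import Counter

import numpy as np
from sklearn.datasets import load_iris
from sktree.tree import ObliqueDecisionTreeClassifier  # import path assumed

X, y = load_iris(return_X_y=True)
clf = ObliqueDecisionTreeClassifier(random_state=0).fit(X, y)

# Attribute access assumed from the ObliqueTree.proj_vecs mention above.
counts = Counter()
for row in np.asarray(clf.tree_.proj_vecs):
    nonzero = np.flatnonzero(row)
    if nonzero.size == 1:
        j = nonzero[0]
        counts[(j, "+" if row[j] > 0 else "-")] += 1

# Each feature index can appear in both a "+" and a "-" form,
# doubling the number of distinct axis-aligned split shapes.
print(counts)
```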

adam2392 commented 1 month ago

@jovo any thoughts on how to best handle this?