linkedin / isolation-forest

A Spark/Scala implementation of the isolation forest unsupervised outlier detection algorithm with support for exporting in ONNX format.

Added the numFeatures parameter to the IsolationForestModel class (incl. saved model metadata). #31

jverbus closed this 2 years ago

jverbus commented 2 years ago

numFeatures is the user-specified number of features used to train each isolation tree.

In certain edge cases, a given isolation tree may not contain any nodes that split on some of these features, e.g., a shallow tree where the number of features in the training data exceeds the number of nodes in the tree. As a result, the number of features cannot always be recovered from the trees themselves.
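To illustrate the edge case, here is a minimal Scala sketch. The node types (Node, InternalNode, ExternalNode) and the helpers featuresUsed and inferredWidth are hypothetical simplifications for illustration, not the library's internal classes. A tree trained on 5 features but containing a single split on feature 2 only uses one feature, and inferring the input width as "highest split index + 1" gives 3 instead of 5.

```scala
// Hypothetical, simplified isolation-tree node types for illustration only;
// they are not the library's internal classes.
sealed trait Node
final case class InternalNode(splitFeature: Int, splitValue: Double, left: Node, right: Node) extends Node
final case class ExternalNode(numInstances: Long) extends Node

object TreeFeatures {
  // Distinct feature indices that appear in at least one split of the tree.
  def featuresUsed(node: Node): Set[Int] = node match {
    case InternalNode(feature, _, left, right) =>
      featuresUsed(left) ++ featuresUsed(right) + feature
    case ExternalNode(_) => Set.empty[Int]
  }

  // Naive width inference: highest split feature index + 1 (0 for a bare leaf).
  // This underestimates the true width whenever the highest-index feature is never split on.
  def inferredWidth(node: Node): Int =
    featuresUsed(node).foldLeft(0)((acc, f) => math.max(acc, f + 1))
}
```

For example, TreeFeatures.featuresUsed(InternalNode(2, 0.5, ExternalNode(10L), ExternalNode(7L))) is Set(2), even if the tree was trained with numFeatures = 5.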

The numFeatures parameter was added so that this value can be included in the model metadata when the model is saved. This value is needed to convert a saved model file to ONNX.
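A rough sketch of why a converter would need this value, assuming the ONNX graph declares a 2-D input with an unspecified batch dimension and a fixed feature dimension. ModelMetadata, OnnxInputSpec, and their fields are hypothetical stand-ins; the actual metadata schema and converter code are not shown here.

```scala
// Hypothetical stand-in for the saved model metadata; field names are
// illustrative, not the library's actual metadata schema.
final case class ModelMetadata(modelVersion: String, numFeatures: Int)

object OnnxInputSpec {
  // Input shape as (batch dimension, feature dimension). The batch dimension
  // is left unspecified (None) so any number of rows can be scored; the
  // feature dimension must be fixed, which is where numFeatures comes in.
  def inputShape(meta: ModelMetadata): (Option[Long], Long) = {
    require(meta.numFeatures > 0, s"numFeatures must be positive, got ${meta.numFeatures}")
    (None, meta.numFeatures.toLong)
  }

  // Sanity check: every feature index used by a split must fall inside the
  // declared input width.
  def splitsFitInput(meta: ModelMetadata, splitFeatures: Set[Int]): Boolean =
    splitFeatures.forall(f => f >= 0 && f < meta.numFeatures)
}
```

Reading the width from metadata, rather than inferring it from the trees, keeps the declared input width equal to the width the model was trained on.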

The model version is bumped from 2.0.* to 3.0.* because this is a breaking change: models saved with versions earlier than 3.0.0 cannot be loaded with 3.0.*. For existing models, this is easy to fix by manually editing the saved model metadata file, as shown in this PR.
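The compatibility rule can be expressed as a simple major-version check. This is a hypothetical helper (ModelVersionCheck, isLoadable) that mirrors the rule as stated above, not the library's actual loading code; it assumes version strings of the form major.minor.patch.

```scala
import scala.util.Try

object ModelVersionCheck {
  // Major version of the model format introduced by this change (3.0.*).
  val CurrentMajor = 3

  // Extract the major component of a "major.minor.patch" string, if parseable.
  def majorVersion(version: String): Option[Int] =
    Try(version.trim.split('.')(0).toInt).toOption

  // A saved model is treated as loadable only if it was written with the same
  // major version; models saved with < 3.0.0 lack numFeatures in their metadata.
  def isLoadable(savedVersion: String): Boolean =
    majorVersion(savedVersion).contains(CurrentMajor)
}
```

Under this rule, a model saved as 2.0.* is rejected until its metadata file is updated by hand.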