Open tzemicheal opened 3 years ago
This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.
With https://github.com/dmlc/treelite/pull/322, it might be possible to support isolation forests in FIL.
The remaining piece is to support an additional transformation mode in FIL: apply f(x) = exp2(-x / c) to the predicted scores, where c is provided by the Treelite model.
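As a rough illustration of that transformation, here is a minimal NumPy sketch. The function names are hypothetical and are not part of FIL or Treelite. It assumes x is the average path length over the trees and that c is the usual isolation forest normalization constant c(n) = 2 (ln(n - 1) + γ) - 2 (n - 1) / n for n training samples per tree; in FIL the value of c would simply be read from the Treelite model.

```python
import numpy as np


def average_path_length(n_samples):
    """Normalization constant c(n) from the isolation forest formulation.

    Assumption: this mirrors the standard definition; FIL would take c
    directly from the Treelite model instead of recomputing it.
    """
    if n_samples <= 1:
        return 0.0
    if n_samples == 2:
        return 1.0
    return (2.0 * (np.log(n_samples - 1.0) + np.euler_gamma)
            - 2.0 * (n_samples - 1.0) / n_samples)


def exp2_transform(avg_path_length, c):
    """Apply f(x) = exp2(-x / c) element-wise to the raw predictions."""
    x = np.asarray(avg_path_length, dtype=np.float64)
    return np.exp2(-x / c)


# Hypothetical usage: path lengths averaged over trees built on 100 samples.
c = average_path_length(100)
scores = exp2_transform([2.0, c, 12.0], c)
```

Short paths map to scores close to 1 (likely anomalies), a path length equal to c maps to exactly 0.5, and long paths map toward 0.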
Hi @hcho3,
I tested FIL using ForestInference.load_from_sklearn with a random forest, planning to then test an iForest trained with sklearn through the same load function. It looks like FIL produces an error for a RandomForest model trained with sklearn. Could this fix help iForest inference in FIL? Here is the detailed error:
X, y = sklearn.datasets.load_boston(return_X_y=True)
clf = sklearn.ensemble.RandomForestRegressor(n_estimators=10)
clf.fit(X, y)
model = treelite.sklearn.import_model(clf)
# save model
model.export_lib(toolchain="gcc", libpath='rf_forest.so', verbose=True)
fil_model = ForestInference.load_from_sklearn(
    skl_model="rf_forest.so",
    algo='BATCH_TREE_REORG',
    output_class=False,
    threshold=0.50
)
TreeliteError                             Traceback (most recent call last)
<ipython-input-56-9fd5f432c56d> in <module>
      3     algo='BATCH_TREE_REORG',
      4     output_class=False,
----> 5     threshold=0.50
      6 )

cuml/fil/fil.pyx in cuml.fil.fil.ForestInference.load_from_sklearn()

/opt/conda/envs/rapids/lib/python3.7/site-packages/treelite/sklearn/importer.py in import_model(sklearn_model)
    126         leaf_value_expected_shape = lambda node_count: (node_count, 1, sklearn_model.n_classes_)
    127     else:
--> 128         raise TreeliteError(f'Not supported model type: {sklearn_model.__class__.__name__}')
    129
    130     if isinstance(sklearn_model,

TreeliteError: Not supported model type: str
We haven't gotten around to adding support for isolation forests in FIL yet, so the error is expected.
Update: the experimental version of FIL is now compatible with IsolationForest.
import numpy as np
import treelite
from sklearn.ensemble import IsolationForest
from cuml.experimental import ForestInference
n_samples, n_outliers = 120, 40
rng = np.random.RandomState(0)
covariance = np.array([[0.5, -0.1], [0.7, 0.4]])
cluster_1 = 0.4 * rng.randn(n_samples, 2) @ covariance + np.array([2, 2]) # general
cluster_2 = 0.3 * rng.randn(n_samples, 2) + np.array([-2, -2]) # spherical
outliers = rng.uniform(low=-4, high=4, size=(n_outliers, 2))
X = np.concatenate([cluster_1, cluster_2, outliers]).astype("float32")
y = np.concatenate(
    [np.ones((2 * n_samples), dtype=int), -np.ones((n_outliers), dtype=int)]
)
clf = IsolationForest(max_samples=100, random_state=0)
clf.fit(X)
expected_pred = -clf.score_samples(X).reshape((-1, 1))
fm = ForestInference.load_from_sklearn(clf, output_class=False)
out_pred = fm.predict(X)
np.testing.assert_almost_equal(out_pred, expected_pred, decimal=3)
Note that FIL currently matches the output of score_samples (negated, as in the example above), not decision_function.
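If decision_function-style values are needed, they can be recovered from the FIL output on the host. The sketch below is only an illustration with hypothetical helper names. It relies on scikit-learn's documented relationship decision_function(X) = score_samples(X) - offset_, and on FIL returning the negated score_samples as in the example above; the offset would be taken from the fitted estimator's offset_ attribute (-0.5 when contamination="auto").

```python
import numpy as np


def fil_scores_to_decision_function(fil_scores, offset):
    """Convert FIL anomaly scores into decision_function-style values.

    Assumptions: fil_scores equals -score_samples(X), and offset is the
    fitted IsolationForest's offset_ attribute.
    """
    score_samples = -np.asarray(fil_scores, dtype=np.float64)
    return score_samples - offset


def fil_scores_to_labels(fil_scores, offset):
    """Return +1 for inliers and -1 for outliers, following the sign
    convention of IsolationForest.predict."""
    decision = fil_scores_to_decision_function(fil_scores, offset)
    return np.where(decision < 0, -1, 1)


# Hypothetical FIL outputs for two samples, with the default offset of -0.5.
example_scores = np.array([0.4, 0.7])
example_decision = fil_scores_to_decision_function(example_scores, -0.5)
example_labels = fil_scores_to_labels(example_scores, -0.5)
```

With the fitted clf from the example above, this would be called as fil_scores_to_decision_function(out_pred, clf.offset_).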
Is your feature request related to a problem? Please describe. An implementation of IsolationForest (unsupervised, tree-based anomaly detection). scikit-learn has the following implementation: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.IsolationForest.html. An IsolationForest implementation could be built on the existing random forest/decision tree algorithms in RAPIDS and could take advantage of the fast inference.
Describe the solution you'd like This is also related to the earlier feature request for extraTreeRegression: https://github.com/rapidsai/cuml/issues/3063
Additional context It is one of the most widely used unsupervised anomaly detection algorithms in practice.