rapidsai / cuml

cuML - RAPIDS Machine Learning Library
https://docs.rapids.ai/api/cuml/stable/
Apache License 2.0

[FEA] Access Individual RandomForestRegressor Trees #6026

Open BonnieRuefenacht opened 1 month ago

BonnieRuefenacht commented 1 month ago

Is there a way to get predictions of individual trees of a RandomForestRegressor model? For instance, in Scikit-Learn I can do the following:

tree_pred = np.array([tree.predict(test_dataArray) for tree in rf_model]).T
predictions = np.mean(tree_pred, axis=1)
variance = (tree_pred - predictions.reshape(-1, 1))**2
se = np.sqrt(np.mean(variance, axis=1))

In the code above, rf_model is a scikit-learn RandomForestRegressor model. I would like to get the predictions of each individual tree so that I can obtain a standard error estimate for each prediction. Is it possible to replicate this using a cuML RandomForestRegressor model?

BonnieRuefenacht commented 4 weeks ago

I've done some more research.

Basically, I'm wondering if you could return the row_prediction.data() located here:

https://github.com/rapidsai/cuml/blob/c7f53ef92d80f604b04829406b1c0e16ba563823/cpp/src/randomforest/randomforest.cuh#L236

It seems that this contains the data for all the trees. I could use this to calculate standard error for each estimate. Thanks.

dantegd commented 3 weeks ago

Thanks for the issue @BonnieRuefenacht, I think @hcho3 is away for a couple of days, but he might be able to answer here.

hcho3 commented 3 weeks ago

@BonnieRuefenacht

You can use the predict_per_tree function from the Forest Inference Library (FIL). Note that this feature is only available in the experimental version of FIL.

from cuml.experimental import ForestInference

# ...

fm = ForestInference.load_from_sklearn(skl_model)
pred_per_tree = fm.predict_per_tree(X)  # Returns array of size (num_row, num_tree, leaf_size) 
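
To connect this back to the standard error calculation in the original question, here is a minimal sketch of the post-processing step. It does not call cuML: the pred_per_tree array below is synthetic and only mimics the (num_row, num_tree, leaf_size) shape described above. It assumes leaf_size is 1 for a regressor and that the forest prediction is the mean over trees, as in the scikit-learn snippet. If the real result is a GPU array, it would presumably need converting to NumPy first (or the same expressions could be written with CuPy).

```python
import numpy as np

# Synthetic stand-in for the output of predict_per_tree (NOT real model output):
# shape (num_row, num_tree, leaf_size) = (4, 3, 1).
pred_per_tree = np.array([
    [[1.0], [2.0], [3.0]],
    [[2.0], [2.0], [2.0]],
    [[0.0], [4.0], [8.0]],
    [[5.0], [7.0], [9.0]],
])

# Drop the trailing leaf dimension (assumed to be 1 for regression),
# leaving one column per tree: shape (num_row, num_tree).
tree_pred = pred_per_tree[:, :, 0]

# Per-row forest prediction, taken here as the mean over trees.
predictions = tree_pred.mean(axis=1)

# Per-row spread of the tree predictions around that mean,
# using the same formula as the scikit-learn snippet in the question.
variance = (tree_pred - predictions[:, None]) ** 2
se = np.sqrt(variance.mean(axis=1))

print(predictions)
print(se)
```

The quantity se computed this way is the population standard deviation of the per-tree predictions for each row, so it matches np.std(tree_pred, axis=1).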

BonnieRuefenacht commented 3 weeks ago

Thank you. I will try that.