rapidsai / cuml

cuML - RAPIDS Machine Learning Library
https://docs.rapids.ai/api/cuml/stable/
Apache License 2.0

[FEA] Obtaining individual tree results of a random forest #6115

Open YuelingMa0 opened 1 week ago

YuelingMa0 commented 1 week ago

Is your feature request related to a problem? Please describe.
I would like to use cuML to obtain the prediction of each individual tree in a random forest. The current cuML package does not support this: its random forest regressor only returns the average of the tree results.

Describe the solution you'd like
An attribute or method on the existing random forest regressor that provides the results from each tree.

Describe alternatives you've considered
I have been using the "estimators_" attribute of scikit-learn's RandomForestRegressor to obtain individual tree outputs, but that package only runs on CPUs.

Additional context
https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html
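
For reference, the scikit-learn alternative described above can be sketched roughly as follows. This is a minimal CPU-only illustration, not code from the issue: the synthetic dataset and the variable names are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data, used only so the sketch is self-contained
X, y = make_regression(n_samples=200, n_features=5, random_state=0)
rf = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

# estimators_ holds the fitted decision trees; stack one column per tree
per_tree = np.stack([tree.predict(X) for tree in rf.estimators_], axis=1)

# The forest prediction is the mean of the per-tree predictions
forest_pred = rf.predict(X)
print(per_tree.shape, np.allclose(per_tree.mean(axis=1), forest_pred))
```

The per-tree matrix has one row per sample and one column per tree, so averaging over the columns recovers the value that `rf.predict` returns.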

hcho3 commented 1 week ago

You can use the predict_per_tree method from the Forest Inference Library (FIL). Note that this feature is only available in the experimental version of FIL.

from cuml.experimental import ForestInference

# ...

fm = ForestInference.load_from_sklearn(skl_model)
pred_per_tree = fm.predict_per_tree(X)  # Returns an array of shape (num_row, num_tree, leaf_size)
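
As a rough illustration of what can be done with an array of that shape, the sketch below reduces it over the tree axis to get a per-row mean and spread. It assumes the per-tree output has already been copied to a host NumPy array and that the forest prediction is the plain average of the tree outputs (as described for regression in the original request); the helper name `summarize_per_tree` and the stand-in data are hypothetical.

```python
import numpy as np

def summarize_per_tree(pred_per_tree):
    """Reduce a (num_row, num_tree, leaf_size) array over the tree axis."""
    # Assumption: the input is already a host (NumPy-compatible) array
    arr = np.asarray(pred_per_tree, dtype=np.float64)
    if arr.ndim != 3:
        raise ValueError(f"expected a 3-D array, got shape {arr.shape}")
    mean = arr.mean(axis=1)  # average over trees -> (num_row, leaf_size)
    std = arr.std(axis=1)    # spread across trees -> (num_row, leaf_size)
    return mean, std

# Stand-in for real per-tree output: 4 rows, 3 trees, 1 value per leaf
fake = np.arange(12, dtype=np.float64).reshape(4, 3, 1)
mean, std = summarize_per_tree(fake)
print(mean.shape, std.shape)
```

The standard deviation across trees is one simple way to look at how much the individual trees disagree on each row.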

YuelingMa0 commented 1 week ago

Thank you!

YuelingMa0 commented 1 day ago

I got the error "Negative size passed to PyBytes_FromStringAndSize" when I loaded the sklearn model. I am also curious whether "predict_per_tree" works for a model trained by cuML.

hcho3 commented 1 day ago

> "Negative size passed to PyBytes_FromStringAndSize" when I loaded sklearn model.

Can you share the model with us so that we can troubleshoot?

> I am also curious if "predict_per_tree" attribute also works for a model trained by cuml?

Yes, it should work with a cuML model.

YuelingMa0 commented 1 day ago

Here are my random forest models, one trained with sklearn and the other trained with cuML. I converted the cuML-trained random forest model to ForestInference and tried to call "predict_per_tree" on it, but got the error "AttributeError: predict_per_tree". I am using version 24.10.00.
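
Since the reply above notes that predict_per_tree exists only in the experimental version of FIL, one thing that may be worth checking (an assumption, not something confirmed in this thread) is which class the converted object actually is. The sketch below is a generic Python check; the helper `describe_predictor` and the two stand-in classes are hypothetical and exist only so the example runs without a GPU.

```python
def describe_predictor(obj, method="predict_per_tree"):
    """Report the object's class path and whether it exposes `method`."""
    cls = type(obj)
    return {
        "class": f"{cls.__module__}.{cls.__qualname__}",
        "has_method": callable(getattr(obj, method, None)),
    }

# Hypothetical stand-ins for two different inference classes
class WithPerTree:
    def predict_per_tree(self, X):
        return X

class WithoutPerTree:
    def predict(self, X):
        return X

print(describe_predictor(WithPerTree()))
print(describe_predictor(WithoutPerTree()))
```

Calling the same helper on the real converted model would show its module path, which can then be compared with the `cuml.experimental` import used in the earlier reply.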