jlevy44 / InteractionTransformer

Extract meaningful interactions from machine learning models to obtain machine-learning performance with statistical model interpretability.
MIT License

Endless run with RandomForest #3

Closed jvdboogaard closed 3 years ago

jvdboogaard commented 3 years ago

Dear jlevy44,

I am very thankful for being able to use your code. However, I ran into a problem while using it with the Random Forest Regressor. If I run:

    transformer = InteractionTransformer(RandomForestRegressor(random_state=42), max_train_test_samples=100, mode_interaction_extract=10, cv_scoring='r2', num_workers=8, compute_interaction_dask=False, use_background_data=False)
    transformer.fit(X_train, y_train)

then the code finishes in 7 to 8 minutes and I get the results that I want. But when I pass additional parameters to the Random Forest Regressor, for example:

    transformer = InteractionTransformer(RandomForestRegressor(random_state=42, n_estimators=2000, max_features=0.2, max_depth=50, bootstrap=True), max_train_test_samples=100, mode_interaction_extract=10, cv_scoring='r2', num_workers=8, compute_interaction_dask=False, use_background_data=False)
    transformer.fit(X_train, y_train)

then the code keeps running and never finishes. I really want to see the output for the Random Forest Regressor with all the specified parameters, because this model fits my data much better. Do you know how to solve the problem?

Thanks in advance.

Kind regards, Jeroen

jlevy44 commented 3 years ago

Thanks for your message, Jeroen. My first question: do you really need that many estimators and that much depth? I suspect they significantly slow down the Shapley interaction score calculation. If the number of estimators is justified, you may also want to experiment with some of the transformer's other parameters, such as tree_limit. E.g., see this GitHub issue and related ones in the SHAP repository: https://github.com/slundberg/shap/issues/208
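To put a rough number on the slowdown, here is a back-of-the-envelope sketch rather than a measurement. It relies on the published Tree SHAP complexity for interaction values, which grows in proportion to T * M * L * D^2 (trees, features, leaves per tree, depth). It also assumes that the first run used scikit-learn's default of n_estimators=100, that the trees in both runs are of similar size, and that the explanation step dominates the 7 to 8 minutes. The function names are hypothetical and not part of InteractionTransformer or SHAP.

```python
def interaction_cost(n_trees, n_features, n_leaves, max_depth):
    """Quantity proportional to the Tree SHAP interaction cost, T * M * L * D^2.

    Only ratios between two configurations are meaningful; the value
    itself is not a time in any unit.
    """
    return n_trees * n_features * n_leaves * max_depth ** 2


def scaled_runtime_minutes(baseline_minutes, baseline_trees, new_trees):
    """Scale a measured runtime linearly with the number of trees.

    Assumption: tree size (leaves, depth) and the feature count are
    unchanged, so only the tree count differs between the two runs.
    """
    return baseline_minutes * new_trees / baseline_trees


if __name__ == "__main__":
    # Assumed baseline: 100 trees (scikit-learn default) taking 7 to 8 minutes.
    low = scaled_runtime_minutes(7, baseline_trees=100, new_trees=2000)
    high = scaled_runtime_minutes(8, baseline_trees=100, new_trees=2000)
    print(f"Estimated runtime with 2000 trees: {low:.0f} to {high:.0f} minutes")
```

Under those assumptions, going from 100 to 2000 trees alone is a 20x increase, so the run would take a few hours rather than never finish. Changing max_features and max_depth can also alter the size of each tree, so the real figure may differ.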

jvdboogaard commented 3 years ago

Thank you for the extremely fast reply. The parameters were tuned with a grid search and gave the best performance. Now that I know the time is spent calculating the Shapley interaction scores, so the code will simply take a long time to run, I will let it run overnight and wait patiently for the results.

Thanks again