rootsmusic opened this issue 8 months ago (status: Open)
Thanks a lot @rootsmusic! Could it be related to a change in scikit-learn that is silently caught in `export_text_m5`?
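To illustrate the hypothesis: if an exporter wraps its per-leaf formatting in a broad `try`/`except`, an error triggered by an upstream change is swallowed and the leaf is printed without its equation. The sketch below is purely hypothetical. `format_leaf`, `export_leaves`, the two model classes and the "renamed `coef_`" scenario are illustrative names, not m5py's actual code and not a claim about what scikit-learn changed. It contrasts the silent path with a `strict` mode that re-raises, so the real error surfaces.

```python
class OldLeafModel:
    """Hypothetical leaf model exposing the attribute the exporter expects."""
    coef_ = [0.5, -1.2]
    intercept_ = 3.0


class NewLeafModel:
    """Hypothetical leaf model after an assumed upstream rename: no coef_."""
    coefficients = [0.5, -1.2]
    intercept_ = 3.0


def format_leaf(model):
    # Raises AttributeError if the model no longer has coef_.
    terms = " + ".join(f"{c} * X[{i}]" for i, c in enumerate(model.coef_))
    return f"{terms} + {model.intercept_}"


def export_leaves(models, strict=False):
    """Return one 'LMk: ...' line per leaf; broken leaves are skipped unless strict."""
    lines = []
    for k, model in enumerate(models, start=1):
        try:
            lines.append(f"LM{k}: {format_leaf(model)}")
        except Exception:
            if strict:
                raise  # surface the underlying error instead of hiding it
            # Silent path: the equation is dropped and nothing is reported.
            continue
    return "\n".join(lines)
```

With `OldLeafModel` the equation is printed; with `NewLeafModel` the output is simply empty, which matches the symptom of "missing equations" without any error message. Running with `strict=True` would show the `AttributeError` directly.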
If you have a bit of time to investigate, that would be greatly appreciated. Otherwise, I'll look into it on a best-effort basis in the coming weeks.
@smarie, I'm unable to investigate because I'm a Python novice, but you're probably right. I'm taking Professor Brooks' online course, which credits you. His notebook used scikit-learn 0.24.1, and his cell was:
```python
# (Presumably M5Prime, X_train and y_train are defined in earlier notebook cells.)

# GridSearch comes in a cross validation variety, so let's import that
from sklearn.model_selection import GridSearchCV

# Now, let's set a few different hyperparameters the M5Prime class can work with.
# I'm going to choose to explore a few different depths, a few minimum numbers of
# samples per leaf, and a few pruning options
parameters = {
    'max_depth': (3, 4, 5, 6),
    'min_samples_leaf': (1, 3, 6),
    'use_pruning': [False, True],
}

# Now we can just train our model as if it were a regression model directly. Be
# aware that this will take a bit of time to run
reg = GridSearchCV(estimator=M5Prime(use_smoothing=False), param_grid=parameters, cv=10, scoring='r2')
reg.fit(X_train.values, y_train.values)

# Ok, that was a lot to talk about. The tree is just part of the analysis though, we
# also have those regression equations at each leaf node. Recall that a regression
# equation is a bunch of coefficients, one for each feature, that are effectively a
# weighting which when summed together will produce a target value - in this case our
# percentage of votes. Now we can get these equations in a few ways, but Sylvain has nicely
# included a function which prints out the tree nodes and the linear model equations for
# us as well.
%run m5p.py
print(export_text_m5(reg.best_estimator_, out_file=None, node_ids=True))
for i, v in enumerate(X_train.columns):
    print(f"{i}: {v}")
```
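The warning that the cell "will take a bit of time to run" follows from the grid size: `GridSearchCV` fits one model per parameter combination per fold, and with its default `refit=True` it fits one more on the full training set. A minimal sketch of that count in plain Python, reusing the same `parameters` dictionary (the variable names other than `parameters` are illustrative):

```python
from itertools import product

# Same grid as in the notebook cell above.
parameters = {
    'max_depth': (3, 4, 5, 6),
    'min_samples_leaf': (1, 3, 6),
    'use_pruning': [False, True],
}
cv_folds = 10

# Every combination of values, one dict per candidate model.
candidates = [dict(zip(parameters, values)) for values in product(*parameters.values())]

n_candidates = len(candidates)        # 4 * 3 * 2
n_cv_fits = n_candidates * cv_folds   # one fit per candidate per fold
n_total_fits = n_cv_fits + 1          # plus the final refit of the best parameters
print(n_candidates, n_cv_fits, n_total_fits)  # prints: 24 240 241
```

So the search trains 241 M5 trees in total, which is why it is slow.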
I'm running his notebook with scikit-learn 1.3.1, and I've replaced the first line with `%run export.py`. However, the output is missing the equations.
Thanks a lot @rootsmusic! I'll use this to take a look when I have a bit of time.
I'm running the line `export_text_m5(reg.best_estimator_, out_file=None, node_ids=True)`. In an older version of your package, leaves with `params>1` included linear model equations (e.g. LM1, LM2). Why are the equations no longer shown in the latest version of your package? Thanks.