Open mcallaghan opened 1 year ago
I haven't tested this so maybe I'm completely off the mark, but I think you can do this by nesting GridSearchCV objects:
model = make_pipeline(
    ...,
    LogisticRegression()
)
param_gridsearch = GridSearchCV(
    model,
    param_grid=...
)
param_gridsearch.fit(X, y)
threshold_gridsearch = GridSearchCV(
    Thresholder(param_gridsearch, refit=False),
    param_grid={'threshold': [0.1, 0.2, ...]}
)
@MBrouns before closing the issue, would it be worth adding an example to the docs?
Having a closer look at this: the two approaches are actually a bit different. The implementation of
for each parameter set in grid:
    fit model with parameters
    for each threshold in thresholds:
        evaluate model
would still require running the Thresholder for each fitted model, while the suggestion above runs it only on the best model.
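For illustration, that joint search (refit the model for every parameter/threshold pair) can be sketched with a single GridSearchCV. `MyThresholder` below is a hypothetical stand-in for sklego's Thresholder, included only so the sketch runs with plain scikit-learn; the data and parameter values are arbitrary:

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

class MyThresholder(BaseEstimator, ClassifierMixin):
    """Hypothetical stand-in for sklego's Thresholder."""

    def __init__(self, model=None, threshold=0.5):
        self.model = model
        self.threshold = threshold

    def fit(self, X, y):
        # the joint search refits the wrapped model for every
        # (hyperparameter, threshold) combination
        self.model_ = clone(self.model).fit(X, y)
        self.classes_ = self.model_.classes_
        return self

    def predict(self, X):
        proba = self.model_.predict_proba(X)[:, 1]
        return self.classes_[(proba >= self.threshold).astype(int)]

X, y = make_classification(n_samples=200, random_state=0)
grid = GridSearchCV(
    MyThresholder(LogisticRegression()),
    param_grid={
        "model__C": [0.1, 1.0],        # model hyperparameters...
        "threshold": [0.3, 0.5, 0.7],  # ...crossed with the thresholds
    },
    cv=3,
)
grid.fit(X, y)
```

Every candidate here triggers a fresh fit, which is exactly the cost the nested approach below avoids.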
Maybe a nested GridSearchCV
does the trick? (I never tried that)
mod = GridSearchCV(
    estimator=Thresholder(
        GridSearchCV(
            estimator=SomeModel(),
            param_grid={...},
            ...
        ),
        threshold=0.1,
        refit=False
    ),
    param_grid={
        "threshold": np.linspace(0.1, 0.9, 10),
    },
    ...
)
_ = mod.fit(X, y)
Thanks for this great set of extensions to sklearn.
The Thresholder() model is quite close to something I've been looking for for a while.
I'm looking to include threshold optimisation as part of a broader parameter search.
I can perhaps best describe the desired behaviour as follows:
However, if I pass a model that has not yet been fit to Thresholder(), then, even with refit=False, the model is refit for each threshold. Is there an easy way around this? Thinking about it, the best way to achieve this would probably be tinkering with the GridSearchCV code, but perhaps you have an idea and would also find this interesting?
Thanks!