Hi, thank you for your message. This is something I had in mind, so I agree it could be useful, and it would make the package more similar to the scikit-learn API. I'll add this to the work for the release of version 0.7.0.
That's great, thank you!
Hello @rodrigo-arenas, I would like to contribute here. I have some experience with building custom classes and would like to give it a go.
Hello @sushmitaS16, thanks for your message; sure, go for it!
Please make sure to read how this is implemented in scikit-learn, in their BaseSearchCV class here, especially in the fit method and all its checks.
In this package, the function that formats the results is in this module; you must also take care of checking the metrics and the refit parameter so that they match what is described in the _check_refit_for_multimetric function, and use that metric as the evaluation criterion.
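To make sure we're on the same page, here is a rough sketch of the kind of check I mean (this is not scikit-learn's actual code, just an illustration of the rule that, with multiple metrics, refit must name one of them):

```python
# Illustrative only: with multi-metric scoring, refit must be False,
# a callable, or the name of one of the provided metrics.
def check_refit_for_multimetric(refit, scorers: dict) -> None:
    """Raise if refit can't be used to pick a best estimator."""
    valid = refit is False or callable(refit) or refit in scorers
    if not valid:
        raise ValueError(
            "For multi-metric scoring, refit must be False, a callable, "
            f"or one of {sorted(scorers)}; got {refit!r}"
        )

# The chosen metric ("accuracy" here) is then used as the evaluation criterion.
check_refit_for_multimetric("accuracy", {"accuracy": "accuracy", "f1": "f1_macro"})
```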
Let me know if you have any questions!
Hi @rodrigo-arenas,
In BaseSearchCV's fit method, there is a section that handles the parallelization for both GridSearchCV and RandomizedSearchCV. I just wanted to ask: if we are to add multimetric scoring here, will parallelization have to be incorporated as well? Is that correct?
Please forgive me if I am misunderstanding something here!
Hi @sushmitaS16.
There is no need to explicitly add this parallelization in the fit method: in the evaluate method of GASearchCV I'm using scikit-learn's cross_validate function, which already implements this parallelization through the n_jobs and pre_dispatch parameters.
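For reference, a minimal example of how cross_validate exposes that (the estimator and dataset here are just placeholders):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# cross_validate runs the CV folds in parallel through n_jobs/pre_dispatch
# and also accepts multi-metric scoring as a list or dict of scorers.
results = cross_validate(
    DecisionTreeClassifier(),
    X,
    y,
    cv=5,
    scoring=["accuracy", "balanced_accuracy"],
    n_jobs=-1,
    pre_dispatch="2*n_jobs",
)
print(results["test_accuracy"].mean(), results["test_balanced_accuracy"].mean())
```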
Greetings!
Hi, I am a little confused about the evaluate_candidates method in fit. I gather this is the same one you mentioned here.
Do you think that multimetric scoring can be incorporated without it?
Thanks
Hi, it's the same, but it's not necessary to implement it here; that function must exist in the GASearchCV class, otherwise it wouldn't be possible to inherit from BaseSearchCV. Part of that logic is in the evaluate method.
What must be modified is this evaluate method, since it's the one that uses the scoring and refit parameters and uses the main metric to tell how good a solution is. It also saves in the logbook object the information used to format the cross-validation results in self.cv_results_ via the formatting function.
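To make that concrete, here is a rough sketch (not the package's actual evaluate method, just the shape of the logic) of how the refit metric could drive the fitness while the scores for every metric are still kept for the logbook and cv_results_:

```python
import numpy as np
from sklearn.model_selection import cross_validate

# Hypothetical sketch: evaluate one candidate set of hyperparameters,
# keep the cross-validation scores for every metric, and use the refit
# metric's mean test score as the fitness of the individual.
def evaluate_candidate(estimator, params, X, y, scoring, refit, cv=3):
    cv_scores = cross_validate(
        estimator.set_params(**params), X, y, cv=cv, scoring=scoring, n_jobs=-1
    )
    # Mean of every returned array (fit/score times plus each metric's test
    # scores); this is the kind of information stored per generation.
    summary = {name: float(np.mean(values)) for name, values in cv_scores.items()}
    # Only the refit metric decides how good this solution is for the GA selection.
    fitness = summary[f"test_{refit}"]
    return fitness, summary
```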
I hope this can help.
Hi @poroc300
I've finally got some proper time to work on this request, since it required changing a big part of how the cv_results_ and logbook are generated; this is now implemented in PR #85 and will be available in the next release.
You will see this reflected in the logbook and cv_results_ objects, where you now get results for each metric.
As in scikit-learn, if multi-metric scoring is used, the refit parameter must be a string specifying the metric used to evaluate the cv scores.
See more in the GASearchCV and GAFeatureSelectionCV development API documentation.
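Roughly, the usage looks like this (the estimator, dataset, and param_grid below are just placeholders; see the docs for the full details):

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import SGDRegressor
from sklearn_genetic import GASearchCV
from sklearn_genetic.space import Continuous

X, y = load_diabetes(return_X_y=True)

# Several metrics are evaluated, but the string passed to refit decides
# which one drives the evolution and the final best_estimator_.
evolved_search = GASearchCV(
    estimator=SGDRegressor(),
    param_grid={"alpha": Continuous(1e-4, 1.0)},
    scoring={"R2": "r2", "MAE": "neg_mean_absolute_error"},
    refit="R2",
    cv=3,
    generations=5,
    population_size=10,
)
evolved_search.fit(X, y)

# cv_results_ (and the logbook) now report each metric separately.
print(evolved_search.cv_results_["mean_test_R2"])
print(evolved_search.cv_results_["mean_test_MAE"])
```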
Many thanks for the time and effort to address this issue.
No problem!
Hello,
I have been looking into your package and it is really cool. Thank you for putting a lot of effort into developing such an amazing tool.
Is your feature request related to a problem? Please describe.
GASearchCV, unlike GridSearchCV, only accepts one scoring metric. Obviously, the algorithm can only use one metric to decide which models will carry over to the next generation. However, I think it would be useful to view different scoring metrics for the best models (e.g. R2, MAE, RMSE), which may each give the user a slightly different idea of model performance. Of course, we would still be able to decide which metric should be used to select the best models within each generation.
Describe the solution you'd expect
I think the implementation of multiple scoring metrics in GASearchCV could be similar to the one implemented in GridSearchCV regarding this specific matter. I show below some examples of this implementation in GridSearchCV:
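(Illustrative snippets only; the estimator, data, and parameter grid are placeholders to show the scoring and refit usage.)

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

X, y = load_diabetes(return_X_y=True)
param_grid = {"alpha": [0.1, 1.0, 10.0]}

# First example: scoring as a list of built-in metric names;
# refit names the metric used to pick the best estimator.
grid = GridSearchCV(
    Ridge(),
    param_grid,
    scoring=["r2", "neg_mean_absolute_error"],
    refit="r2",
    cv=5,
)
grid.fit(X, y)

# Second example: scoring as a dict, so cv_results_ uses the custom
# names R2 and MAE in its keys.
grid = GridSearchCV(
    Ridge(),
    param_grid,
    scoring={"R2": "r2", "MAE": "neg_mean_absolute_error"},
    refit="R2",
    cv=5,
)
grid.fit(X, y)
```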
If you call grid.cv_results_, you will see the output dict will have mean_test_MAE and mean_test_R2 keys (in the case of the second example).