Hi @sfrischkorn, yes, this looks like it could be very useful. At some point we could also integrate more sophisticated techniques such as Bayesian optimisation, but this would already be a valuable addition on its own. We would welcome a PR :)
Grid searches over multiple hyperparameters get computationally expensive very quickly as the range of candidate values expands. On my current data set, each iteration of the grid search takes approximately 7.5 hours, so an exhaustive search quickly becomes prohibitive. A random search over a random subset of the parameter grid can find good parameters much more cheaply.
Describe proposed solution
I have written a simple amendment to the GridSearch function that introduces a new optional parameter, n_samples, which defines the number of parameter combinations to test. It accepts either an integer, giving an absolute number of samples, or a float, interpreted as a fraction of the full grid. After the cartesian product of the parameter grid is computed, that number of combinations is drawn at random from the list, and the iterator is built from this subset instead of the full cartesian product.
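For illustration, here is a minimal sketch of that sampling step, assuming a plain dict-based parameter grid. The function name `sample_param_combinations`, its arguments, and the float-as-fraction interpretation are hypothetical and not part of the existing GridSearch API:

```python
import random
from itertools import product


def sample_param_combinations(parameters, n_samples=None, seed=None):
    """Build the cartesian product of a parameter grid and optionally keep
    only a random subset of the combinations.

    `parameters` maps hyperparameter names to lists of candidate values.
    `n_samples` may be an int (absolute number of combinations to keep) or
    a float in (0, 1] (fraction of the full grid); None keeps the full grid.
    """
    # Full cartesian product of all hyperparameter values.
    names = list(parameters.keys())
    all_combinations = [
        dict(zip(names, values)) for values in product(*parameters.values())
    ]

    if n_samples is None:
        return all_combinations

    if isinstance(n_samples, float):
        # Interpret a float as a fraction of the full grid size.
        n_samples = int(n_samples * len(all_combinations))

    # Clamp to at least one and at most the full grid.
    n_samples = max(1, min(n_samples, len(all_combinations)))

    rng = random.Random(seed)
    return rng.sample(all_combinations, n_samples)


# Example: sample 10% of a 3 x 3 x 2 = 18-combination grid
# (1.8 rounded down to 1, clamped to at least 1 combination).
grid = {"lr": [1e-3, 1e-2, 1e-1], "depth": [2, 4, 8], "dropout": [0.0, 0.1]}
subset = sample_param_combinations(grid, n_samples=0.1, seed=42)
```

The grid-search loop itself would then iterate over `subset` exactly as it currently iterates over the full cartesian product.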
Is this a worthwhile addition to the code base? I can add it in if so.