Closed mheilman closed 8 years ago
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).
:memo: Please visit https://cla.developers.google.com/ to sign.
Once you've signed, please reply here (e.g. I signed it!
) and we'll verify. Thanks.
I'm probably missing changes in the docs or examples here. If this looks OK in general, I'll make a pass at documentation updates (and sign the CLA, etc.).
(also, oops, fixing the tests now...)
Hi, @mheilman thanks for PR, but current interface of monitor was organized this way according to monitors in current estimators: http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html#sklearn.ensemble.GradientBoostingClassifier.fit
And there are issues with passing monitor and logdir to init due to the constructor params been used as hyperparams - they are saved / restored with a model. The user may want to build a model with one monitor / logdir and then restore with another.
I agree though that Pipeline / GridSearchCV don't support monitors - I actually think this should be fixed in sklearn.
Ah, right, I forgot that gradient boosting models have a monitor
argument.
In fact, GridSearchCV.__init__
has a fit_params
keyword argument, so logdir
and monitor
can just be passed in there. Sorry I didn't notice that earlier. I had been looking at GridSearchCV.fit
(which only takes X
and y
).
I'm not sure how the fit params would work with pipelines, but they might be fine there, too, or will eventually.
Didn't know about fit_params
, thanks for pointing out! If you are interested, you can try adding an example that uses GridSearchCV
with monitor
and logdir
- I think would be pretty useful.
Hmm... what about keyword arguments to predict
methods (e.g., batch_size
and axis
here)? I don't see a way to access those via GridSearchCV
.
The default for axis
seems fine for my purposes, but I ran into an issue with the batch_size
default, where GridSearchCV
was taking a ton of memory (> 8GB) because it was trying to predict on the entire dataset at once rather than in batches. Do you think it would make sense to make the default batch_size
for predict
be self.batch_size
(but still let the user override is when needed)?
I'd be happy to make a PR if you think that makes sense. Sorry, it's a bit of a separate issue than this PR, but I figured it was related enough to discuss here...
Yeah, it make sense to have some default value and self.batch_size
is
closer then anything else. PR would be welcome!
On Thu, Mar 3, 2016 at 1:21 PM, Michael Heilman notifications@github.com wrote:
Hmm... what about keyword arguments to predict methods (e.g., batch_size and axis here https://github.com/tensorflow/skflow/blob/67d3fb03efdfa37e2b7013e2802d1b737f8fbe85/skflow/estimators/base.py#L270)? I don't see a way to access those via GridSearchCV.
The default for axis seems fine for my purposes, but I ran into an issue with the batch_size default, where GridSearchCV was taking a ton of memory (> 8GB) because it was trying to predict on the entire dataset at once rather than in batches. Do you think it would make sense to make the default batch_size for predict be self.batch_size (but still let the user override is when needed)?
I'd be happy to make a PR if you think that makes sense.
— Reply to this email directly or view it on GitHub https://github.com/tensorflow/skflow/pull/125#issuecomment-191969550.
Best regards, Illia Polosukhin
OK, I made that minor change (and added an example of grid search) in #126.
This moves the
monitor
andlogdir
arguments for thefit
method to the base estimator__init__
method so thatfit
better follows the sklearn API guidelines: "fit parameters should be restricted to directly data dependent variables". This should make it easier to use, e.g.,GridSearchCV
with skflow models.