tguillemot closed this 5 years ago
Please add a reference to https://docs.google.com/document/d/1k8d4vyw87gWODiyAyQTz91Z1KOnYr6runx-N074qIBY/edit
Add alternative proposition at the end of the document.
I've just taken a quick look at the Google Doc, though it's not easy to read. An approach where routing is handled by the meta-estimator is IMO more explicit and resilient to bugs / forwards-compatibility issues than one where routing is handled by the base estimator / CV object / etc. (akin to dependency injection). It might be easier to write code in the latter style, though.
Another tricky case: sometimes we do `if not hasattr(est, 'fit_transform'): return est.fit(X, ...).transform(X)`. In this case, how do we know if props need to be passed to `transform` as well as `fit`?
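The fallback described here can be sketched as follows (a hypothetical helper, not scikit-learn's actual implementation), which makes the ambiguity concrete: props forwarded to `fit` are not obviously intended for `transform` too, and nothing in this code path can tell.

```python
# Hypothetical sketch (not scikit-learn's actual code) of the fallback:
# if an estimator has no fit_transform, chain fit(...).transform(X).
def fit_transform_one(est, X, y=None, **props):
    if hasattr(est, "fit_transform"):
        return est.fit_transform(X, y, **props)
    # Should any of `props` also be passed to transform? Unclear.
    return est.fit(X, y, **props).transform(X)
```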
Yeah, the Google doc is not really readable. We stopped in the middle of the conversation yesterday. What I don't like about routing in meta-estimators is that you could potentially route at many levels, and I tried to illustrate that. Either you route only on the outermost level and have a mess of underscores, or you need the outer levels to allow for some pass-through. I feel this is similar to the "attaching search parameters to estimators" issue that you opened at some point, only here it's worse than the underscores in `param_grid`, because there is only one grid search in that case, but here there are arbitrarily many levels of meta-estimators that could potentially interact with this.
Yes, that would be the motivation for specifying routing and data separately.
It's still not entirely clear to me that a `sample_props` object is better than kwargs (except that kwargs currently have different semantics).
@jnothman It's not great, but I tried to summarize the discussion in the Google doc under "# 4. Alternative propositions for sample_props (06.17.17)". Not sure, but maybe it can help.
Basically because we would need `**kwargs` in all of our `fit` methods, which is not great and can swallow all kinds of errors. It can also cause problems for dependent projects that use fit args.
No, I still think we'd only need `**kwargs` in meta-estimators (prove me wrong), and that for base estimators a common test will ensure that they check kwargs for unwanted props.
hm... and we'd use the same / a similar routing mechanism? I think I like that.... and without routing it's passed to everything?
Well, GridSearch would have default routing replicating current behaviour. If we're talking about specifying routing at construction rather than in fit param names, Pipeline will require some change to conform, and is open to however we choose to route by default.
well pipeline also has a default routing now with fit_params, right?
That's got multiple parameters in fit, so it's got your issue of not having routing described abstractly as a class param.... but yes.
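For reference, Pipeline's current double-underscore convention for fit params can be mimicked roughly like this (a self-contained sketch with hypothetical names, not the actual scikit-learn implementation):

```python
# Mimic of Pipeline's kwarg convention (illustrative only):
# fit kwargs named "<step>__<param>" are routed to that step's fit.
def split_fit_params(step_names, fit_params):
    routed = {name: {} for name in step_names}
    for key, value in fit_params.items():
        step, delim, param = key.partition("__")
        if not delim or step not in routed:
            raise KeyError(f"Unknown or malformed fit param: {key!r}")
        routed[step][param] = value
    return routed

split_fit_params(["scale", "clf"], {"clf__sample_weight": [1, 2]})
# → {'scale': {}, 'clf': {'sample_weight': [1, 2]}}
```

Note that, as discussed above, this describes routing per call in the fit param names themselves, rather than abstractly as a class parameter.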
We also need to take `RidgeCV`, `LogisticRegressionCV`, `LassoCV`... into account: those need to have a way to route sample props info about the grouping to be used for leave-p-groups-out CV to the `cv` object itself (as is the case for `GridSearchCV` et al.).
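The pattern being asked for can be sketched as follows (a hedged, minimal illustration with made-up class names, not scikit-learn internals): an estimator that does internal CV must accept `groups` in `fit` and forward it to `cv.split`.

```python
# Illustrative sketch: an estimator with internal cross-validation
# accepts `groups` in fit and forwards it to its cv splitter.
class ToyCVEstimator:
    def __init__(self, cv):
        self.cv = cv

    def fit(self, X, y, groups=None):
        for train_idx, test_idx in self.cv.split(X, y, groups):
            pass  # fit and score each split here
        return self
```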
Speaking with @ogrisel IRL, `TransformTargetRegressor` could be considered as a use case. It takes a `transformer` and a `regressor`. At `fit`, one has to choose how to handle `sample_weight`: two different `sample_weight`s could be passed, one for each estimator, a single one for both, etc. For the moment, I could not find a transformer with a `sample_weight`, but I assume that it could happen to be useful.
I don't think we need to handle the case where you want to pass different weights to different parts of the machinery. Users can invent their own ways of doing such things. But the ability to turn weighting on or off for the transformer vs the predictor is a more obvious case.
I've decided this is a/the chief API priority and will try to put some focussed energy into it soon...
> For the moment, I could not find a transformer with a sample_weight but I assume that it could happen to be useful.

It'd be easy to invent one from, say, `SelectFromModel`. Yes, it supports arbitrary `fit_params`.
The document here focuses on the solution without clearly analysing the problem. Our classic case here is `sample_weight`. The problems are then that:

- we treat `sample_weight` as special, while there are other sample-aligned properties that should potentially be passed around by meta-estimators (`return_std`; and more in "scikit-learn-compatible" libraries)
- we get code paths like `do_something(X, y, sample_weight=sample_weight) if sample_weight is not None else do_something(X, y)`, but that can be avoided by `**props` where `props={'sample_weight': sample_weight}`
- we have `has_fit_parameter`, which I think should be outlawed (because it does not readily support meta-estimators)

There's some notion raised here that we need a dataframe-like object, but I contend that we've got almost no problems due to `sample_weight` being a kwarg, except perhaps that:
Advantages of sticking to kwargs include that:

- you can still pass a dataframe-like collection of props with `**my_dataframe`

So, IMO the main problem is routing. Will come back with more on that soon, starting by analysing what we have so far.
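The `**props` alternative to None-check branching mentioned above can be illustrated like this (hypothetical helper names, assumed for illustration only):

```python
# The None-check branching pattern the comment objects to ...
def fit_branching(est, X, y, sample_weight=None):
    if sample_weight is not None:
        return est.fit(X, y, sample_weight=sample_weight)
    return est.fit(X, y)

# ... can be avoided by carrying sample props in a dict and unpacking
# it, so that nothing extra is passed when no props were supplied:
def fit_with_props(est, X, y, props=None):
    return est.fit(X, y, **(props or {}))
```

With the second form, an estimator that does not accept `sample_weight` still works whenever `props` is empty, with no special-casing at the call site.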
I should state the broader problem: Our meta-estimation framework (basically pipeline + grid search) breaks for some kinds of problems (e.g. weighted scoring). When they break, users write their own, and we stop offering the benefits of consistent interface, compatibility, etc., as well as no longer protecting users from errors such as data leakage and best-practice cross-validation.
Existing sample props:

- `y`: fit-like methods are required to support `y` as the second positional argument. Nonetheless, we sometimes have code paths like that for `sample_weight`, where we `do_something(X, y) if y is not None else do_something(X)`, as in `TransformerMixin.fit_transform`, cross-val fitting, and `BaseSearchCV` refitting. We also have code which validates `y` as either being None or of the same length as `X`. If passed where unwanted, no error will be raised.
- `sample_weight`: accepted (and always optional) essentially only in fit-style functions (including `fit_predict`, `fit_transform` and `partial_fit`) and scoring functions (including gradient boosting errors). It also forms part of a `utils.dataset.*Dataset`. If passed where unwanted, an error will often be raised.
- `groups`: may be passed into `*SearchCV.fit`, where it is validated and then passed to the CV splitter mandatorily (i.e. all CV splitters must accept `groups` as a positional argument). If passed where unwanted, no error is raised. If not passed when wanted, an error is raised. No splitters currently use `groups` without requiring it.

Handling of `**kwargs` in ways that do not just pass through:

- `*SearchCV` has a `fit_params` attribute. Corresponding data is sliced up and passed to fit in cross validation.
- Kwargs to `Pipeline.fit{,_transform,_predict}` are interpreted as using double-underscore to divide between the step name and the kwarg name. They are only passed to the respective steps' `fit`/`fit_transform`. Any names with an unknown step name prefix result in a KeyError. The step estimator has the responsibility to deal with unknown names.
- `FeatureUnion.fit` does not support kwargs, but `FeatureUnion.fit_transform` does. It passes all to the constituent transformers' `fit` or `fit_transform` methods (no use of double-underscore prefixing).

Principles of routing:
- Meta-estimators (where `fit` or `fit_*` on a wrapped estimator is called by a parent estimator) should be able to support forwarding any props to the downstream fit.
- A prop passed to `Pipeline.fit` has no single obvious destination, and hence either needs to be passed to all steps in the pipeline, or the user must be able to control which steps (or the step can choose/indicate its acceptance or rejection of some prop).
- We may want to pass `sample_weight` to a composite meta-estimator, such as a `Pipeline` or `FeatureUnion`, but for only some of the downstream estimators to receive it (but this could be controlled on the recipient's side if we wish).
- Estimators that use `cv.split` must be able to receive `groups` in their `fit` and pass them on to `cv.split`.
- Meta-estimators wrapping estimators that use `cv.split` must be able to pass on `groups` to a downstream `fit`, despite the fact that many downstream estimators would not accept `groups`. This suggests the need for a switch to say whether or not groups should be passed on. Or maybe this should be something that the wrapped estimator indicates.
- Estimators that perform scoring must be able to receive `sample_weight` in their `fit` and pass it on to scoring. In turn, this means that any meta-estimators must be able to pass on `sample_weight` to a downstream scoring, but this may not always be desired behaviour. Similarly, we may want to support passing weights to scoring, but not to the downstream `fit`.

Further design issues in master:

- `Pipeline.fit`

Questions:

Solution alternatives:

1. `fit`
2. `fit` and routing information to the meta-estimator constructor.

Within 2, there are a few further orthogonal design decisions:

- `fit` param
- (`sample_weight` has to be passed as sample weight, and it is impossible to specify a different weight for different parts of a pipeline or for fit vs score)

If DIRECTION := b, then there are further questions, such as whether routing is specified as a single constructor parameter, or multiple / in a specialised way depending on the meta-estimator.
I'll try to make these designs, and their benefits, more concrete with example syntaxes at some point soon.
I solemnly swear I'll catch up to this soon ;)
I'm not sure whether to integrate my comments into the proposal or just work on a demo
I appreciate the syntactic elegance of annotating an estimator with what it should receive, but it:
My approach adds a `prop_routing` parameter to meta-estimators. Default behaviour is to maintain backwards compatibility. The meta-estimator defines a set of destinations, and a router function (either determined from user specification, or the default) translates the set of destinations and the dict of props into a dict for each destination.
How to specify routing is still open. Let's imagine we want to route parameter `foo` to `fit(..)` of all steps of a pipeline, and `bar` only to the step named `me`, where it is to be received with the name `baz`. In all cases we call `pipeline.fit(X, y, foo=[1,2,3], bar=[4,5,6])`:

- `{src_prop: [(dst_pattern, dst_prop), ...]}` is what I've currently almost implemented at https://github.com/scikit-learn/scikit-learn/compare/master...jnothman:route. E.g. `Pipeline(prop_routing={'foo': [('*', '*')], 'bar': [('me', 'baz')]})`
- `{dst_pattern: {src_prop: dst_prop}}` may be nicer to work with. E.g. `Pipeline(prop_routing={'*': {'foo': 'foo'}, 'me': {'bar': 'baz'}})`
- `[(dst_pattern, {src_prop: dst_prop}), ...]` and `[(dst_pattern, src_prop, dst_prop), ...]` (e.g. `Pipeline()`) are also options

While I knew it was the case for Pipeline, I've belatedly realised that none of these allows us to neatly express the current routing policy for GridSearch, which is "pass everything but groups to estimator.fit; pass groups to cv".
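The first format above, `{src_prop: [(dst_pattern, dst_prop), ...]}`, could be interpreted by a router roughly like this (an illustrative sketch, not the implementation in the linked branch; the use of `fnmatch` for `'*'` patterns is an assumption):

```python
from fnmatch import fnmatch  # assumed here for '*' destination patterns

def route_props(prop_routing, destinations, props):
    """Translate {src_prop: [(dst_pattern, dst_prop), ...]} into one
    dict of props per destination name."""
    routed = {dest: {} for dest in destinations}
    for src_prop, rules in prop_routing.items():
        for dst_pattern, dst_prop in rules:
            for dest in destinations:
                if fnmatch(dest, dst_pattern):
                    # '*' as dst_prop means "keep the source name"
                    name = src_prop if dst_prop == "*" else dst_prop
                    routed[dest][name] = props[src_prop]
    return routed

route_props({'foo': [('*', '*')], 'bar': [('me', 'baz')]},
            destinations=['me', 'you'],
            props={'foo': [1, 2, 3], 'bar': [4, 5, 6]})
# → {'me': {'foo': [1, 2, 3], 'baz': [4, 5, 6]}, 'you': {'foo': [1, 2, 3]}}
```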
One problem I have is forward compatibility if we allow destinations to be matched by patterns rather than exact name matches. If new destination possibilities (such as passing to `transform` in Pipeline) get added, this changes the meaning of the routing. This is one major advantage of annotating the destination with what it should receive. Alternatively, maybe string matching over destinations ('*') isn't what we want, but rather destination aliases defined by the router, e.g. that destination '*' in Pipeline means "fit for all steps".
Question: Cases where we should raise errors:
I'm off work sick today and seem to have produced a reasonable implementation of a variant at https://github.com/scikit-learn/scikit-learn/pull/9566
I think it would be great to have a list of application examples and then implement them in each of the alternative proposals. I tried to do that in the Google doc, with more and more complex examples, e.g. nested cross-validation, or pipelines where sample_weights are passed to some steps, or to score, etc.
Yeah, I've been thinking more of the "deep" solution where there is no routing, just kwarg naming conventions. The main problems are that it's verbose (but easy to read) and hard to maintain backwards compatibility, particularly in FeatureUnion and in a small way through making reserved prefixes in cross validation.
Those things aren't issues if we don't use kwargs, but I really think we should stick to kwargs.
And yes, demonstrations are a good idea. I just won't have time for any of this until closer to November, I expect.
I think we can wait another 2 months with this ;)
I opened a PR to let people talk about `sample_props`.