tguillemot closed this 5 years ago
Please add a reference to https://docs.google.com/document/d/1k8d4vyw87gWODiyAyQTz91Z1KOnYr6runx-N074qIBY/edit
Add alternative proposition at the end of the document.
I've just taken a quick look at the Google Doc, though it's not easy to read. An approach where routing is handled by the meta-estimator is IMO more explicit and resilient to bugs / forwards-compatibility issues than one where routing is handled by the base estimator / CV object / etc. (akin to dependency injection). It might be easier to write code in the latter style, though.
Another tricky case: sometimes we do `if not hasattr(est, 'fit_transform'): return est.fit(X, ...).transform(X)`. In this case, how do we know if props need to be passed to `transform` as well as `fit`?
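The fallback described here can be sketched as follows (a hypothetical helper, not scikit-learn's actual implementation), which makes the ambiguity concrete: props forwarded to `fit` are not obviously intended for `transform` too, and nothing in this code path can tell.

```python
# Hypothetical sketch (not scikit-learn's actual code) of the fallback:
# if an estimator has no fit_transform, chain fit(...).transform(X).
def fit_transform_one(est, X, y=None, **props):
    if hasattr(est, "fit_transform"):
        return est.fit_transform(X, y, **props)
    # Should any of `props` also be passed to transform? Unclear.
    return est.fit(X, y, **props).transform(X)
```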
Yeah, the Google doc is not really readable. We stopped in the middle of the conversation yesterday. What I don't like about routing in meta-estimators is that you could potentially route at many levels, and I tried to illustrate that. Either you route only on the outermost level and have a mess of underscores, or you need the outer levels to allow for some pass-through. I feel this is similar to the "attaching search parameters to estimators" issue that you opened at some point, only here it's worse than the underscores in `param_grid`, because there is only one grid search in that case, but here there are arbitrarily many levels of meta-estimators that could potentially interact with this.
Yes, that would be the motivation for specifying routing and data separately.
It's still not entirely clear to me that a `sample_props` object is better than kwargs (except that kwargs currently have different semantics).
@jnothman It's not great, but I tried to summarize the discussion in the Google doc under "# 4. Alternative propositions for sample_props (06.17.17)". Not sure, but maybe it can help.
Basically because we would need `**kwargs` in all of our `fit` methods, which is not great and can swallow all kinds of errors. It can also cause problems for dependent projects that use fit args.
No, I still think we'd only need `**kwargs` in meta-estimators (prove me wrong), and that for base estimators a common test will ensure that they check kwargs for unwanted props.
hm... and we'd use the same / a similar routing mechanism? I think I like that.... and without routing it's passed to everything?
Well, GridSearch would have default routing replicating current behaviour. If we're talking about specifying routing at construction rather than in fit param names, Pipeline will require some change to conform, and is open to however we choose to route by default.
well pipeline also has a default routing now with fit_params, right?
That's got multiple parameters in fit, so it's got your issue of not having routing described abstractly as a class param.... but yes.
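For reference, Pipeline's current double-underscore convention for fit params can be mimicked roughly like this (a self-contained sketch with hypothetical names, not the actual scikit-learn implementation):

```python
# Mimic of Pipeline's kwarg convention (illustrative only):
# fit kwargs named "<step>__<param>" are routed to that step's fit.
def split_fit_params(step_names, fit_params):
    routed = {name: {} for name in step_names}
    for key, value in fit_params.items():
        step, delim, param = key.partition("__")
        if not delim or step not in routed:
            raise KeyError(f"Unknown or malformed fit param: {key!r}")
        routed[step][param] = value
    return routed

split_fit_params(["scale", "clf"], {"clf__sample_weight": [1, 2]})
# → {'scale': {}, 'clf': {'sample_weight': [1, 2]}}
```

Note that, as discussed above, this describes routing per call in the fit param names themselves, rather than abstractly as a class parameter.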
We also need to take `RidgeCV`, `LogisticRegressionCV`, `LassoCV`... into account: those need to have a way to route sample props info about the grouping to be used for leave-p-groups-out CV to the `cv` object itself (as is the case for `GridSearchCV` et al.).
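The pattern being asked for can be sketched as follows (a hedged, minimal illustration with made-up class names, not scikit-learn internals): an estimator that does internal CV must accept `groups` in `fit` and forward it to `cv.split`.

```python
# Illustrative sketch: an estimator with internal cross-validation
# accepts `groups` in fit and forwards it to its cv splitter.
class ToyCVEstimator:
    def __init__(self, cv):
        self.cv = cv

    def fit(self, X, y, groups=None):
        for train_idx, test_idx in self.cv.split(X, y, groups):
            pass  # fit and score each split here
        return self
```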
Speaking with @ogrisel IRL, `TransformTargetRegressor` could be considered as a use case. It takes a `transformer` and a `regressor`. At `fit`, one has to choose how to handle `sample_weight`: two different `sample_weight`s could be passed, one for each estimator, a single one for both, etc. For the moment, I could not find a transformer with a `sample_weight`, but I assume that it could happen to be useful.
I don't think we need to handle the case where you want to pass different weights to different parts of the machinery. Users can invent their own ways of doing such things. But the ability to turn weighting on or off for the transformer vs the predictor is a more obvious case.
I've decided this is a/the chief API priority and will try to put some focussed energy into it soon...
> For the moment, I could not find a transformer with a sample_weight but I assume that it could happen to be useful.

It'd be easy to invent one from, say, `SelectFromModel`. Yes, it supports arbitrary `fit_params`.
The document here focuses on the solution without clearly analysing the problem. Our classic case here is `sample_weight`. The problems are then that:

- we treat `sample_weight` as special, while there are other sample-aligned properties that should potentially be passed around by meta-estimators (`return_std`; and more in "scikit-learn-compatible" libraries)
- we get code paths like `do_something(X, y, sample_weight=sample_weight) if sample_weight is not None else do_something(X, y)`, but that can be avoided by `**props` where `props={'sample_weight': sample_weight}`
- we have `has_fit_parameter`, which I think should be outlawed (because it does not readily support meta-estimators)

There's some notion raised here that we need a dataframe-like object, but I contend that we've got almost no problems due to `sample_weight` being a kwarg, except perhaps that:
Advantages of sticking to kwargs include that:

- you can still pass a dataframe-like collection of props with `**my_dataframe`

So, IMO the main problem is routing. Will come back with more on that soon, starting by analysing what we have so far.
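The `**props` alternative to None-check branching mentioned above can be illustrated like this (hypothetical helper names, assumed for illustration only):

```python
# The None-check branching pattern the comment objects to ...
def fit_branching(est, X, y, sample_weight=None):
    if sample_weight is not None:
        return est.fit(X, y, sample_weight=sample_weight)
    return est.fit(X, y)

# ... can be avoided by carrying sample props in a dict and unpacking
# it, so that nothing extra is passed when no props were supplied:
def fit_with_props(est, X, y, props=None):
    return est.fit(X, y, **(props or {}))
```

With the second form, an estimator that does not accept `sample_weight` still works whenever `props` is empty, with no special-casing at the call site.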
I should state the broader problem: Our meta-estimation framework (basically pipeline + grid search) breaks for some kinds of problems (e.g. weighted scoring). When they break, users write their own, and we stop offering the benefits of consistent interface, compatibility, etc., as well as no longer protecting users from errors such as data leakage and best-practice cross-validation.
Existing sample props:

- `y`: fit-like methods are required to support `y` as the second positional argument. Nonetheless, we sometimes have code paths like that for `sample_weight`, where we `do_something(X, y) if y is not None else do_something(X)`, as in `TransformerMixin.fit_transform`, cross-val fitting, and `BaseSearchCV` refitting. We also have code which validates `y` as either being None or of the same length as `X`. If passed where unwanted, no error will be raised.
- `sample_weight`: accepted (and always optional) essentially only in fit-style functions (including `fit_predict`, `fit_transform` and `partial_fit`) and scoring functions (including gradient boosting errors). It also forms part of a `utils.dataset.*Dataset`. If passed where unwanted, an error will often be raised.
- `groups`: may be passed into `*SearchCV.fit`, where it is validated and then passed to the CV splitter mandatorily (i.e. all CV splitters must accept `groups` as a positional argument). If passed where unwanted, no error is raised. If not passed when wanted, an error is raised. No splitters currently use `groups` without requiring it.

Handling of `**kwargs` in ways that do not just pass through:

- `*SearchCV` has a `fit_params` attribute. Corresponding data is sliced up and passed to fit in cross validation.
- Kwargs to `Pipeline.fit{,_transform,_predict}` are interpreted as using double-underscore to divide between the step name and the kwarg name. They are only passed to the respective steps' `fit`/`fit_transform`. Any names with an unknown step name prefix result in a KeyError. The step estimator has the responsibility to deal with unknown names.
- `FeatureUnion.fit` does not support kwargs, but `FeatureUnion.fit_transform` does. It passes all to the constituent transformers' `fit` or `fit_transform` methods (no use of double-underscore prefixing).

Principles of routing:
- Meta-estimators (where `fit` or `fit_*` on a wrapped estimator is called by a parent estimator) should be able to support forwarding any props to the downstream fit.
- A prop passed to `Pipeline.fit` has no single obvious destination, and hence either needs to be passed to all steps in the pipeline, or the user must be able to control which steps (or the step can choose/indicate its acceptance or rejection of some prop).
- We may want to pass `sample_weight` to a composite meta-estimator, such as a `Pipeline` or `FeatureUnion`, but for only some of the downstream estimators to receive it (but this could be controlled on the recipient's side if we wish).
- Estimators that use `cv.split` must be able to receive `groups` in their `fit` and pass them on to `cv.split`.
- Meta-estimators wrapping estimators that use `cv.split` must be able to pass on `groups` to a downstream `fit`, despite the fact that many downstream estimators would not accept `groups`. This suggests the need for a switch to say whether or not groups should be passed on. Or maybe this should be something that the wrapped estimator indicates.
- Estimators that perform scoring must be able to receive `sample_weight` in their `fit` and pass it on to scoring. In turn, this means that any meta-estimators must be able to pass on `sample_weight` to a downstream scoring, but this may not always be desired behaviour. Similarly, we may want to support passing weights to scoring, but not to the downstream `fit`.

Further design issues in master:

- `Pipeline.fit`

Questions:

Solution alternatives:

1. `fit`
2. `fit` and routing information to the meta-estimator constructor.

Within 2, there are a few further orthogonal design decisions:

- `fit` param
- (`sample_weight` has to be passed as sample weight, and it is impossible to specify a different weight for different parts of a pipeline or for fit vs score)

If DIRECTION := b, then there are further questions, such as whether routing is specified as a single constructor parameter, or multiple / in a specialised way depending on the meta-estimator.
I'll try to make these designs, and their benefits, more concrete with example syntaxes at some point soon.
I solemnly swear I'll catch up to this soon ;)
I'm not sure whether to integrate my comments into the proposal or just work on a demo
I appreciate the syntactic elegance of annotating an estimator with what it should receive, but it:
My approach adds a `prop_routing` parameter to meta-estimators. Default behaviour is to maintain backwards compatibility. The meta-estimator defines a set of destinations, and a router function (either determined from user specification, or the default) translates the set of destinations and the dict of props into a dict for each destination.
How to specify routing is still open. Let's imagine we want to route parameter `foo` to `fit(..)` of all steps of a pipeline, and `bar` only to the step named `me`, where it is to be received with the name `baz`. In all cases we call `pipeline.fit(X, y, foo=[1,2,3], bar=[4,5,6])`:

- `{src_prop: [(dst_pattern, dst_prop), ...]}` is what I've currently almost implemented at https://github.com/scikit-learn/scikit-learn/compare/master...jnothman:route. E.g. `Pipeline(prop_routing={'foo': [('*', '*')], 'bar': [('me', 'baz')]})`
- `{dst_pattern: {src_prop: dst_prop}}` may be nicer to work with. E.g. `Pipeline(prop_routing={'*': {'foo': 'foo'}, 'me': {'bar': 'baz'}})`
- `[(dst_pattern, {src_prop: dst_prop}), ...]` and `[(dst_pattern, src_prop, dst_prop), ...]` (e.g. `Pipeline()`) are also options

While I knew it was the case for Pipeline, I've belatedly realised that none of these allows us to neatly express the current routing policy for GridSearch, which is "pass everything but groups to estimator.fit; pass groups to cv".
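The first format above, `{src_prop: [(dst_pattern, dst_prop), ...]}`, could be interpreted by a router roughly like this (an illustrative sketch, not the implementation in the linked branch; the use of `fnmatch` for `'*'` patterns is an assumption):

```python
from fnmatch import fnmatch  # assumed here for '*' destination patterns

def route_props(prop_routing, destinations, props):
    """Translate {src_prop: [(dst_pattern, dst_prop), ...]} into one
    dict of props per destination name."""
    routed = {dest: {} for dest in destinations}
    for src_prop, rules in prop_routing.items():
        for dst_pattern, dst_prop in rules:
            for dest in destinations:
                if fnmatch(dest, dst_pattern):
                    # '*' as dst_prop means "keep the source name"
                    name = src_prop if dst_prop == "*" else dst_prop
                    routed[dest][name] = props[src_prop]
    return routed

route_props({'foo': [('*', '*')], 'bar': [('me', 'baz')]},
            destinations=['me', 'you'],
            props={'foo': [1, 2, 3], 'bar': [4, 5, 6]})
# → {'me': {'foo': [1, 2, 3], 'baz': [4, 5, 6]}, 'you': {'foo': [1, 2, 3]}}
```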
One problem I have is forward compatibility if we allow destinations to be matched by patterns rather than exact name matches. If new destination possibilities (such as passing to `transform` in Pipeline) get added, this changes the meaning of the routing. This is one major advantage of annotating the destination with what it should receive. Alternatively, maybe string matching over destinations ('*') isn't what we want, but rather destination aliases defined by the router, e.g. that destination '*' in Pipeline means "fit for all steps".
Question: Cases where we should raise errors:
I'm off work sick today and seem to have produced a reasonable implementation of a variant at https://github.com/scikit-learn/scikit-learn/pull/9566
I think it would be great to have a list of application examples and then implement them in each of the alternative proposals. I tried to do that in the Google doc, with more and more complex examples, e.g. nested cross-validation, or pipelines where sample_weights are passed to some steps, or to score, etc.
Yeah, I've been thinking more of the "deep" solution where there is no routing, just kwarg naming conventions. The main problems are that it's verbose (but easy to read) and hard to maintain backwards compatibility, particularly in FeatureUnion and in a small way through making reserved prefixes in cross validation.
Those things aren't issues if we don't use kwargs, but I really think we should stick to kwargs.
And yes, demonstrations are a good idea. I just won't have time for any of this until closer to November, I expect.
I think we can wait another 2 months with this ;)
I opened a PR to let people talk about `sample_props`.