scikit-learn / enhancement_proposals

Enhancement proposals for scikit-learn: structured discussions and rationale for large additions and modifications
https://scikit-learn-enhancement-proposals.readthedocs.io/en/latest
BSD 3-Clause "New" or "Revised" License

Sample props specification #6

Closed tguillemot closed 5 years ago

tguillemot commented 7 years ago

I'm opening this PR to let people discuss sample_props.

amueller commented 7 years ago

please add a reference to https://docs.google.com/document/d/1k8d4vyw87gWODiyAyQTz91Z1KOnYr6runx-N074qIBY/edit

tguillemot commented 7 years ago

I've added an alternative proposition at the end of the document.

jnothman commented 7 years ago

I've just taken a quick look at the Google Doc, though it's not easy to read. An approach where routing is handled by the meta-estimator is IMO more explicit and more resilient to bugs / forwards-compatibility issues than one where the routing is handled by the base estimator / CV object / etc. (akin to dependency injection). It might be easier to write code in the latter style, though.

jnothman commented 7 years ago

Another tricky case: sometimes we do if not hasattr(est, 'fit_transform'): return est.fit(...).transform(). In this case, how do we know if props need to be passed to transform as well as fit?
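
Roughly, the pattern in question (a minimal sketch; the props dict and its routing here are purely illustrative, not settled API):

```python
# Sketch of the fit_transform fallback used in meta-estimators such as
# Pipeline. `props` is a hypothetical dict of sample properties.
def _fit_transform_one(est, X, y, props):
    if hasattr(est, 'fit_transform'):
        # one call: whatever props we pass clearly go to fit_transform
        return est.fit_transform(X, y, **props)
    # two calls: should the same props go to transform as well as fit?
    return est.fit(X, y, **props).transform(X)
```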

amueller commented 7 years ago

Yeah, the Google doc is not really readable. We stopped in the middle of the conversation yesterday. What I don't like about routing in meta-estimators is that you could potentially route at many levels, and I tried to illustrate that. Either you route only on the outermost level and have a mess of underscores, or you need the outer levels to allow for some pass-through. I feel this is similar to the "attaching search parameters to estimators" issue that you opened at some point, only here it's worse than the underscores in param_grid, because there is only one grid search in that case, but here there are arbitrarily many levels of meta-estimators that could potentially interact with this.

jnothman commented 7 years ago

yes, that would be the motivation for specifying routing and data separately.

It's still not entirely clear to me that a sample_props object is better than kwargs (except that kwargs currently have different semantics)

tguillemot commented 7 years ago

@jnothman It's not great, but I tried to summarize the discussion in the Google Doc under "4. Alternative propositions for sample_props (06.17.17)". Not sure, but maybe it can help.

amueller commented 7 years ago

Basically because we would need kwargs in all of our fit methods, which is not great and can swallow all kinds of errors. It can also cause problems for downstream projects that use fit args.

jnothman commented 7 years ago

No, I still think we'd only need kwargs in meta-estimators (prove me wrong), and that for base estimators a common test will ensure that they check kwargs for unwanted props.
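
A hedged sketch of what that check could look like in a base estimator (the allowed-props set and the error message are hypothetical, not an agreed API):

```python
# Hypothetical base estimator: it accepts only the sample props it knows how
# to use and raises on anything else, which a common estimator test could
# then enforce across the library.
class WeightedEstimator:
    _allowed_props = {'sample_weight'}

    def fit(self, X, y, **props):
        unexpected = set(props) - self._allowed_props
        if unexpected:
            raise TypeError(
                'fit() got unexpected sample props: %r' % sorted(unexpected))
        sample_weight = props.get('sample_weight')
        # ... fit using X, y and sample_weight ...
        return self
```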

amueller commented 7 years ago

hm... and we'd use the same / a similar routing mechanism? I think I like that.... and without routing it's passed to everything?

jnothman commented 7 years ago

Well, GridSearch would have default routing replicating current behaviour. If we're talking about specifying routing at construction rather than in fit param name, Pipeline will require some change to conform, and is open to however we choose to route by default.

amueller commented 7 years ago

well pipeline also has a default routing now with fit_params, right?
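
For reference, that existing routing is expressed through double-underscore-prefixed fit params:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X, y, w = rng.randn(20, 3), np.tile([0, 1], 10), np.ones(20)

pipe = Pipeline([('scale', StandardScaler()), ('clf', LogisticRegression())])
# Routing lives in the kwarg name: 'clf__sample_weight' is stripped of its
# prefix and forwarded to the fit of the step named 'clf'.
pipe.fit(X, y, clf__sample_weight=w)
```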

jnothman commented 7 years ago

That's got multiple parameters in fit, so it's got your issue of not having routing described abstractly as a class param.... but yes.

ogrisel commented 7 years ago

We also need to take RidgeCV, LogisticRegressionCV, LassoCV, etc. into account: those need a way to route sample props (here, the group information to be used for leave-P-groups-out CV) to the cv object itself, as is already the case for GridSearchCV et al.
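
To make the contrast concrete, a small sketch: the GridSearchCV call works today, while the commented-out RidgeCV call is the desired behaviour, not an existing API.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, LeavePGroupsOut

rng = np.random.RandomState(0)
X, y = rng.randn(30, 4), rng.randn(30)
groups = np.repeat(np.arange(5), 6)  # 5 groups of 6 samples

# Works today: GridSearchCV.fit forwards `groups` to the CV splitter.
gs = GridSearchCV(Ridge(), {'alpha': [0.1, 1.0]}, cv=LeavePGroupsOut(2))
gs.fit(X, y, groups=groups)

# Desired but hypothetical: RidgeCV has no equivalent way to route group
# information to a group-aware splitter.
# RidgeCV(cv=LeavePGroupsOut(2)).fit(X, y, groups=groups)
```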

glemaitre commented 7 years ago

Speaking with @ogrisel IRL, TransformTargetRegressor could be considered as a use case. It takes a transformer and a regressor. At fit time, one has to choose how to handle sample_weight: two different sample_weight vectors could be passed, one for each estimator, a single one for both, etc. For the moment, I could not find a transformer with a sample_weight, but I assume it could turn out to be useful.

jnothman commented 7 years ago

I don't think we need to handle the case where you want to pass different weights to different parts of the machinery. Users can invent their own ways of doing such things, but the ability to turn weighting on/off for the transformer vs the predictor is a more obvious case.

jnothman commented 7 years ago

I've decided this is a/the chief API priority and will try to put some focussed energy into it soon...

jnothman commented 7 years ago

For the moment, I could not find a transformer with a sample_weight but I assume that it could happen to be useful.

It'd be easy to invent one from, say, SelectFromModel. Yes, it supports arbitrary fit_params.
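
For example (SelectFromModel forwards arbitrary fit params to the wrapped estimator's fit, so the selection step itself can be sample-weighted):

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X = rng.randn(50, 8)
y = np.tile([0, 1], 25)
w = rng.rand(50)

# SelectFromModel.fit passes **fit_params on to the wrapped estimator's fit,
# so this is effectively a transformer whose fitting is sample-weighted.
selector = SelectFromModel(LogisticRegression())
X_selected = selector.fit_transform(X, y, sample_weight=w)
```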

jnothman commented 7 years ago

The document here focuses on the solution without clearly analysing the problem. Our classic case here is sample_weight. The problems are then that:

There's some notion raised here that we need a dataframe-like object, but I contend that we've got almost no problems due to sample_weight being a kwarg, except perhaps that:

Advantages of sticking to kwargs include that:

So, IMO the main problem is routing. Will come back with more on that soon. Starting by analysing what we have so far.

jnothman commented 7 years ago

I should state the broader problem: our meta-estimation framework (basically pipeline + grid search) breaks for some kinds of problems (e.g. weighted scoring). When it breaks, users write their own, and we stop offering the benefits of a consistent interface, compatibility, etc., and no longer protect users from errors such as data leakage or departures from best-practice cross-validation.

jnothman commented 7 years ago

Existing sample props:

Handling of **kwargs in ways that do not just pass through:

Principles of routing:

jnothman commented 7 years ago

Further design issues in master:

Questions:

Solution alternatives:

  1. pass data and routing information (probably via parameter naming) to fit
  2. pass data to fit and routing information to meta-estimator constructor.

Within 2, there are a few further orthogonal design decisions:

If DIRECTION:=b then there are further questions such as whether routing is specified as a single constructor parameter, or multiple / in a specialised way depending on the meta-estimator.

I'll try to make these designs, and their benefits, more concrete with example syntaxes at some point soon.

amueller commented 7 years ago

I solemnly swear I'll catch up to this soon ;)

jnothman commented 7 years ago

I'm not sure whether to integrate my comments into the proposal or just work on a demo

jnothman commented 7 years ago

I see the syntactic elegance of annotating an estimator with what it should receive, but it:

My approach adds a prop_routing parameter to meta-estimators. The default behaviour maintains backwards compatibility. The meta-estimator defines a set of destinations, and a router function (either determined from the user's specification, or the default) translates the set of destinations and the dict of props into a dict for each destination.

How to specify routing is still open. Let's imagine we want to route parameter foo to fit(..) of all steps of a pipeline, and bar only to the step named me, where it is to be received with the name baz. In all cases we call pipeline.fit(X, y, foo=[1,2,3], bar=[4,5,6])
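
As one purely illustrative possibility (not a settled syntax), a toy router along the lines above; the prop_routing format and the route_props helper are hypothetical:

```python
# Toy router: the meta-estimator declares its destinations, the user supplies
# a prop_routing mapping, and the router builds one props dict per
# destination. Entirely hypothetical, not scikit-learn API.
def route_props(prop_routing, destinations, props):
    routed = {dest: {} for dest in destinations}
    for prop_name, rule in prop_routing.items():
        if rule == '*':                      # broadcast to every destination
            targets = {dest: prop_name for dest in destinations}
        else:                                # {destination: received_name}
            targets = rule
        for dest, received_name in targets.items():
            routed[dest][received_name] = props[prop_name]
    return routed

# Route foo to fit of every step, and bar only to step 'me' under the name baz:
prop_routing = {'foo': '*', 'bar': {'me': 'baz'}}
destinations = ['scale', 'me']               # e.g. the steps of a pipeline
props = {'foo': [1, 2, 3], 'bar': [4, 5, 6]}
print(route_props(prop_routing, destinations, props))
# {'scale': {'foo': [1, 2, 3]}, 'me': {'foo': [1, 2, 3], 'baz': [4, 5, 6]}}
```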

While I knew it was the case for Pipeline, I've belatedly realised that none of these allows us to neatly express the current routing policy for GridSearch, which is "pass everything but groups to estimator.fit; pass groups to cv".
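
That current policy, as a simplified sketch of what GridSearchCV.fit effectively does (heavily condensed from the real implementation):

```python
from sklearn.base import clone

# Simplified: groups is consumed by the CV splitter; all remaining fit
# params are (sliced per fold and) forwarded to the sub-estimator's fit.
def grid_search_fit(estimator, cv, X, y, groups=None, **fit_params):
    for train_idx, test_idx in cv.split(X, y, groups):
        est = clone(estimator)
        est.fit(X[train_idx], y[train_idx],
                **{key: value[train_idx] for key, value in fit_params.items()})
        # ... score est on the test fold, keep track of the best candidate ...
```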

One problem I have is forward compatibility if we allow destinations to be matched by patterns rather than exact name matches. If new destination possibilities (such as passing to transform in Pipeline) get added, this changes the meaning of the routing. This is one major advantage of annotating the destination with what it should receive. Alternatively, maybe pattern matching over destinations ('*') isn't what we want, but rather destination aliases defined by the router, e.g. that destination '*' in Pipeline means "fit for all steps".

Question: Cases where we should raise errors:

jnothman commented 7 years ago

I'm off work sick today and seem to have produced a reasonable implementation of a variant at https://github.com/scikit-learn/scikit-learn/pull/9566

amueller commented 7 years ago

I think it would be great to have a list of application examples and then implement them in each of the alternative proposals. I tried to do that in the Google Doc, with increasingly complex examples, e.g. nested cross-validation, pipelines where sample_weight is passed to some steps or to the scorer, etc.

jnothman commented 7 years ago

Yeah, I've been thinking more of the "deep" solution where there is no routing, just kwarg naming conventions. The main problems are that it's verbose (but easy to read) and that it's hard to maintain backwards compatibility, particularly in FeatureUnion, and in a small way through creating reserved prefixes in cross-validation.
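
To illustrate the "deep" convention (the helper below, and any prefixes beyond Pipeline's existing step-prefixed fit_params, are hypothetical extrapolation, not existing API):

```python
# Toy helper unpacking "deep" kwarg names one level at a time; the
# double-underscore convention mirrors Pipeline's existing fit_params, but
# using it for arbitrary destinations (e.g. the scorer) is hypothetical.
def split_by_prefix(**kwargs):
    per_destination = {}
    for name, value in kwargs.items():
        dest, _, remainder = name.partition('__')
        per_destination.setdefault(dest, {})[remainder] = value
    return per_destination

# One weight routed to the 'clf' step, another to the scorer:
print(split_by_prefix(clf__sample_weight=[1, 2], scoring__sample_weight=[3, 4]))
# {'clf': {'sample_weight': [1, 2]}, 'scoring': {'sample_weight': [3, 4]}}
```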

jnothman commented 7 years ago

Those things aren't issues if we don't use kwargs, but I really think we should stick to kwargs.

jnothman commented 7 years ago

And yes, demonstrations are a good idea. I just won't have time for any of this until closer to November, I expect.

amueller commented 7 years ago

I think we can wait another 2 months for this ;)