scikit-learn / enhancement_proposals

Enhancement proposals for scikit-learn: structured discussions and rational for large additions and modifications
https://scikit-learn-enhancement-proposals.readthedocs.io/en/latest
BSD 3-Clause "New" or "Revised" License
48 stars 34 forks source link

Questions and comments about SLEP006 -- Sample props #47

Closed NicolasHug closed 2 years ago

NicolasHug commented 4 years ago

I've made a pass over the SLEP. It is overall very clear and the different cases really help. Thanks @jnothman and @adrinjalali for your efforts.

Here's a list of questions and comments that I have after reading it.

1.

we can consider the use of keys that are not limited to strings or valid identifiers (and hence are not limited to using _ as a delimiter).

I don't understand how we are currently limited to using _ as a delimiter. Should this be __? But even then I still don't follow.

2.

TODO: proceed from here. Note that this change implies the need to add a parameter to unwrap_X, since we will now append an additional column to X.

I don't understand why we don't need to add an additional column to X in this case.

3.

while a GridSearchCV wrapping a Pipeline currently takes parameters with keys like {step_name}{prop_name}, this explicit routing, and conflict with GridSearchCV routing destinations, implies keys like estimator__{step_name}{prop_name}.

I'm not sure I completely understand this. A small illustration might help? Also, does "GridSearchCV routing destinations" refers to param_grid instead?

4.

All consumers would be required to check that

This sentence seems unfinished?

5.

About solution 4:

Here the meta-estimator provides only what its each of its children requests. The meta-estimator would also need to request, on behalf of its children, any prop that descendant consumers require.

I could be wrong, but this doesn't seem to define the "essence" of solution 4, especially comparatively to solution 3. It seems to me that in solution 3, as well, the meta estimators only provides what the childrens need/request:

Each meta-estimator is given a routing specification which it must follow in passing only the required parameters to each of its children

6.

For estimators to be cloned, this request information needs to be cloned with it

I'm not sure I understand this sentence. Seems like the grammar may be wrong?

7.

get_props_request will return a dict

It is said prior that get_props_request would return a list (and possibly a dict)

8.

One thing that isn't clear to me is the behaviour of solution 4., outside of the use of meta-estimators.

For example, what does this do:

lr = LogisticRegression().set_props_request(['sample_weights'])
lr.fit(X, y, sample_weights=None)  # ?

lr = LogisticRegression().set_props_request([])
lr.fit(X, y, sample_weights=sample_weights)  # ?
  1. All examples seem to illustrate how cross_validate is used, but none of them illustrate what a call to fit() may look like. I think this may be useful as well (and might resolve .8 for me)

**kw syntax will be used to pass props by key.

In the examples of solution 4., **kw are never used. Should they? Or does this refer to a call to fit?

  1. Other stuff
jnothman commented 4 years ago

Thanks for these comments @NicolasHug. I've finally read them, but I will have to find time to propose improvements in response.

The main substantial query is 8, I think. The props request only affects the use in meta-estimators. Is that a problem?

NicolasHug commented 4 years ago

The props request only affects the use in meta-estimators. Is that a problem?

OK I missed that somehow, thanks for the clarification

jnothman commented 3 years ago

Hmm... This got lost. I think there were some working group meetings that covered related topics and worked to clarify the slep since then

adrinjalali commented 2 years ago

Since we've had the workgroup to discuss the SLEP, we've added many examples and @thomasjpfan has re-written the SLEP. I'm not sure if these questions still stand.

NicolasHug commented 2 years ago

Yes, these are likely outdated by now. Closing.