scikit-learn / enhancement_proposals

Enhancement proposals for scikit-learn: structured discussions and rational for large additions and modifications
https://scikit-learn-enhancement-proposals.readthedocs.io/en/latest
BSD 3-Clause "New" or "Revised" License
48 stars 34 forks source link

SLEP009: keyword only arguments #19

Closed adrinjalali closed 5 years ago

adrinjalali commented 5 years ago

This is to formalize our discussion around introducing keyword only arguments.

It follows https://github.com/scikit-learn/scikit-learn/issues/12805 and https://github.com/scikit-learn/scikit-learn/pull/13311

It should be an easy one, since the solution is already there, and it seems more like a matter of making a decision.

@scikit-learn/core-devs

amueller commented 5 years ago

Awesome, thanks for opening this! I think you should also address what which arguments we want to make keyword only, or at least raise the question. Is this going to be for all parameters in estimators, or some? For only estimators, or also for functions? Which functions? For methods? (fit params?)

adrinjalali commented 5 years ago

Also added a scope section.

adrinjalali commented 5 years ago

anything else left here before we can put it out for a vote?

adrinjalali commented 5 years ago

@amueller I think this is in a much better state now.

amueller commented 5 years ago

@adrinjalali thanks! You work to fast, I can't keep up ;) I'll try to read your feature names first...

rth commented 5 years ago

Thanks for putting this proposal together!

I feel maybe estimator, X, y could be positional and the rest keyword only?

I think one major risk with this that needs to be evaluated is the effort required of user to fix the warnings as compared to the perceived benefit (which connects to the idea in https://github.com/scikit-learn/enhancement_proposals/pull/19#discussion_r306574661 that we need better analysis of the current use of positional arguments). For passing numerous parameters the motivation is clear. When only the first or second parameter is passed prepositionally, it is less so.

For instance, I can see myself using NearestNeighbours().radius_neighbors(X, radius) (to take an arbitrary example) by passing radius positionally . For that matter it's often used that way in scikit-learn code base,

``` $ rg -A1 "radius_neighbors\(" sklearn --iglob '!test_*' sklearn/cluster/mean_shift_.py 94: i_nbrs = nbrs.radius_neighbors([my_mean], bandwidth, 95- return_distance=False)[0] -- 230: neighbor_idxs = nbrs.radius_neighbors([center], 231- return_distance=False)[0] sklearn/cluster/optics_.py 502: indices = nbrs.radius_neighbors(P, radius=max_eps, 503- return_distance=False)[0] sklearn/feature_selection/mutual_info_.py 65: ind = nn.radius_neighbors(radius=radius, return_distance=False) 66- nx = np.array([i.size for i in ind]) -- 69: ind = nn.radius_neighbors(radius=radius, return_distance=False) 70- ny = np.array([i.size for i in ind]) -- 139: ind = nn.radius_neighbors(radius=radius, return_distance=False) 140- m_all = np.array([i.size for i in ind]) sklearn/neighbors/base.py 622: def radius_neighbors(self, X=None, radius=None, return_distance=True): 623- """Finds the neighbors within a given radius of a point or points. -- 670: >>> rng = neigh.radius_neighbors([[1., 1., 1.]]) 671- >>> print(np.asarray(rng[0][0])) -- 835: A_ind = self.radius_neighbors(X, radius, 836- return_distance=False) -- 839: dist, A_ind = self.radius_neighbors(X, radius, 840- return_distance=True) sklearn/neighbors/classification.py 356: neigh_dist, neigh_ind = self.radius_neighbors(X) 357- inliers = [i for i, nind in enumerate(neigh_ind) if len(nind) != 0] sklearn/neighbors/unsupervised.py 96: >>> nbrs = neigh.radius_neighbors([[0, 0, 1.3]], 0.4, return_distance=False) 97- >>> np.asarray(nbrs[0][0]) sklearn/neighbors/regression.py 303: neigh_dist, neigh_ind = self.radius_neighbors(X) 304- sklearn/cluster/dbscan_.py 174: neighborhoods = neighbors_model.radius_neighbors(X, eps, 175- return_distance=False) ```

There is indeed a risk that position of the radius argument changes, but in practice it's quite unlikely.

Maybe we could adds some of these concerns under the "Consideration" section?

amueller commented 5 years ago

@rth I wouldn't mind allowing that to be positional. The question is how to formulate a clear rule.

There was the Odyssey project which might help with code analysis:

https://odyssey.readthedocs.io/en/latest/tutorial.html

Also interesting might be

https://github.com/Quansight-Labs/python-api-inspect

Someone that's better at SQL than me might be able to write the correct query example

jnothman commented 5 years ago

Lgtm.com should also be possible for querying use within this library

adrinjalali commented 5 years ago

If we agree that the decision can be made case by case, then we don't have to make those decisions now.

I tried to formulate that in a better and a bit more deterministic way, let me know how it looks now.

adrinjalali commented 5 years ago

So the next step is an email on the mailing list, and call for a vote?

amueller commented 5 years ago

I'm not sure if doing a case-by-case is good. I feel like that would lead to a lot of bike-shedding, slow progress, and inconsistencies.

amueller commented 5 years ago

I'm happy with separating the exact details of scoping it from the question of doing it, though.

adrinjalali commented 5 years ago

I'm just trying to find something that it has a chance of passing the vote. Otherwise I'd propose everything to be keyword only except X and y!

amueller commented 5 years ago

Well I don't think that's good for GridSearchCV or cross_validate. I think estimator should be possible positionally.

rth commented 5 years ago

So the next step is an email on the mailing list, and call for a vote?

Before the vote I think we might want to get more feedback from users and other core devs? How is it expected to work: we merge this and then discuss on the mailing list, or announce on the mailing list then wait for feedback here then vote on the mailing list?

The https://scikit-learn.org/stable/governance.html#enhancement-proposals-sleps page is not very verbose about it, and the figure with all the possible CPython PEP statuses kind of lost me :)

amueller commented 5 years ago

Well I think the default procedure is not a vote but consensus seeking, right? So I would say going to the mailing list asking for feedback would be a good idea. We don't need a vote unless anyone disagrees. The vote was meant to put a deadline on discussions. If that's what you want to do then you could ask for a vote now.

Having the SLEP be merged might be nice for visibility and readability, but I feel it's easier to comment on it in a PR. I don't think we formalized the process of this yet and I think we can see what works best. Right now I feel keeping it a PR might be more conducive for discussion. How would someone comment on a SLEP after it's merged? We could have an issue for discussion but that makes it harder to comment on specific sections or to suggest changes.

rth commented 5 years ago

So I would say going to the mailing list asking for feedback would be a good idea. We don't need a vote unless anyone disagrees.

Sounds good! I agree it's easier to keep the PR open to comment on it.

qinhanmin2014 commented 5 years ago

I agree that it's beneficial, but seems that this will need lots of work and I'm wondering whether it's worthwhile (Maybe we have many more important things to do).

adrinjalali commented 5 years ago

@amueller how do we proceed from here? I think we're making this harder than it has to be.

adrinjalali commented 5 years ago

we were thinking of merging this PR once the conversation is completely over, that's why it hasn't been merged yet. I'll see if I can fix your permissions here (not sure why you don't have them)

jnothman commented 5 years ago

I think it's okay to merge with a couple of acceptances and then use another pr to propose changing its status to accepted, and to admit further tweaks.

adrinjalali commented 5 years ago

Ok, then if there's not -1 votes, we can merge and announce on the mailing list? I think we only need to vote if there's at least one person objecting the change, otherwise we can go ahead with @thomasjpfan 's plan

amueller commented 5 years ago

@adrinjalali I think there should be at least a sentence on how we will decide which parameters will be positional only. Will there be another slep? Will it be ad-hoc decisions (I don't like this). Will it be based on data? Will there be some discussion for a "rule" without writing another SLEP?

I don't think we're making this harder as it has to be. I think a SLEP should provide a clear path forward. If this is accepted, I have no idea what would happen next.

amueller commented 5 years ago

ok happy to go ahead. Should we merge to facilitate viewing or not merge to facilitate commenting? ;) I think I'm +.5 for merge.

adrinjalali commented 5 years ago

Let's do it then :)