Closed adrinjalali closed 5 years ago
Awesome, thanks for opening this! I think you should also address what which arguments we want to make keyword only, or at least raise the question. Is this going to be for all parameters in estimators, or some? For only estimators, or also for functions? Which functions? For methods? (fit params?)
Also added a scope section.
anything else left here before we can put it out for a vote?
@amueller I think this is in a much better state now.
@adrinjalali thanks! You work to fast, I can't keep up ;) I'll try to read your feature names first...
Thanks for putting this proposal together!
I feel maybe estimator, X, y could be positional and the rest keyword only?
I think one major risk with this that needs to be evaluated is the effort required of user to fix the warnings as compared to the perceived benefit (which connects to the idea in https://github.com/scikit-learn/enhancement_proposals/pull/19#discussion_r306574661 that we need better analysis of the current use of positional arguments). For passing numerous parameters the motivation is clear. When only the first or second parameter is passed prepositionally, it is less so.
For instance, I can see myself using NearestNeighbours().radius_neighbors(X, radius)
(to take an arbitrary example) by passing radius
positionally . For that matter it's often used that way in scikit-learn code base,
There is indeed a risk that position of the radius
argument changes, but in practice it's quite unlikely.
Maybe we could adds some of these concerns under the "Consideration" section?
@rth I wouldn't mind allowing that to be positional. The question is how to formulate a clear rule.
There was the Odyssey project which might help with code analysis:
https://odyssey.readthedocs.io/en/latest/tutorial.html
Also interesting might be
https://github.com/Quansight-Labs/python-api-inspect
Someone that's better at SQL than me might be able to write the correct query example
Lgtm.com should also be possible for querying use within this library
If we agree that the decision can be made case by case, then we don't have to make those decisions now.
I tried to formulate that in a better and a bit more deterministic way, let me know how it looks now.
So the next step is an email on the mailing list, and call for a vote?
I'm not sure if doing a case-by-case is good. I feel like that would lead to a lot of bike-shedding, slow progress, and inconsistencies.
I'm happy with separating the exact details of scoping it from the question of doing it, though.
I'm just trying to find something that it has a chance of passing the vote. Otherwise I'd propose everything to be keyword only except X
and y
!
Well I don't think that's good for GridSearchCV
or cross_validate
. I think estimator
should be possible positionally.
So the next step is an email on the mailing list, and call for a vote?
Before the vote I think we might want to get more feedback from users and other core devs? How is it expected to work: we merge this and then discuss on the mailing list, or announce on the mailing list then wait for feedback here then vote on the mailing list?
The https://scikit-learn.org/stable/governance.html#enhancement-proposals-sleps page is not very verbose about it, and the figure with all the possible CPython PEP statuses kind of lost me :)
Well I think the default procedure is not a vote but consensus seeking, right? So I would say going to the mailing list asking for feedback would be a good idea. We don't need a vote unless anyone disagrees. The vote was meant to put a deadline on discussions. If that's what you want to do then you could ask for a vote now.
Having the SLEP be merged might be nice for visibility and readability, but I feel it's easier to comment on it in a PR. I don't think we formalized the process of this yet and I think we can see what works best. Right now I feel keeping it a PR might be more conducive for discussion. How would someone comment on a SLEP after it's merged? We could have an issue for discussion but that makes it harder to comment on specific sections or to suggest changes.
So I would say going to the mailing list asking for feedback would be a good idea. We don't need a vote unless anyone disagrees.
Sounds good! I agree it's easier to keep the PR open to comment on it.
I agree that it's beneficial, but seems that this will need lots of work and I'm wondering whether it's worthwhile (Maybe we have many more important things to do).
@amueller how do we proceed from here? I think we're making this harder than it has to be.
we were thinking of merging this PR once the conversation is completely over, that's why it hasn't been merged yet. I'll see if I can fix your permissions here (not sure why you don't have them)
I think it's okay to merge with a couple of acceptances and then use another pr to propose changing its status to accepted, and to admit further tweaks.
Ok, then if there's not -1 votes, we can merge and announce on the mailing list? I think we only need to vote if there's at least one person objecting the change, otherwise we can go ahead with @thomasjpfan 's plan
@adrinjalali I think there should be at least a sentence on how we will decide which parameters will be positional only. Will there be another slep? Will it be ad-hoc decisions (I don't like this). Will it be based on data? Will there be some discussion for a "rule" without writing another SLEP?
I don't think we're making this harder as it has to be. I think a SLEP should provide a clear path forward. If this is accepted, I have no idea what would happen next.
ok happy to go ahead. Should we merge to facilitate viewing or not merge to facilitate commenting? ;) I think I'm +.5 for merge.
Let's do it then :)
This is to formalize our discussion around introducing keyword only arguments.
It follows https://github.com/scikit-learn/scikit-learn/issues/12805 and https://github.com/scikit-learn/scikit-learn/pull/13311
It should be an easy one, since the solution is already there, and it seems more like a matter of making a decision.
@scikit-learn/core-devs