Closed: amueller closed this issue 4 years ago.
cc @glemaitre @NicolasHug I guess?
It seems orthogonal to the work
Could you be more specific? is it that the parameter itself is more related to some scikit-learn goodies instead of some specific internal estimator machinery?
callbacks for formatting feature names
I think it would be a neat proposal. Having a callbacks parameter to which you could pass a list of callbacks (verbose, feature_names, early_stopping, etc.) would be something that I would like. However, going directly in this direction, I am afraid that we would delay the support for feature names.
Maybe the trade-off here would be to have something that we can easily remove in the future in favor of a more generic callback solution. It is true that adding an estimator parameter will be annoying to remove once we want to get rid of it. Overriding a method is maybe a better option.
I have suggested using formats, e.g. "log({input_name})". That can be an attribute.
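To make the format-string idea concrete, here is a minimal sketch of what "a format as an attribute" could look like. This is purely illustrative: the class name, the attribute name feature_name_format, and the get_feature_names_out method are assumptions for the sake of the example, not actual scikit-learn API.

```python
# Hypothetical sketch (not real scikit-learn API): a transformer stores a
# format string as a plain attribute and applies it to input feature names.
class LogTransformer:
    def __init__(self):
        # the naming template is just an attribute, so it can be inspected
        # or replaced without touching the constructor signature
        self.feature_name_format = "log({input_name})"

    def get_feature_names_out(self, input_features):
        # apply the template to every input feature name
        return [self.feature_name_format.format(input_name=name)
                for name in input_features]

t = LogTransformer()
print(t.get_feature_names_out(["age", "income"]))
# ['log(age)', 'log(income)']
```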
@NicolasHug could you provide a minimal example for a class? I am not sure what it would look like.
OK, but you still need to pass the formatting to the class in case you want to tune it, don't you?
https://github.com/scikit-learn/enhancement_proposals/pull/17/files#r378916622
Yes, I just think it's a simpler alternative to a method
Basically right now SLEP 007 suggests to add a constructor parameter to all transformers that are not feature selectors, right?
not really. My intention was to suggest the possibility of having such a parameter to transformers, wherever we think there's demand for more than one way of feature generation. Is the concern that we'll need to add the parameter to many estimators, or that we don't want to have such a constructor parameter at all?
However, going directly in this direction, I am scared that we delay the support for feature names.
To me this is the one contentious part of the SLEP.
I think we can treat those as enhancements and leave them for future PRs, which won't even need a SLEP. My idea is that the SLEP can go ahead with a True/False verbosity level, and the rest can be considered future backward-compatible enhancements.
Sorry my phone seems to have given me grief and cut off a sentence. What I meant to say was:
It seems orthogonal to the working of the estimator.
In other words most of the parameters (all but verbose, really) influence what the model does. This one would not. That's why I'm hesitant to add it to the constructor. And yes, my concern is that in my understanding of the SLEP we would have to add it in many, many places.
not really. My intention was to suggest the possibility of having such a parameter to transformers, wherever we think there's demand for more than one way of feature generation.
From the SLEP it sounded to me like that was an intrinsic aspect of the feature name API. If I misunderstood that, then maybe it should be clarified. So you're suggesting that we figure out whether to add this parameter or not on a case-by-case basis? How do we decide where we add it and what the default is?
I'm happy to do a simple solution that we can easily implement. I think my initial thought was not to change the feature names in any of the one-to-one transformers and that's what I had implemented, so it would be verbose_feature_names=False. If we want to implement that and delay further additions to the future, I'm fine with that.
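As a concrete illustration of what a verbose_feature_names flag could mean for a one-to-one transformer, here is a minimal sketch. The class name Scaler and the "scaler__" prefix are illustrative assumptions, not the final scikit-learn behavior.

```python
# Hypothetical sketch of the verbose_feature_names idea for a one-to-one
# transformer (names and prefix scheme are illustrative only).
class Scaler:
    def __init__(self, verbose_feature_names=False):
        self.verbose_feature_names = verbose_feature_names

    def get_feature_names_out(self, input_features):
        if self.verbose_feature_names:
            # verbose mode: prefix each name with the transformer's name
            return [f"scaler__{name}" for name in input_features]
        # default: one-to-one transformers pass input names through unchanged
        return list(input_features)

print(Scaler().get_feature_names_out(["age", "income"]))
# ['age', 'income']
print(Scaler(verbose_feature_names=True).get_feature_names_out(["age"]))
# ['scaler__age']
```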
Right now I read the SLEP as adding a constructor argument to a large number of estimators and I was skeptical about that.
How do we decide where we add it and what the default is?
If the implementer's choice without the parameter causes a discussion in the review process, we'd add the parameter to make everybody happy. At least that's what I thought.
I'm happy to do a simple solution that we can easily implement. I think my initial thought was not to change the feature names in any of the one-to-one transformers and that's what I had implemented, so it would be verbose_feature_names=False. If we want to implement that and delay further additions to the future I'm fine with that.
I'd also be very much happy about that.
Would you be happy rephrasing it as in #38?
I like the idea of a format_feature_name method that can be overridden. What variables to provide it is one of several messy questions in this approach.
Wouldn't that mean the user should subclass an estimator to be able to change the behavior? Seems a bit too cumbersome.
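To show why the overridable-method route can feel cumbersome, here is a sketch of what the user would have to do. BaseTransformer, format_feature_name, and get_feature_names_out are hypothetical names for this discussion, not established API.

```python
# Hypothetical sketch of the overridable-method idea: changing the naming
# scheme requires writing a subclass (all names here are illustrative).
class BaseTransformer:
    def format_feature_name(self, input_name):
        # default naming scheme
        return f"f({input_name})"

    def get_feature_names_out(self, input_features):
        return [self.format_feature_name(n) for n in input_features]

class MyTransformer(BaseTransformer):
    # the user must define a whole subclass just to tweak the naming
    def format_feature_name(self, input_name):
        return input_name.upper()

print(BaseTransformer().get_feature_names_out(["age"]))  # ['f(age)']
print(MyTransformer().get_feature_names_out(["age"]))    # ['AGE']
```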
I think I'd be happy with methods which set those flags for the estimator, and not have them as an __init__ param. Same with the approach in the sample props proposal/PR. But that means we'd need to have a better way of handling such private attributes, and have a better clone which can handle those cases systematically.
The users could just overwrite a method by assignment and make everyone with an OO background cringe ;)
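For illustration, this is what "overwrite a method by assignment" looks like in practice: binding a replacement method onto a single instance (monkey-patching). The class and method names are hypothetical, carried over from the discussion above.

```python
import types

# The "cringe" alternative: replacing a method on one instance by plain
# assignment instead of subclassing. Names are illustrative only.
class Transformer:
    def format_feature_name(self, input_name):
        return f"f({input_name})"

t = Transformer()
# bind a replacement method to this one instance only; other instances
# and the class itself are unaffected
t.format_feature_name = types.MethodType(lambda self, name: name.upper(), t)

print(t.format_feature_name("age"))              # 'AGE'
print(Transformer().format_feature_name("age"))  # 'f(age)'
```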
Any of these seems to require introducing a new mechanism to sklearn, though. I'm not opposed to adding a method that allows setting these kinds of parameters (in D3M they are called control parameters, I think). An alternative would be to have a single constructor argument, a dict, that encapsulates anything not related to estimator behavior that will come up in the future. Arguably the verbosity would be one of those, though.
lr = LogisticRegression(C=0.1, settings={'logging_callback': somefunction,
                                         'feature_name_func': anotherfunction,
                                         'return_dataframe': False})
Unclear if that's better or worse than having a lr.set_config(...) method, but a constructor argument dict wouldn't require rewriting clone. Though clone is arguably pretty weird.
Arguably the verbosity would be one of those, though
verbose, but also n_jobs, pre_dispatch, cache_size, memory...
I like the idea of distinguishing parameters that have an effect on the predictions (in other words, those that one might want to search over), and the rest.
@NicolasHug this is basically what D3M does.
Though it's not super easy to define the distinction. "Want to search over" is probably not a good one, as you probably don't want to search over tol, but the choice of tol still influences the results.
Everything that you mention doesn't change the result, so you could have "numerically the same" be the definition. I guess I included return_dataframe, which would drastically change the result.
Another way to think about it is parameters that are there for all or most models vs those that are model specific.
Unclear if that's better or worse than having a lr.set_config(...) method, but a constructor argument dict wouldn't require rewriting clone. Though clone is arguably pretty weird.
I'm trying to think how it'll all look, including the docs. We could, for instance, have it as an __init__ parameter, but also have the set_config or something with a Bunch for convenience. This would also remove the need to change clone.
Then the question is, where do we document these possible arguments? In the init, or in that method, or both? In my mind it'd be nicer to have it only in the method's doc, and not all the info in the init's.
But this still makes the init signature pretty long.
I'd be fine with just adding it in the set_config method (which would mostly be syntactic sugar and a way to surface these settings).
It's still unclear to me what the scope would be, though.
It kinda deserves its own SLEP though :/
Does #38 close this then?
The users could just overwrite a method by assignment and make everyone with an OO background cringe ;)
Won't work with cloning...
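To spell out why the by-assignment trick won't survive cloning: scikit-learn's clone rebuilds a fresh estimator from its constructor parameters, so anything patched onto one instance is dropped. Here is a minimal, self-contained sketch that mimics that behavior (the simplified get_params/clone below are stand-ins for the real sklearn.base versions).

```python
# Sketch of why per-instance method assignment breaks cloning: the clone is
# reconstructed from constructor parameters only, mimicking the behavior of
# sklearn.base.clone. The Estimator class and patched method are illustrative.
class Estimator:
    def __init__(self, C=1.0):
        self.C = C

    def get_params(self):
        return {"C": self.C}

def clone(est):
    # rebuild from get_params(), as sklearn's clone does; instance-level
    # attributes that are not constructor parameters are lost
    return type(est)(**est.get_params())

e = Estimator(C=0.5)
e.format_feature_name = lambda name: name.upper()  # per-instance patch

c = clone(e)
print(c.C)                                  # 0.5: params survive
print(hasattr(c, "format_feature_name"))    # False: the patch did not
```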
I think "fixing" this would require its own SLEP, but I think we're good to move forward for now.
Basically right now SLEP 007 suggests to add a constructor parameter to all transformers that are not feature selectors, right?
I see the motivation and I think it's actually hard to come up with a better solution (I don't have one right now), but I'm also concerned with adding feature name specific things to the constructor. It seems orthogonal to the working of the estimator.
I think the alternative that @adrinjalali suggested was having callbacks for formatting feature names (I don't remember the details tbh), but that was pretty complex.
Maybe we could have a method that people could overwrite or something like that? I'm not sure. To me this is the one contentious part of the SLEP.