Closed brandongreenwell-8451 closed 1 year ago
@paulbkoch it was just easier to start afresh and create a new PR (I'm also not the most Git/GH savvy of statisticians...). This version places the remove_nil_terms
argument over to the reweight_terms()
method and internally calls remove_terms()
when set to True
.
Note: This works great on my end, but not sure if the return statement on line 2267 is the most Pythonic way to handle the nested call to remove_terms()
😎.
Hi @brandongreenwell-8451 -- We definitely need some kind of remove_terms function that only operates on the terms given to it by the caller. The eventual ISLE function does not need this, but EBMs do as a general ability, so I figure it's best to have it as a primitive that is called by reweight.
Another small nuance: I think the reweigh_terms function should only remove terms that have 0 in the given weight parameter, not check if any of the resulting term scores are all zero. If some of the terms had all zeros in their scores before the call to reweigh_terms, then the caller would need to do a lot of checks to know which terms to expect will be dropped, and that behavior would be surprising, I think. Safer to force the caller to be more explicit in this case and call remove_terms themselves in the special case that there are terms with all zeros and they want them removed. It's pretty unlikely we'd see a term with all zeros naturally anyways, except if the user previously edited the EBM (in which case whatever they called to do that could also have a remove_nil_terms parameter), or in the case of a feature with only 1 value. In the latter case LASSO would give it a weight of 0 anyways I think, so those terms would be dropped anyways in an ISLE implementation.
@paulbkochms I think we're aligned on your first comment in this PR, right? Copy on the second and will send an update shortly!
Edit: PR updated and calling reweight_terms()
with remove_nil_terms=True
will now only remove terms with a corresponding weight of zero (as opposed to check zero term importances).
And the DCO check passed...🥳
Thanks @brandongreenwell-8451 -- Looks pretty good! I noted a few small things to change. Also, please pip install the black formatter, and run "black ." on the command line to pass the final black formatting check in the build.
Also, could you add a unit test for each of these in:
I had an additional thought over the weekend regarding naming. I think for someone new to InterpretML who has not read the documentation, reweigh_terms might be misinterpreted as weighing/measuring something within the EBM rather than something that applies a multiplication factor. We also use "weight" inside the EBM for the binweights attribute which might lead someone to guess they are related. The alternative name I came up with was:
def scale_terms(self, factors, remove_nil_terms=False):
I had an additional thought over the weekend regarding naming. I think for someone new to InterpretML who has not read the documentation, reweigh_terms might be misinterpreted as weighing/measuring something within the EBM rather than something that applies a multiplication factor. We also use "weight" inside the EBM for the binweights attribute which might lead someone to guess they are related. The alternative name I came up with was:
def scale_terms(self, factors, remove_nil_terms=False):
That seems reasonable to me! I'll make that change later this week when I work on your other suggestions.
@paulbkoch I think this addresses all your concerns/suggestions if you want to take another look.
Looks great! Thanks for contributing this @brandongreenwell-8451
Awesome @paulbkoch, thanks for the support I’m getting this through!
This PR addresses https://github.com/interpretml/interpret/issues/460 by adding two new methods to a fitted EBM object:
reweight_terms()
- which allows callers to reweight the individual term contributions of the fit;remove_terms()
- which allows callers to remove specific terms (and their associated components).