Closed antoinedemathelin closed 5 months ago
I see that the key for the `sample_weight` argument is `"sample_weights"` in the dictionary returned by KLIEPAdapter, while it is `"sample_weight"`, without the "s", in the estimator's `fit` method. It may explain the issue, but I am not sure...
There are 3 issues here 😃

1. `sample_weights`, as you rightfully mentioned
2. `LinearRegression` doesn't know that it needs `sample_weight` routed to it; you have to request it explicitly. Like this one:
   ```python
   KLIEP(base_estimator=LinearRegression().set_fit_request(sample_weight=True), gamma=1.)
   ```
3. There's a related task in the list (quite an old one) to throw an error when the adapter returns weights but the downstream estimator does not accept them.
P.S. Nice catch! And kudos for finding out how to pass weights manually.
Seems to me that the `set_fit_request` should be done inside the KLIEP class/function, so that we can raise an error if `sample_weight` is not available?
In such a case, how would we know if the estimator accepts weights in the first place?
Well, we can look at the `fit` function definition? This would be important because I'm pretty sure the error when weights are not available is cryptic, and if we could detect that before fitting, it would be nice, no?
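Checking the `fit` definition, as suggested, can be sketched with the standard library's `inspect` module (a heuristic only — it misses `**kwargs`-style signatures, which is why metadata routing is the robust answer):

```python
import inspect

from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor


def accepts_sample_weight(estimator) -> bool:
    """Heuristic: does the estimator's fit method declare a sample_weight parameter?"""
    return "sample_weight" in inspect.signature(estimator.fit).parameters


print(accepts_sample_weight(LinearRegression()))     # True
print(accepts_sample_weight(KNeighborsRegressor()))  # False: KNN's fit(X, y) takes no weights
```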
> the error when weights are not available is cryptic

Which one?
> can look at the `fit` function definition
Metadata routing does this for us (it tracks signatures, as far as I recall). `sample_weight` is an odd edge case, as it's defined explicitly in many signatures simply because older versions of sklearn were designed that way (before routing was introduced). I need to check the sklearn code to see how widely this happens.
Oh, and KLIEPAdapter cannot turn on `sample_weight` for the downstream estimator in the pipeline, as those are separate pipeline steps; they cannot (and should not, by design) alter other steps. The decision whether to pass weights is made in the Selector, which analyses the routing to see if the estimator requested weights.
I am not sure I understand every message in the discussion. I made a suggestion to perform reweighting with estimators that don't use `sample_weight`, see #7.
There is also a question about how to handle the `sample_weight` argument given by the user in the `fit` method of the pipeline (perhaps that is the point of your last message?). My guess is that this `sample_weight` vector should be used in the reweighting (all reweighting techniques can be written to handle `sample_weight`), and the new `sample_weight` (returned by the weighting method) should be given to the estimator.

But it is not an easy question... You could also combine both `sample_weight` vectors (by multiplying them?). I prefer the first solution, because the meaning of giving a sample weight to the pipeline is to reweight the distribution. It makes sense, then, to perform the reweighting on the reweighted distribution. (I don't know if it's very clear...)
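The two options can be sketched in plain numpy (the `reweight` helper is a hypothetical stand-in for a weight-aware adapter, not this library's API):

```python
import numpy as np

user_weight = np.array([1.0, 2.0, 1.0, 0.5])     # passed by the user to the pipeline
adapter_weight = np.array([0.2, 0.9, 1.5, 1.0])  # e.g. density ratios from a reweighting method


def reweight(weights):
    # Hypothetical weight-aware adapter: the reweighting itself consumes the
    # user weights, and only its output goes to the estimator (option 1).
    return adapter_weight * weights / weights.sum()


# Option 1 (preferred above): reweight the already-reweighted distribution.
estimator_weight_opt1 = reweight(user_weight)

# Option 2: combine both weight vectors by multiplication.
estimator_weight_opt2 = user_weight * adapter_weight
```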
> My guess is that this `sample_weight` vector should be used in the reweighting (all reweighting techniques can be written to handle `sample_weight`), and the new `sample_weight` (returned by the weighting method) should be given to the estimator.
This is how it is implemented right now. If the adapter yields `sample_weight`, we replace the weights that were given in the original call to the pipeline. This is somewhat uncommon for a sklearn pipeline, which only allows the `X` samples to be updated (i.e. transformed). That's the reason we have a special type of output (`AdaptationOutput`) and the code in selectors to process it throughout the rest of the steps.
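A minimal sketch of that replacement logic (the `AdaptationOutput` name follows the text, but its fields and the selector function here are assumptions, not the library's actual implementation):

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class AdaptationOutput:
    # Hypothetical container mirroring the "special type of output" above.
    X: np.ndarray
    sample_weight: Optional[np.ndarray] = None


def select_fit_params(output, user_sample_weight=None):
    """If the adapter yields sample_weight, it replaces the user's weights."""
    if isinstance(output, AdaptationOutput):
        if output.sample_weight is not None:
            return output.X, output.sample_weight
        return output.X, user_sample_weight
    return output, user_sample_weight


out = AdaptationOutput(X=np.ones((3, 2)), sample_weight=np.array([0.5, 1.0, 2.0]))
X_sel, w_sel = select_fit_params(out, user_sample_weight=np.ones(3))
```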
I guess, if there's a need for flexibility, like multiplying different weights, it should be done in a custom estimator class that combines adaptation and fit/predict (like JDOT does). That way we keep the logic of adapter/estimator pipelines straightforward.
This one is fixed, no? I see many `.set_fit_request(sample_weight=True)` everywhere ;)
Yeah, it has been fixed already.
Hi everyone,
I think there is an issue with KLIEP. It provides the same results as the estimator fitted without `sample_weight`. It seems that the `sample_weight` returned by KLIEPAdapter is not given as an argument to the `fit` method of the estimator. Here is a little example showing that KLIEP provides the same predictions as the estimator without weighting, while it should provide the same results as the estimator fitted with the weights returned by KLIEPAdapter:

PS: I think the issue is not KLIEP-specific but is encountered by all reweighting classes.
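The original code snippet did not survive extraction. As an illustrative stand-in (plain scikit-learn, with made-up weights in place of those KLIEPAdapter would return), this shows why the symptom is detectable: passing `sample_weight` to `fit` should change the predictions, so identical predictions mean the weights were silently dropped:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.0, 2.0, 9.0])
# Stand-in for the weights a reweighting adapter such as KLIEPAdapter would return.
w = np.array([1.0, 1.0, 1.0, 100.0])

pred_plain = LinearRegression().fit(X, y).predict(X)
pred_weighted = LinearRegression().fit(X, y, sample_weight=w).predict(X)

# If the pipeline dropped the weights, these two would be identical.
print(np.allclose(pred_plain, pred_weighted))  # False
```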