scikit-adaptation / skada

Domain adaptation toolbox compatible with scikit-learn and pytorch
https://scikit-adaptation.github.io/
BSD 3-Clause "New" or "Revised" License

Weights normalization in ReweightDensity #112

Closed antoinedemathelin closed 4 months ago

antoinedemathelin commented 5 months ago

Hi everyone, I wonder why the weights are normalized in ReweightDensity, cf. https://github.com/scikit-adaptation/skada/blob/9e7905ddbbc2823eb3f48cb08e6cd5278bd2385a/skada/_reweight.py#L108 ?

After some tests, I observed that the strange behavior of ReweightDensity in the comparison example is due to the weight normalization. cf. https://scikit-adaptation.github.io/auto_examples/plot_method_comparison.html#sphx-glr-auto-examples-plot-method-comparison-py

Maybe we should remove the normalization, or at least divide by the mean instead of the sum (dividing by the sum makes the weights depend on the dataset size, which should be avoided, I think). A small sketch is given below.
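
For illustration, a minimal standalone sketch (synthetic weights, not skada code) of why dividing by the sum couples the weight scale to the dataset size, while dividing by the mean keeps the weights around one:

```python
import numpy as np

rng = np.random.default_rng(0)

for n in (100, 10_000):
    # Hypothetical unnormalized density-ratio weights, roughly around 1.
    w = rng.lognormal(mean=0.0, sigma=0.5, size=n)

    w_sum = w / w.sum()    # current behavior: weights shrink as n grows
    w_mean = w / w.mean()  # suggested: weights stay on the order of 1

    print(n, w_sum.mean(), w_mean.mean())
    # n=100    -> mean 1e-2 vs 1.0
    # n=10000  -> mean 1e-4 vs 1.0
```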

BuenoRuben commented 5 months ago

You are right. When comparing the weights obtained with this method to those from the others, it is the only one that is normalized in such a way (all other reweighting methods produce some weights larger than 1), so it might be better not to normalize it.

kachayev commented 5 months ago

From the code, it seems like the normalization here is specifically there to produce a density, right? DiscriminatorReweightDensityAdapter also returns a density (normalized), it just does so by calling predict_proba on the classifier.

BuenoRuben commented 5 months ago

Maybe it is a mistake, and we shouldn't return a density in either case: those are the two worst-performing reweighting methods, doing worse on the example than not using DA at all (see: https://output.circle-artifacts.com/output/job/8955b99a-a58c-4b5c-afb0-8bcb4c545245/artifacts/0/dev/auto_examples/plot_reweighting.html#sphx-glr-auto-examples-plot-reweighting-py ). Also, I think weights are not supposed to form a density; they are just supposed to be the ratio of two probabilities, and can thus be higher than 1 and sum to more than 1.

antoinedemathelin commented 5 months ago

Hi, in fact the optimal weights should be equal to the density ratio w(x) = pt(x) / ps(x). So, when the number of source data points is large, the weights should approximately average to one (and not sum to one).

Moreover, from a scikit-learn implementation perspective, uniform weighting is equivalent to all weights being equal to one, I think (i.e., the mean of the weights should be equal to one). Otherwise, you may run into numerical issues when the number of source data points is large (the weights become close to zero).
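
As a rough illustration (a standalone sketch with synthetic data and hypothetical KDE bandwidths, not the skada implementation), estimating w(x) = pt(x) / ps(x) with two density estimators gives source weights whose mean is close to one, since E_ps[pt/ps] = ∫ pt(x) dx = 1:

```python
import numpy as np
from sklearn.neighbors import KernelDensity

# Hypothetical 1-D source/target samples with a shifted mean.
rng = np.random.default_rng(0)
Xs = rng.normal(0.0, 1.0, size=(5000, 1))
Xt = rng.normal(0.5, 1.0, size=(2000, 1))

kde_s = KernelDensity(bandwidth=0.3).fit(Xs)
kde_t = KernelDensity(bandwidth=0.3).fit(Xt)

# Density-ratio weights w(x) = pt(x) / ps(x) evaluated on the source sample.
w = np.exp(kde_t.score_samples(Xs) - kde_s.score_samples(Xs))

print(w.mean())  # close to 1 for a large source sample
```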

From my point of view, there is also an issue with the current implementation of DiscriminatorReweightDensityAdapter: I think the weights should be computed with the formula (1 - predicted_proba) / predicted_proba instead of predicted_proba, because the optimal discriminator verifies predicted_proba = ps / (ps + pt) = (1 + w)^-1, cf. the GAN paper.
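
A minimal sketch of the proposed formula (a standalone example with a LogisticRegression domain classifier, not the actual DiscriminatorReweightDensityAdapter code):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical domain-classification setup: label 0 = source, 1 = target.
rng = np.random.default_rng(0)
Xs = rng.normal(0.0, 1.0, size=(5000, 2))
Xt = rng.normal(0.5, 1.0, size=(5000, 2))

X = np.vstack([Xs, Xt])
domain = np.concatenate([np.zeros(len(Xs)), np.ones(len(Xt))])

clf = LogisticRegression().fit(X, domain)

# Probability that a source point is classified as source:
# predicted_proba(x) ~ ps(x) / (ps(x) + pt(x))
p_source = clf.predict_proba(Xs)[:, 0]

# Density-ratio weights w(x) = pt(x) / ps(x) = (1 - p) / p.
w = (1 - p_source) / p_source
```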

rflamary commented 5 months ago

I agree with you that this reweighting with just probabilities is not correct, nor the one from the papers (I would cite the Sugiyama review instead of the GAN paper, though ;)). OK for weights that sum to n instead of 1, because it indeed changes the proper parameters (C or alpha) in the sklearn models; I just ran into this problem when implementing the JDOT classifier.
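
A small illustration of why the weight scale interacts with C (a standalone sketch with LogisticRegression, not the JDOT code): rescaling the sample weights by a constant is equivalent to rescaling C by the same constant, so weights that sum to 1 instead of n implicitly change the regularization strength.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=0)
n = len(y)
w = np.ones(n)

# Weights averaging 1 (summing to n) with the default C ...
clf_a = LogisticRegression(C=1.0).fit(X, y, sample_weight=w)
# ... match weights summing to 1 only if C is rescaled by n.
clf_b = LogisticRegression(C=float(n)).fit(X, y, sample_weight=w / n)

# Same objective, so the coefficients agree up to solver tolerance.
print(np.abs(clf_a.coef_ - clf_b.coef_).max())
```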

rflamary commented 5 months ago

Also, about the Reweight adapters: we should maybe rename them without "density" everywhere. Density is used in the model that relies on a density estimator, but it would make sense to remove the density suffix for the Gaussian one (it is implied in a way) and the discriminator one. We did not use density for KMM either. As long as "Reweight" is in the name, it is enough I think.

antoinedemathelin commented 5 months ago

Ok, thank you @rflamary, I will open a PR to correct these few issues.

rflamary commented 4 months ago

Closing this issue, fixed in #118