scikit-adaptation / skada

Domain adaptation toolbox compatible with scikit-learn and pytorch
https://scikit-adaptation.github.io/
BSD 3-Clause "New" or "Revised" License

Implement Kernel Mean Matching #82

antoinedemathelin closed this issue 4 months ago

antoinedemathelin commented 5 months ago

Hi everyone, I am glad to see that the skada repo is already public. I really like the API choices that have been made (the pipeline idea is great!). I am opening this issue to propose implementing the Kernel Mean Matching (KMM) reweighting method (cf. the paper “Correcting sample selection bias by unlabeled data”).

Are you OK with adding it to the library? If so, I can open a PR.

rflamary commented 5 months ago

Hello @antoinedemathelin, and welcome back to skada! You have already coded many methods that are of interest to us.

This is a very good idea. KMM is a reweighting method, isn't it? You could do it with a specific Adapter that outputs sample_weights ;)
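To illustrate what "outputs sample_weights" buys you downstream, here is a minimal sketch of how per-sample weights are typically consumed by a scikit-learn estimator through the `sample_weight` argument of `fit`. This is plain scikit-learn, not the skada Adapter API; the helper name `fit_weighted` and the placeholder weights are assumptions made for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def fit_weighted(X, y, sample_weight):
    """Fit a classifier where each sample contributes to the loss
    in proportion to its weight (hypothetical helper, not skada API)."""
    clf = LogisticRegression(tol=1e-8, max_iter=1000)
    clf.fit(X, y, sample_weight=sample_weight)
    return clf


# Toy data: the weights below are placeholders standing in for the
# importance weights a reweighting method would produce.
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 2))
y = (X[:, 0] + 0.5 * rng.normal(size=80) > 0).astype(int)
placeholder_weights = rng.uniform(0.5, 2.0, size=80)

clf = fit_weighted(X, y, placeholder_weights)
```

A sample with weight 0 has no influence on the fit, which is why a reweighting method can leave rows in the input arrays and still exclude them from training.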

antoinedemathelin commented 5 months ago

Thank you @rflamary! Yes, it's a reweighting algorithm. From what I understand, I should write a KMMAdapter object implementing an adapt method that returns an AdaptationOutput instance containing the sample_weights of the reweighting. If I am right, the input arrays X, y now contain both source and target samples, so the sample_weights array should contain the source importance weights at the source indexes and 0 at the target indexes? I will follow the KLIEP implementation in skada: the main difference with KMM is the optimization part, and the "API part" should be the same, I guess.

I guess I need to write a small example and tests for the method?
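For readers unfamiliar with the "optimization part", the following is a rough, standalone sketch of the KMM quadratic program and of the weight layout described above (source weights at the source indexes, 0 at the target indexes). It is not the skada implementation and does not use `KMMAdapter` or `AdaptationOutput`. The function names, the boolean `is_source` mask, the RBF kernel, and the default values of `gamma`, `B`, and `eps` are all assumptions chosen for this sketch. It minimizes `0.5 * b @ K @ b - kappa @ b` subject to `0 <= b_i <= B` and `|sum(b) - n_s| <= n_s * eps`, using SciPy's general-purpose SLSQP solver, which is only practical for small sample sizes.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.metrics.pairwise import rbf_kernel


def kmm_source_weights(X_source, X_target, gamma=1.0, B=10.0, eps=None):
    """Sketch of KMM: importance weights for the source samples."""
    n_s, n_t = X_source.shape[0], X_target.shape[0]
    if eps is None:
        # Default slack chosen for this sketch (an assumption).
        eps = (np.sqrt(n_s) - 1.0) / np.sqrt(n_s)

    # Quadratic term: kernel matrix between source samples.
    K = rbf_kernel(X_source, X_source, gamma=gamma)
    # Linear term: rescaled kernel similarity of each source sample
    # to the whole target set.
    kappa = (n_s / n_t) * rbf_kernel(X_source, X_target, gamma=gamma).sum(axis=1)

    ones = np.ones(n_s)
    # |sum(b) - n_s| <= n_s * eps, written as two inequalities g(b) >= 0.
    constraints = [
        {"type": "ineq", "fun": lambda b: n_s * (1 + eps) - b.sum(),
         "jac": lambda b: -ones},
        {"type": "ineq", "fun": lambda b: b.sum() - n_s * (1 - eps),
         "jac": lambda b: ones},
    ]
    result = minimize(
        lambda b: 0.5 * b @ K @ b - kappa @ b,
        x0=ones,
        jac=lambda b: K @ b - kappa,
        bounds=[(0.0, B)] * n_s,
        constraints=constraints,
        method="SLSQP",
        options={"maxiter": 500},
    )
    # Guard against tiny numerical violations of the box constraints.
    return np.clip(result.x, 0.0, B)


def full_sample_weights(X, is_source, **kmm_params):
    """Weights for the stacked array: KMM weights on source rows, 0 on target rows.

    `is_source` is a hypothetical boolean mask; how source and target rows
    are actually identified in skada is not shown here.
    """
    weights = np.zeros(X.shape[0])
    weights[is_source] = kmm_source_weights(
        X[is_source], X[~is_source], **kmm_params
    )
    return weights


# Toy usage: 60 source samples around 0, 40 target samples around 1.5.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (60, 1)), rng.normal(1.5, 1.0, (40, 1))])
is_source = np.arange(100) < 60
weights = full_sample_weights(X, is_source, gamma=0.5)
```

A dedicated QP solver would likely be a better fit than SLSQP for larger problems; the generic solver is used here only to keep the sketch dependency-light.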

tgnassou commented 5 months ago

Exactly! If you have any problems, just ask us. In the example, don't hesitate to show how the method works with nice plots ;)

And don't hesitate to give us some feedback on the API; we are still converging on the final API.