scikit-adaptation / skada

Domain adaptation toolbox compatible with scikit-learn and pytorch
https://scikit-adaptation.github.io/
BSD 3-Clause "New" or "Revised" License
56 stars 16 forks source link

[API] fit, predict and other functions should accept X,y,smaple_weight as first parameters by default #88

Open rflamary opened 5 months ago

rflamary commented 5 months ago

In order to stay compatible with skleranAPI we want all estimatoer (and Adapters) to have functions that accept unnamed sample_weightby defalult.

For the moment we have for Adapter and DAEstimator:

    @abstractmethod
    def fit(self, X, y=None, sample_domain=None, *, sample_weight=None):
        """Fit adaptation parameters"""

I suggest we change it to

    @abstractmethod
    def fit(self, X, y=None, sample_weight=None sample_domain=None, *):
        """Fit adaptation parameters"""

that will allow to do things like

   model.fit( X, y, sample_weight, sample_domain):

the named sample_domain will still be necessary for pipelines

rtavenar commented 5 months ago

Hi,

I'd like to start contributing to skada and it seems this issue could be a good start. Is there a consensus on @rflamary 's suggestion? If yes, is it OK if I start a PR on this?

kachayev commented 5 months ago

Let's make 'sample_domain' parameter named-only (as it previously was). I know recall that the change was made to have ability to call

fit(X, y, sample_domain)

(no names, just ordering). If we put 'sample_weight' third, the ability to do

fit(X, y, None, sample_domain)

would be frustrating. To avoid this, we just need to declare

    @abstractmethod
    def fit(self, X, y=None, sample_weight=None, *, sample_domain=None, **params):
        """Fit adaptation parameters"""

This will force 'sample_domain' to be always named, removing the risk of putting it instead of weights (technically, we won't be able to spot the problem as we allow 'sample_domain' to be given as None).

WDYT?