owkin / fedeca

Code repository associated with the FedECA article
https://owkin.github.io/fedeca/

RFC: split of kwargs between __init__ and fit in the FedECA object #27

Open jeandut opened 6 months ago

jeandut commented 6 months ago

The FedECA API was designed to follow the famous sklearn fit API, with the constraints imposed by Hydra, the experiment manager: the same fedeca object needs to be sequentially fitted with different fit/config args, which should lead to different behaviors according to the experimental plan described in the paper, and which vary across experiments.

The problem is about the init kwargs (init_kwargs) and their relation to the fit kwargs (fit_kwargs):

The init can have:

- its own kwargs only (a complete separation of fit_kwargs and init_kwargs), OR
- its own kwargs plus some of the fit kwargs (all or a subset), OR
- no kwargs at all, OR
- exactly the same kwargs as fit.

The complete separation is cleaner, but it means not much happens in the init, and indeed, what should happen in the init? Maybe the init should have no kwargs and do nothing or almost nothing. Careful: there is inheritance from BaseEstimator and BootstrapMixin. TODO: add this constraint.

Having overlap means handling the case where the object is instantiated with certain kwargs and then fitted with other kwargs, which may or may not change the values of the previous ones. This is what happens in the code right now. The current logic is awful because it tries to detect which params have changed, in a very brittle way: if, for instance, the user instantiated FedECA without DP and now wants to fit it with DP, the code has to detect that and reinstantiate the strategies attribute (the reverse case can also happen). Bugs have previously been found in this logic by @honghaoli42 and by me, and I would not be surprised if there were more.
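To make the failure mode concrete, here is a deliberately simplified, hypothetical sketch (Estimator, dp_target_epsilon and strategy are invented names, not the actual FedECA attributes): any derived attribute computed in `__init__` must be manually re-derived in `fit`, and forgetting a single such line is exactly the class of bug described above.

```python
# Hypothetical sketch (NOT the actual FedECA code) of why manual
# change-detection between __init__ kwargs and fit kwargs is brittle.
class Estimator:
    def __init__(self, dp_target_epsilon=None):
        self.dp_target_epsilon = dp_target_epsilon
        # Derived state computed once at init time.
        self.strategy = "dp" if dp_target_epsilon is not None else "plain"

    def fit(self, data, dp_target_epsilon=None):
        # The brittle part: fit must remember to re-derive every piece of
        # state that depends on the kwargs it may override.
        if dp_target_epsilon is not None:
            self.dp_target_epsilon = dp_target_epsilon
            self.strategy = "dp"  # forgetting this line silently trains without DP
        return self.strategy


est = Estimator()  # instantiated without DP
assert est.fit([1, 2], dp_target_epsilon=10.0) == "dp"
```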

Handling this type of logic more gracefully would require a dedicated set_attributes_with_kwargs method, either with all kwargs defaulting to None, detecting non-None hyperparameters and reinitializing only those, OR with exactly the same defaults as the init, setting all kwargs regardless of their values. This method could also rely on dedicated objects such as Enums. None of this is done today.

Looking at sklearn, surprisingly there is no clear separation there either, and a similar kind of overlapping logic is implemented; see e.g. the _fit method of the BaseSGD class: https://github.com/scikit-learn/scikit-learn/blob/f07e0138b/sklearn/linear_model/_stochastic_gradient.py#L930

def _fit(
        self,
        X,
        y,
        alpha,
        C,
        loss,
        learning_rate,
        coef_init=None,
        intercept_init=None,
        sample_weight=None,
    ):

def __init__(
        self,
        loss="hinge",
        *,
        penalty="l2",
        alpha=0.0001,
        l1_ratio=0.15,
        fit_intercept=True,
        max_iter=1000,
        tol=1e-3,
        shuffle=True,
        verbose=0,
        epsilon=DEFAULT_EPSILON,
        n_jobs=None,
        random_state=None,
        learning_rate="optimal",
        eta0=0.0,
        power_t=0.5,
        early_stopping=False,
        validation_fraction=0.1,
        n_iter_no_change=5,
        class_weight=None,
        warm_start=False,
        average=False,
    ):

See e.g. alpha, loss, learning_rate, ... Delving into the fit logic you can see the same, (very subjectively) non-intuitive pattern: if you didn't pass a coef_init, it is None, and so the object falls back to its own attribute self.coef_:

        if self.warm_start and hasattr(self, "coef_"):
            if coef_init is None:
                coef_init = self.coef_
            if intercept_init is None:
                intercept_init = self.intercept_

What should we do? First, decide on kwargs separation (or not); then, if there is overlap, what would be the pseudocode for a clean logic?

In my opinion, seeing that sklearn did go with some overlap, having all kwargs overlap seems like a good first step for now. Regarding the associated logic, I would go for a set_attributes_with_kwargs method, called by both init AND fit, with all its kwargs defaulting to None; it would iterate over its kwargs, detect the non-None ones, and set them immediately:

class FedECA:
    def __init__(self, *args, **kwargs):
        self.set_attributes_with_kwargs(*args, **kwargs)

    def fit(self, *args, **kwargs):
        self.set_attributes_with_kwargs(*args, **kwargs)

    def set_attributes_with_kwargs(self, att1=None, att2=None, ...):
        # list_of_att is hardcoded here; maybe it can be built dynamically
        # with inspect, to prevent devs from having to update this list
        # whenever a new kwarg is added?
        for att_name in list_of_att:
            att = locals()[att_name]
            if att is not None:
                setattr(self, att_name, att)
        # all the logic to prepare the training as well, such as
        # instantiating strategies

Whatever decision we reach, it should be relatively quick to implement and would prevent many bugs in the future.
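A runnable sketch of this proposal, answering the inspect question from the comment above by enumerating the setter's own keyword parameters with inspect.signature; FedECALike and its kwargs (ndim, dp_target_epsilon, num_rounds) are illustrative names, not the real FedECA API:

```python
import inspect


class FedECALike:
    """Illustrative sketch of the proposed logic (not the real FedECA)."""

    def __init__(self, **kwargs):
        self.set_attributes_with_kwargs(**kwargs)

    def fit(self, data=None, **kwargs):
        self.set_attributes_with_kwargs(**kwargs)
        # ...then all the logic to prepare training, e.g. instantiating
        # strategies, would run here against the up-to-date attributes.

    def set_attributes_with_kwargs(self, ndim=None, dp_target_epsilon=None,
                                   num_rounds=None):
        # Enumerate this method's own kwargs dynamically, so adding a new
        # kwarg needs no extra bookkeeping (no hardcoded list_of_att).
        params = inspect.signature(self.set_attributes_with_kwargs).parameters
        values = locals()
        for name in params:
            if values[name] is not None:
                setattr(self, name, values[name])


model = FedECALike(ndim=10)
model.fit(data=[1.0, 2.0], dp_target_epsilon=1.0)
assert model.ndim == 10 and model.dp_target_epsilon == 1.0
```

One caveat of the None-as-default convention: a kwarg whose legitimate value is None can never be set back, which is where a dedicated sentinel object (or the Enums mentioned above) would come in.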

ghost commented 6 months ago

Thanks @jeandut for sharing your thoughts on this tricky topic. What you propose makes sense, but it adds boilerplate code (basically getters and setters everywhere). I probably lack context, but why couldn't we go for a simple separation, with everything in init_kwargs and just the data passed to fit?

jeandut commented 6 months ago

@mandreux-owkin it's because of hydra: the way it wants you to operate is on a single object on which you repeatedly call fit with different kwargs, i.e. we need to be able to pass, for instance, DP parameters, which are not technically data, and any other parameter we might want to study.

jeandut commented 6 months ago

We need to clean this up ASAP; it's hard to overstate how much time we have wasted because of such issues.

ghost commented 6 months ago

@honghaoli42 do you have an opinion on this topic?

honghaoli42 commented 6 months ago
  1. On the object init kwargs and fit kwargs: ideally they are there for different things
     - init kwargs are used to instantiate different variants of a model -> different models
     - fit kwargs are used to specify fitting strategies, hyperparameters, etc.

For simple models this is easily doable, but not for complex models like fedeca. Based on what we have seen so far, I agree with what @mandreux-owkin proposed: initialise everything in the init and fit only with data, basically making each model "one-time use only", which is the most straightforward design.
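If the thread settles on this separation instead, a minimal sketch of the "one-time use only" contract could look like the following (OneShotFedECA and its kwargs are illustrative names, not the real API):

```python
# Sketch of the separation proposed above: every hyperparameter goes to
# __init__, fit receives only data, and each experimental configuration
# gets a fresh instance (all names here are illustrative).
class OneShotFedECA:
    def __init__(self, ndim, dp_target_epsilon=None, num_rounds=10):
        self.ndim = ndim
        self.dp_target_epsilon = dp_target_epsilon
        self.num_rounds = num_rounds
        self.fitted_ = False

    def fit(self, data):
        if self.fitted_:
            raise RuntimeError("one-time-use model: build a new instance to refit")
        # ...training would happen here...
        self.fitted_ = True
        return self


# The experiment loop reinstantiates instead of refitting:
for epsilon in (None, 1.0, 10.0):
    model = OneShotFedECA(ndim=10, dp_target_epsilon=epsilon).fit(data=[1, 2, 3])
```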

  2. On the hydra constraints:

    it's because of hydra, the way it wants you to operate is on a single object on which you repeatedly call fit with different kwargs

Actually it's not hydra but me who chose this implementation; hydra is only a config loader, so it does not impose anything on what we do later. The choice was made purely from a design perspective, with simple models in mind. In our experiments we want to fit the same model with different data, so it is natural to take the same model and refit it; otherwise we could easily reinstantiate, but doing so transfers the responsibility of "making sure we use the same model" to the user side. The downside of this choice is that models should be implemented in a way that is ideally stateless, or can be reset easily; that is the reason for the reset_experiment attempt. But clearly fedeca is no simple model: its complex structure makes it difficult to fit into a pipeline designed for simple models, so let's just change the pipeline instead.
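A purely illustrative way to obtain the "stateless between fits" property described here, without assuming anything about the actual reset_experiment implementation: treat the init kwargs as frozen configuration and rebuild every piece of derived state at the top of each fit, so repeated fits cannot interact.

```python
class RefittableModel:
    """Illustrative only: config frozen at init, derived state per fit."""

    def __init__(self, num_rounds=10):
        self.num_rounds = num_rounds  # immutable configuration

    def fit(self, data):
        # Derived state is recreated from scratch on every call, never
        # carried over, so the same object can safely be refitted.
        state = {"rounds_done": 0, "n_samples": len(data)}
        for _ in range(self.num_rounds):
            state["rounds_done"] += 1
        self.last_result_ = state
        return self


model = RefittableModel(num_rounds=3)
model.fit([1, 2]).fit([1, 2, 3])  # refitting is safe by construction
assert model.last_result_["n_samples"] == 3
```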