jeandut opened 8 months ago
Thanks @jeandut for sharing your thoughts on this tricky topic. What you propose makes sense, but it adds boilerplate code (basically getters and setters everywhere). I probably lack context, but why couldn't we go for a simple separation, with everything in the `init_kwargs` and just the data passed to `fit`?
@mandreux-owkin it's because of hydra: the way it wants you to operate is on a single object on which you repeatedly call `fit` with different kwargs, i.e. we need to be able to pass DP parameters for instance, which are not technically data, as well as any parameter we might want to study.
We need to clean this ASAP; it's hard to overstate how much time we have wasted because of such issues.
@honghaoli42 do you have an opinion on this topic?
For simple models this is easily doable, but not for complex models like fedeca. Based on what we have seen so far, I agree with what @mandreux-owkin proposed: initialise everything and fit only with data, basically making each model "one-time use only", which is the most straightforward design (see the sketch below).
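A minimal sketch of this "one-time use only" design (the kwarg names here are hypothetical, just to illustrate the shape of the API):

```python
# "One-time use" design: everything is configured at init,
# fit() only receives data, and a new configuration means a new object.
model = FedECA(n_clients=10, dp_params=None)  # hypothetical kwargs
model.fit(data)

# New configuration => new instance, no mutation of the old one.
model_dp = FedECA(n_clients=10, dp_params={"epsilon": 1.0})
model_dp.fit(data)
```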
> it's because of hydra, the way it wants you to operate is on a single object on which you repeatedly call fit with different kwargs

Actually it's not hydra but me who chose the implementation; hydra is only a config loader, so it does not impose anything on what we could do later.

The choice was made purely from a design perspective with simple models in mind. In our experiments we want to fit the same model with different data, so it's just natural to take the same model and refit it. Otherwise we could easily reinstantiate, but doing so transfers the responsibility of "making sure we use the same model" to the user side. The downside of this choice is that models should be implemented in a way that is ideally stateless, or at least easily reset. That's the reason for the `reset_experiment` attempt, sketched below.
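For reference, a sketch of the refit-with-reset pattern this paragraph describes (`reset_experiment` is mentioned above, but its body and the surrounding names are assumptions):

```python
# Same model object refitted on different data, reset between runs
# so that no state leaks from one experiment to the next.
model = SimpleModel(**init_kwargs)
for data in datasets:
    model.reset_experiment()  # hypothetical: wipe state left by the previous fit
    model.fit(data)
```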
But clearly fedeca is no simple model; its complex structure makes it difficult to fit into the pipeline designed for simple models, so let's just change the pipeline instead.
The way the FedECA API was created was to follow the famous sklearn `fit` API, with the constraints given by Hydra, the xp manager, namely that the same fedeca object needs to be sequentially fitted with different fit/config args, which should lead to different behaviors according to the experimental plan described in the paper. For now these vary `n_samples` and other data-related hyper-parameters, such as the number of ties or covariate shifts, the number of federated clients, or the repartition of data between them (and substra handles whether the dataset exists). So `fit_kwargs` has to contain at least roughly `{"data_kwargs": {"n_samples": 1000, "shifts": 0.4, "cate": 0.2, "propensity": ...}, "n_clients": 10, "substra_kwargs": {"dataset_hash", }}`.
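To make the constraint concrete, here is a minimal sketch of the usage pattern described above (the `FedECA` constructor call and the exact kwargs layout are assumptions based on this thread, not the actual API):

```python
# Sketch of the constraint: one fedeca object, sequentially refitted
# with different fit/config kwargs, one per entry of the experimental plan.
fedeca = FedECA()  # hypothetical no-arg init

# experiment_plan would typically be produced by hydra from config files.
experiment_plan = [
    {"data_kwargs": {"n_samples": 1000, "shifts": 0.4, "cate": 0.2}, "n_clients": 10},
    {"data_kwargs": {"n_samples": 2000, "shifts": 0.1, "cate": 0.2}, "n_clients": 5},
]
for fit_kwargs in experiment_plan:
    fedeca.fit(**fit_kwargs)  # same object, different behavior per call
```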
The problem is about the init kwargs (`init_kwargs`) and their relation to the fit kwargs (`fit_kwargs`). The init can have either:

- its own kwargs (therefore there is a complete separation of `fit_kwargs` and `init_kwargs`), OR
- its own kwargs and some of the fit kwargs (all or a subset), OR
- no kwargs at all, OR
- all the same kwargs.

The complete separation is cleaner, however it means nothing much happens in the init, and indeed, what should be happening in the init? Maybe the init should have no kwargs and do nothing or almost nothing => careful, there is inheritance of BaseEstimator and BootstrapMixin (TODO: add this constraint).

Having overlap means handling the case where you have an init with certain kwargs and then a fit with other kwargs which may or may not change the values of the previous kwargs. This is what's happening right now in the code. The current logic is awful because it tries to detect which params have changed, in a very brittle way: if for instance the user instantiated FedECA without DP and now wants to fit it with DP, it has to detect that and reinstantiate the `strategies` attribute (the reverse case could also happen). Bugs have previously been found by @honghaoli42 and by me in this logic, and I would not be surprised if there were some more. A caricature of this pattern is sketched below.
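A hedged caricature of the kind of change-detection logic described above (not the actual fedeca code; `build_strategies` and the attribute names are made up for illustration):

```python
# Brittle pattern: fit() tries to guess which hyper-parameters changed
# since __init__ and patch the object accordingly.
def fit(self, data, dp_params=None, **fit_kwargs):
    if dp_params is not None and self.dp_params is None:
        # DP turned on after init: the strategies must be rebuilt.
        self.strategies = build_strategies(dp_params)  # hypothetical helper
    elif dp_params is None and self.dp_params is not None:
        # Reverse case: DP turned off again, which also needs handling.
        self.strategies = build_strategies(None)
    # ... every other overlapping kwarg needs a similar ad-hoc branch,
    # which is exactly why this approach breeds bugs.
```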
Handling this type of logic more gracefully would necessitate a dedicated `set_attributes_with_kwargs` method, either with all default kwargs set to `None`, which would detect non-`None` hyper-parameters and reinitialize only those kwargs, OR with exactly the same defaults as the init, setting all kwargs no matter their values. This method could, or could not, use dedicated objects such as Enums. This is not done today.

Considering sklearn, surprisingly it seems there is no clear separation, and some kind of the same overlapping logic is implemented; see e.g. the `fit` method of the BaseSGD class: https://github.com/scikit-learn/scikit-learn/blob/f07e0138b/sklearn/linear_model/_stochastic_gradient.py#L930 (see e.g. `alpha`, `loss`, `learning_rate`, ...). Delving into the fit logic you can see the same kind of (very subjectively) non-intuitive patterns: if you didn't pass a `coef_init`, it is thus `None`, and so the object should use its attribute `self.coef_`.
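Paraphrasing that sklearn pattern as a short sketch (this is a simplification of the linked BaseSGD code, not a verbatim excerpt):

```python
# Method of an SGD-like estimator: None means "reuse the stored attribute".
def fit(self, X, y, coef_init=None):
    if coef_init is None:
        # Nothing passed explicitly: fall back to the coefficients
        # already stored on the object (e.g. from a previous fit).
        coef_init = getattr(self, "coef_", None)
    # ... run the optimizer starting from coef_init ...
```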
What should we do? First, for kwargs separation (or not), and then, if there is overlap, what would be a pseudocode for a cool logic?
In my opinion, seeing that sklearn did go with some overlap, I guess having all kwargs overlap for now could be a good first step. Regarding the associated logic, I would go for a `set_attributes_with_kwargs` method, called by both the init AND fit methods with all its kwargs defaulting to `None`, which would iterate on its kwargs, detect non-`None` ones, and set them immediately (see the sketch below).

Once we reach a decision, whatever it is, it should be relatively quick to implement and would prevent many bugs in the future.
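A minimal sketch of what that could look like (the method name `set_attributes_with_kwargs` comes from this thread; everything else, including the hyper-parameter names, is an assumption):

```python
class FedECA:
    # Hypothetical hyper-parameters, just to illustrate the mechanism.
    def __init__(self, n_clients=None, dp_params=None, seed=None):
        self.n_clients = None
        self.dp_params = None
        self.seed = None
        self.set_attributes_with_kwargs(n_clients=n_clients, dp_params=dp_params, seed=seed)

    def fit(self, data, n_clients=None, dp_params=None, seed=None):
        # Same entry point as the init: only the kwargs explicitly passed
        # to fit (i.e. non-None) override the current attributes.
        self.set_attributes_with_kwargs(n_clients=n_clients, dp_params=dp_params, seed=seed)
        # ... actual fitting logic on `data` goes here ...

    def set_attributes_with_kwargs(self, **kwargs):
        # Detect non-None kwargs and set them immediately; None values
        # leave the previously configured attributes untouched.
        for name, value in kwargs.items():
            if value is not None:
                setattr(self, name, value)
```

One known limitation of the `None`-sentinel convention is that `None` can then never be an intentional value; a dedicated sentinel object would avoid that if it ever becomes a problem.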