judithabk6 / med_bench

Merging DoubleMachineLearning into MultiplyRobust #96

Open brash6 opened 1 day ago

brash6 commented 1 day ago

@houssamzenati Inside the med_bench_prototype code, you implemented MultiplyRobust's fit function like this:

def fit(self, t, m, x, y):
    """Fits nuisance parameters to data."""
    # bucketize if needed
    t, m, x, y = self.resize(t, m, x, y)

    # fit nuisance functions
    self.fit_score_nuisances(t, m, x, y)

    if self._ratio == 'density':
        self.fit_treatment_propensity_x_nuisance(t, x)
        self.fit_mediator_nuisance(t, m, x)

    elif self._ratio == 'propensities':
        self.fit_treatment_propensity_x_nuisance(t, x)
        self.fit_treatment_propensity_xm_nuisance(t, m, x)

    else:
        raise NotImplementedError

    if self._procedure == 'nesting':
        self.fit_cross_conditional_mean_outcome_nuisance(t, m, x, y)

    elif self._procedure == 'discrete':
        if not is_array_integer(m):
            self._bucketizer = KMeans(n_clusters=10, random_state=self.rng,
                                      n_init="auto").fit(m)
            m = self._bucketizer.predict(m)
        self.fit_mediator_nuisance(t, m, x)
        self.fit_cross_conditional_mean_outcome_nuisance_discrete(t, m, x, y)

    else:
        raise NotImplementedError

    self._fitted = True

    if self.verbose:
        print("Nuisance models fitted")

We can see that there are two discriminating variables: ratio and procedure.
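To make the two switches easier to scan, here is a small self-contained summary of the dispatch above (plain Python written for this discussion, not part of the med_bench API):

# Descriptive summary of the dispatch in fit() above: which nuisance models
# each (ratio, procedure) combination fits. Derived from the snippet, not an API.
NUISANCES_BY_RATIO = {
    "density": ["treatment_propensity_x", "mediator"],
    "propensities": ["treatment_propensity_x", "treatment_propensity_xm"],
}
NUISANCES_BY_PROCEDURE = {
    "nesting": ["cross_conditional_mean_outcome"],
    # 'discrete' also bucketizes a continuous mediator with KMeans first
    "discrete": ["mediator", "cross_conditional_mean_outcome_discrete"],
}

def nuisances(ratio, procedure):
    """List the nuisance models fitted for a given (ratio, procedure) pair."""
    return NUISANCES_BY_RATIO[ratio] + NUISANCES_BY_PROCEDURE[procedure]

print(nuisances("propensities", "discrete"))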

From my understanding of our discussion today, with some questions in the middle:

fyi @judithabk6

bthirion commented 1 day ago

I may have missed some of the previous steps, but I would add the logic to infer such parameters automatically after the current refactoring. It's an optimization, not really a must-have imho.

judithabk6 commented 1 day ago

@bthirion one of the first steps included some legacy code from @houssamzenati, so we are rather reverting back to the current state.

Not a must-have, but also not so complicated, I think?

houssamzenati commented 1 day ago

@brash6 yes, I agree with what you wrote, it is indeed what we said yesterday. @bthirion we are indeed trying to implement automatic inference of such variants; the only parameter that will be left is ratio='propensities' or 'density' for the mediator m, and only when it is discrete. Would you like to also enforce a single option in that case? Judith and I thought that, in that case, it was beneficial to leave the choice.
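For what it's worth, a rough sketch of one possible reading of that rule, not the actual med_bench implementation; is_array_integer mirrors the helper used in the fit() snippet, and the fallback values chosen for a continuous mediator are illustrative assumptions only:

import numpy as np

def is_array_integer(m):
    """Hypothetical helper: True if the mediator array holds integer codes."""
    return np.issubdtype(np.asarray(m).dtype, np.integer)

def infer_variants(m, ratio="propensities"):
    """Infer (procedure, ratio); only `ratio` stays user-facing for a discrete m."""
    if is_array_integer(m):
        # discrete mediator: the user may still pick ratio='propensities'
        # or ratio='density'
        return "discrete", ratio
    # continuous mediator: assume the nested procedure with the
    # propensity-based ratio (illustrative default, not a project decision)
    return "nesting", "propensities"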

brash6 commented 1 day ago

@houssamzenati It's clear on my side, so:

judithabk6 commented 1 day ago

I agree with the 3 points

houssamzenati commented 3 hours ago

yes, you can remove DML in the PR once the points above are addressed, and also once cross-fitting is implemented (back, actually) in the code.
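Since cross-fitting is mentioned as coming back, here is a minimal, generic cross-fitting sketch using scikit-learn. It is not the med_bench implementation; the function name and the logistic-regression nuisance model are illustrative choices only.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

def cross_fitted_propensity(t, x, n_splits=2, random_state=0):
    """Out-of-fold estimates of P(T=1 | X): each nuisance model is fitted on
    one fold and only used to score the samples it has never seen."""
    t = np.asarray(t).ravel()
    x = np.asarray(x)
    out = np.zeros(t.shape[0], dtype=float)
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    for train_idx, test_idx in kf.split(x):
        clf = LogisticRegression().fit(x[train_idx], t[train_idx])
        out[test_idx] = clf.predict_proba(x[test_idx])[:, 1]
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(200, 3))
    t = rng.binomial(1, 1 / (1 + np.exp(-x[:, 0])))
    print(cross_fitted_propensity(t, x)[:5])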