Efficient g-estimation

Is your feature request related to a problem? Please describe.

Currently, the implemented g-estimation approach follows the estimating equations in Robins et al. 1992. The alternative estimating equations in Robins et al. 1994 are doubly robust and provide the efficient g-estimator. In certain cases, these two estimators are asymptotically equivalent. To quote from Vansteelandt & Sjolander, Epidemiol. Methods 2016; 5(1): 37–56:

It follows in that case that the efficient g-estimator of ψ is asymptotically equivalent (and under some conditions also mathematically identical – not shown) to the proposed estimator, i. e. the solution to eq. [18].

Eq. [18] of the above is the current implementation in ee_gestimation_snmm. So, the two solutions won't necessarily be identical in finite samples (if my reading is correct, which also agrees with the examples I've built). The mathematical conditions under which they are equal are unclear to me, but that doesn't really matter here.

Describe the solution you'd like

The request is to add the efficient g-estimator to ee_gestimation_snmm. To support both options (since they are not identical), there should be an optional argument, like approach='efficient'. Although it will change behavior going forward, the default should be the efficient estimator. This default will also make the log-linear SNMM easier (see below).

I will also need to add an argument for the outcome process model. This model will only be used by the efficient estimator, and it is what provides the double robustness and efficiency. Because of the double robustness, this model can also simply be left unspecified (the predicted values from this 'model' are just set to all zeroes). So, the default behavior will be no outcome process model. This will make it easier to support both g-estimators.
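As a rough sketch of that default (not the actual Delicatessen API), the optional outcome process model could be handled by a small helper that returns all-zero predictions when no design matrix is given. The name `outcome_predictions` and its arguments are hypothetical and only for illustration.

```python
import numpy as np

def outcome_predictions(n_obs, X=None, beta=None):
    """Hypothetical helper: predicted values from the optional outcome process model.

    When no design matrix X is supplied, the 'model' is left unspecified and
    the predictions are all zeroes.
    """
    if X is None:
        # No outcome process model: predictions are set to zero
        return np.zeros(n_obs)
    # Linear outcome process model: X @ beta
    return np.dot(np.asarray(X, dtype=float), np.asarray(beta, dtype=float))

# No outcome model specified
yhat_default = outcome_predictions(3)

# Outcome model specified through a design matrix and coefficients
yhat_model = outcome_predictions(2, X=[[1.0, 2.0], [1.0, 0.0]], beta=[0.5, 1.0])
```

With this kind of default, the same estimating-function code path could serve users who do and do not supply an outcome model.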

The following code provides a simple implementation of the efficient g-estimator for a linear SNMM.

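import numpy as np
from delicatessen.estimating_equations import ee_regression
from delicatessen.utilities import inverse_logit

# y, A, V, Wa, Wy (NumPy arrays) and the data frame d are assumed to be
# defined in the enclosing scope
def psi(theta):
    # Breaking out parameters
    phi = theta[0:2]      # SNMM parameters
    alpha = theta[2:6]    # propensity score model parameters
    beta = theta[6:]      # outcome process model parameters

    # Outcome with the SNMM-modeled effect of A removed
    h_phi = y - np.dot(V*A[:, None], phi)

    # Propensity score model
    ee_pra = ee_regression(theta=alpha, X=Wa, y=d['A'],
                           model='logistic')
    pi_a = inverse_logit(np.dot(Wa, alpha))
    # Outcome process model
    ee_out = ee_regression(theta=beta, X=Wy, y=h_phi,
                           model='linear')
    yhat = np.dot(Wy, beta)

    # Estimating equations for SNMM
    a_resid = (A - pi_a)[:, None]
    y_resid = (y - yhat)[:, None]
    snm = np.dot(V, phi)[:, None]
    ee_snm = (a_resid * (y_resid - snm*a_resid) * V).T

    # Stacking so that each row is one parameter's estimating function
    return np.vstack([ee_snm, ee_pra, ee_out])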
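One way to sanity-check the SNMM rows of psi: once pi_a and yhat are held fixed, those rows are linear in phi, so their root has a closed form. The sketch below is not part of Delicatessen. `solve_linear_snm` is a hypothetical helper, the propensity score is assumed known (0.5, as in a randomized trial), the outcome predictions are set to zero, and the toy data are deterministic and balanced so the result can be checked by hand.

```python
import numpy as np

def solve_linear_snm(y, A, V, pi_a, yhat=None):
    """Hypothetical helper: root of the SNMM estimating functions used in psi,
    treating pi_a and yhat as fixed rather than jointly estimated."""
    y = np.asarray(y, dtype=float)
    A = np.asarray(A, dtype=float)
    V = np.asarray(V, dtype=float)
    if yhat is None:
        yhat = np.zeros_like(y)          # no outcome process model
    a_resid = A - pi_a
    y_resid = y - yhat
    # sum_i a_resid_i * (y_resid_i - a_resid_i * V_i phi) * V_i = 0
    # => [sum_i a_resid_i^2 V_i V_i^T] phi = sum_i a_resid_i * y_resid_i * V_i
    lhs = (V * (a_resid ** 2)[:, None]).T @ V
    rhs = V.T @ (a_resid * y_resid)
    return np.linalg.solve(lhs, rhs)

# Toy data: each (modifier, baseline covariate) pattern appears once with A=0
# and once with A=1, so the design is exactly balanced
v1 = np.tile([0.0, 1.0, 0.0, 1.0], 2)    # effect modifier
w = np.tile([0.0, 0.0, 1.0, 1.0], 2)     # baseline covariate
A = np.repeat([0.0, 1.0], 4)
V = np.column_stack([np.ones(8), v1])

true_phi = np.array([2.0, -1.0])
y = 1.0 + 0.5 * w + A * (V @ true_phi)   # noise-free outcome

phi_hat = solve_linear_snm(y, A, V, pi_a=0.5)
# On this balanced, noise-free design phi_hat recovers [2.0, -1.0]
```

In real data the nuisance parameters are estimated jointly with phi, which is why psi stacks ee_snm with ee_pra and ee_out instead of plugging in fixed values.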

Note that this issue will also help to solve #30, as the estimating equations for the efficient log-linear SNMM are provided in Section 3 of Vansteelandt & Joffe, Statistical Science 2014; 29(4): 707–731. I have not been as successful in finding estimating equations for the inefficient log-linear SNMM.

Describe alternatives you've considered

None.

Additional context

None.

pzivich / Delicatessen

Efficient g-estimation #34