judithabk6 / med_bench

BSD 3-Clause "New" or "Revised" License
8 stars 3 forks source link

Python implementation of med_dml #32

Closed sami6mz closed 11 months ago

sami6mz commented 1 year ago

A Python implementation of the already existing R function medDML, which provides similar results. If you wish to compare with R, here the code :

import numpy as np
from numpy.random import default_rng
from med_bench.src.get_simulated_data import simulate_data
from med_bench.src.benchmark_mediation import medDML, med_dml

data = simulate_data(
    1000, default_rng(3), False, False, 2, 1, 7, "binary", 0.5, 0.5, 0.5, 0.5
)
x = data[0]
t = data[1]
m = data[2]
y = data[3]
effects = np.array(data[4:9])
proba = np.array(data[9:11])

res_r = medDML(y.ravel(), t.ravel(), m, x)
res_python = med_dml(x, t, m, y)
print(res_r)
print(res_python)

error_r = abs((res_r[0:5] - effects) / effects)
error_python = abs((res_python[0:5] - effects) / effects)
print(error_r)
print(error_python)
print(res_r)
[1.3048670521145547, 1.226514998320843, 1.1352313574251864, 0.16963569468936823, 0.07835205379371174, 71.0]
print(res_python)
[1.283569626175278, 1.1560215941529228, 1.220204946283059, 0.06336467989221894, 0.1275480320223553, 72]
print(true_effects)
[1.325, 1.2  , 1.2  , 0.125, 0.125]

Note : Often one of the two pairs (direct0,indirect1) and (direct1,indirect0) is biased, either in R or Python Note : I'm not sure which alpha should I take for Lasso estimations, I've chosen them arbitrarily

sami6mz commented 1 year ago

Forest estimation and calibration have been added Forest seems to work well without calibration and without cross-fitting crossfit=0, calibration=True and forest=False by default