I quantified the range of causal effect sizes that we explored with our simulations. To estimate this notion of effect size, I define the standardized absolute ratio between the two potential outcomes: $\Delta_{\mu} = \frac{1}{N} \sum_{i=1}^N \big| \frac{\mu_1(x_i) - \mu_0(x_i)}{\mu_0(x_i)} \big|$
The distribution of $\Delta_{\mu}$ for the 900 experiments that we are testing is:

| count | mean | std | min | 1% | 10% | 25% | 50% | 75% | 90% | 99% | max |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 900 | 11.7481 | 205.704 | 0.145377 | 0.148319 | 0.164582 | 0.176319 | 0.354042 | 2.54353 | 5.36483 | 15.4412 | 4367.04 |
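For reference, a minimal sketch of how $\Delta_{\mu}$ can be computed from the two oracle response surfaces; the function and variable names here are mine, not the repository's API:

```python
import numpy as np

def standardized_absolute_ratio(mu_0: np.ndarray, mu_1: np.ndarray) -> float:
    """Empirical Delta_mu: mean absolute difference between the two
    potential-outcome surfaces, standardized by the baseline mu_0."""
    return float(np.mean(np.abs((mu_1 - mu_0) / mu_0)))

# Illustrative use on hypothetical response surfaces evaluated at the x_i:
rng = np.random.default_rng(0)
x = rng.normal(size=1_000)
mu_0 = 1.0 + np.abs(x)   # hypothetical baseline response (bounded away from 0)
mu_1 = mu_0 + 0.5 * x    # hypothetical treated response
print(standardized_absolute_ratio(mu_0, mu_1))
```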
This is quite reasonable but not ideal: we do not evenly explore three orders of magnitude of effect, and a lot of simulations concentrate at small values (around 70% are below 1). To show whether the R-risk dominates (or not) depending on this parameter, I should rerun an experiment that registers this parameter and plot the results along the different quantiles of $\Delta_{\mu}$ (a sketch of that analysis follows).
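A hedged sketch of that quantile analysis, assuming the rerun stores one row per simulation with a `delta_mu` column and one column per selection metric (all column names below are hypothetical):

```python
import pandas as pd

# One row per simulation; column names are assumptions, not the saved schema.
results = pd.read_csv("simulation_results.csv")

# Bin the runs by quantiles of Delta_mu and compare the risks within each bin.
results["delta_mu_bin"] = pd.qcut(results["delta_mu"], q=4)
print(results.groupby("delta_mu_bin", observed=True)[["r_risk", "mu_risk"]].mean())
```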
Gaël pointed me to Alicia's paper (Crabbé et al., 2022) for inspiration on how to measure this causal ratio. However, they do not measure the causal ratio; they only use a simulation parameter, $\omega_{pred}$, which is supposed to balance the strength of the prognostic effects (independent of the treatment) against the predictive effects (linked to the treatment). I am not fully convinced by this approach, since by rewriting their simulation we can recover a term in which $\omega_{pred}$ appears but which is unrelated to the treatment effect: $Y = \mu_{prog} + \omega_{pred} \mu_{pred0} + A \omega_{pred} [\mu_{pred1} - \mu_{pred0}]$
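To spell out the rewriting (my reconstruction of their DGP, so the exact decomposition is an assumption): if the simulated outcome is

$Y = \mu_{prog} + \omega_{pred} \big[ (1 - A) \mu_{pred0} + A \mu_{pred1} \big]$,

then expanding the bracket gives the expression above, where the term $\omega_{pred} \mu_{pred0}$ is scaled by $\omega_{pred}$ yet does not depend on the treatment $A$.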
I find it more convincing to follow the same idea that we had for overlap: find an observable measure that correlates well with the simulation parameter controlling the relative strength of the effect with respect to the baseline $\mu_0(x)$.
But it is not currently possible to make the link between a given simulation and the saved results.
For example, the following configuration gives the distribution of causal effects below:

```python
import numpy as np

# Assumption: `generator` is a NumPy random generator; the seed is illustrative.
generator = np.random.default_rng(0)

dataset_grid = {
    "dataset_name": ["caussim"],
    "overlap": generator.uniform(0, 2.5, size=25),
    "random_state": list(range(1, 4)),
    "treatment_ratio": [0.25, 0.5, 0.75],
    "effect_size": [0.1, 0.5, 0.9],
}
```
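As a sanity check, this grid expands to $25 \times 3 \times 3 \times 3 = 675$ configurations, which matches the count in the table below. A minimal sketch of the expansion, assuming a plain Cartesian product over the grid values (e.g. with scikit-learn's `ParameterGrid`):

```python
from sklearn.model_selection import ParameterGrid

# Cartesian product over the grid values: 25 * 3 * 3 * 3 = 675 configurations.
configs = list(ParameterGrid(dataset_grid))
print(len(configs))  # 675
```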
| count | mean | std | min | 1% | 10% | 25% | 50% | 60% | 65% | 70% | 75% | 90% | 99% | max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 675 | 26.5555 | 70.9585 | 0.0647341 | 0.067484 | 0.0809728 | 0.633559 | 2.16149 | 5.94084 | 6.54794 | 12.1919 | 13.0729 | 76.89 | 348.838 | 600.292 |
**Reviewer 1**

- [x] Add causal inference and ML references at the beginning and say that we focus on the subfield of model selection for CATE.
- [x] Add some of the pinpointed references on g-estimation (Leborgne, Chatton, Ren) at the beginning and say that we are not interested in identifying one family of models as superior, but in studying whether one model-selection procedure is better than another.
- [x] Robinson risk: better explain it or remove it from the main manuscript.
**Reviewer 2**

- [ ] Dig into the main criticism: does our simulation explore more than overlap differences? I have the feeling that we explore a large range of baseline-to-response ratios. I want to quantify this by measuring the empirical distributions of $\mathbb{E}[\mu_1(x) - \mu_0(x)]$ and $\mathrm{Var}[\mu_1(x) - \mu_0(x)]$ (see the sketch after this list). For the difference in distributions, I think the criticism is valid: we do not explore a wide variety of treatment-allocation mechanisms, at least in the simulation. But the semi-simulated datasets do explore this variety (at least ACIC2016). Is it necessary to add an experiment with something more complicated than two Gaussians to answer this review?
- [x] Reorganize the methods (notation and setup)?
- [x] More justification for the experiments: the simulation focuses on overlap; the semi-simulated datasets provide more realistic and diverse covariate distributions.
- [x] Better explain the DGP for the simulation in A7.
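For the first Reviewer 2 item, a minimal sketch of the two proposed summary measures, assuming access to the oracle response surfaces in each simulation (names are mine):

```python
import numpy as np

def cate_moments(mu_0: np.ndarray, mu_1: np.ndarray) -> tuple[float, float]:
    """Empirical mean and variance of the CATE, tau(x) = mu_1(x) - mu_0(x)."""
    tau = mu_1 - mu_0
    return float(tau.mean()), float(tau.var())
```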