I quantified the range of causal effect sizes that we explored with our simulations. To estimate this notion of effect size, I define the standardized absolute ratio between the two potential outcomes: $\Delta_{\mu} = \frac{1}{N} \sum_{i=1}^N \big| \frac{\mu_1(x_i) - \mu_0(x_i)}{\mu_0(x_i)} \big|$
The distribution of $\Delta_{\mu}$ for the 900 experiments that we are testing is:

| count | mean | std | min | 1% | 10% | 25% | 50% | 75% | 90% | 99% | max |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 900 | 11.7481 | 205.704 | 0.145377 | 0.148319 | 0.164582 | 0.176319 | 0.354042 | 2.54353 | 5.36483 | 15.4412 | 4367.04 |
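For reference, a minimal sketch of how $\Delta_{\mu}$ can be computed from the two oracle response surfaces; the function and variable names here are mine, not the repository's API:

```python
import numpy as np

def standardized_absolute_ratio(mu_0: np.ndarray, mu_1: np.ndarray) -> float:
    """Empirical Delta_mu: mean absolute difference between the two
    potential-outcome surfaces, standardized by the baseline mu_0."""
    return float(np.mean(np.abs((mu_1 - mu_0) / mu_0)))

# Illustrative use on hypothetical response surfaces evaluated at the x_i:
rng = np.random.default_rng(0)
x = rng.normal(size=1_000)
mu_0 = 1.0 + np.abs(x)   # hypothetical baseline response (bounded away from 0)
mu_1 = mu_0 + 0.5 * x    # hypothetical treated response
print(standardized_absolute_ratio(mu_0, mu_1))
```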
This is quite reasonable but not ideal: we do not evenly explore three orders of magnitude of effect, and a lot of simulations concentrate at small values (around 70% are below 1). To show whether the R-risk dominates (or not) depending on this parameter, I should rerun an experiment that registers this parameter and plot the results along the different quantiles of $\Delta_{\mu}$ (a sketch of that analysis follows).
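A hedged sketch of that quantile analysis, assuming the rerun stores one row per simulation with a `delta_mu` column and one column per selection metric (all column names below are hypothetical):

```python
import pandas as pd

# One row per simulation; column names are assumptions, not the saved schema.
results = pd.read_csv("simulation_results.csv")

# Bin the runs by quantiles of Delta_mu and compare the risks within each bin.
results["delta_mu_bin"] = pd.qcut(results["delta_mu"], q=4)
print(results.groupby("delta_mu_bin", observed=True)[["r_risk", "mu_risk"]].mean())
```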
Gaël pointed me to Alicia's paper (Crabbé et al., 2022) for inspiration on how to measure this causal ratio. However, they do not measure the causal ratio; they only use a simulation parameter, $\omega_{pred}$, which is supposed to balance the strength of the prognostic effects (independent of the treatment) against the predictive effects (linked to the treatment). I am not fully convinced by this approach, since by rewriting their simulation we can recover a term in which $\omega_{pred}$ appears but which is unrelated to the treatment effect: $Y = \mu_{prog} + \omega_{pred} \mu_{pred0} + A \omega_{pred} [\mu_{pred1} - \mu_{pred0}]$
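To spell out the rewriting (my reconstruction of their DGP, so the exact decomposition is an assumption): if the simulated outcome is

$Y = \mu_{prog} + \omega_{pred} \big[ (1 - A) \mu_{pred0} + A \mu_{pred1} \big]$,

then expanding the bracket gives the expression above, where the term $\omega_{pred} \mu_{pred0}$ is scaled by $\omega_{pred}$ yet does not depend on the treatment $A$.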
I find it more convincing to follow the same idea that we had for overlap: find an observable measure that correlates well with the simulation parameter controlling the relative strength of the effect with respect to the baseline $\mu_0(x)$.
But it is not currently possible to make the link between a given simulation and the saved results.
For example, the following configuration gives the distribution of causal effects below:

```python
import numpy as np

# Assumption: `generator` is a NumPy random generator; the seed is illustrative.
generator = np.random.default_rng(0)

dataset_grid = {
    "dataset_name": ["caussim"],
    "overlap": generator.uniform(0, 2.5, size=25),
    "random_state": list(range(1, 4)),
    "treatment_ratio": [0.25, 0.5, 0.75],
    "effect_size": [0.1, 0.5, 0.9],
}
```
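As a sanity check, this grid expands to $25 \times 3 \times 3 \times 3 = 675$ configurations, which matches the count in the table below. A minimal sketch of the expansion, assuming a plain Cartesian product over the grid values (e.g. with scikit-learn's `ParameterGrid`):

```python
from sklearn.model_selection import ParameterGrid

# Cartesian product over the grid values: 25 * 3 * 3 * 3 = 675 configurations.
configs = list(ParameterGrid(dataset_grid))
print(len(configs))  # 675
```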
| count | mean | std | min | 1% | 10% | 25% | 50% | 60% | 65% | 70% | 75% | 90% | 99% | max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 675 | 26.5555 | 70.9585 | 0.0647341 | 0.067484 | 0.0809728 | 0.633559 | 2.16149 | 5.94084 | 6.54794 | 12.1919 | 13.0729 | 76.89 | 348.838 | 600.292 |
**Reviewer 1**

- [x] Add causal inference and ML references at the beginning and say that we focus on the subfield of model selection for CATE.
- [x] Add some of the pinpointed references on g-estimation (Leborgne, Chatton, Ren) at the beginning and say that we are not interested in identifying one family of models as superior, but in studying whether one model-selection procedure is better than another.
- [x] Robinson risk: better explain it or remove it from the main manuscript.
**Reviewer 2**

- [ ] Dig into the main criticism: does our simulation explore more than overlap differences? I have the feeling that we explore a large range of baseline-to-response ratios. I want to quantify this by measuring the empirical distributions of $\mathbb{E}[\mu_1(x) - \mu_0(x)]$ and $\mathrm{Var}[\mu_1(x) - \mu_0(x)]$ (see the sketch after this list). For the difference in distributions, I think the criticism is valid: we do not explore a wide variety of treatment-allocation mechanisms, at least in the simulation. But the semi-simulated datasets do explore this variety (at least ACIC2016). Is it necessary to add an experiment with something more complicated than two Gaussians to answer this review?
- [x] Reorganize the methods (notation and setup)?
- [x] More justification for the experiments: the simulation focuses on overlap; the semi-simulated datasets provide more realistic and diverse covariate distributions.
- [x] Better explain the DGP for the simulation in A7.
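For the first Reviewer 2 item, a minimal sketch of the two proposed summary measures, assuming access to the oracle response surfaces in each simulation (names are mine):

```python
import numpy as np

def cate_moments(mu_0: np.ndarray, mu_1: np.ndarray) -> tuple[float, float]:
    """Empirical mean and variance of the CATE, tau(x) = mu_1(x) - mu_0(x)."""
    tau = mu_1 - mu_0
    return float(tau.mean()), float(tau.var())
```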