grf-labs / grf

Generalized Random Forests
https://grf-labs.github.io/grf/
GNU General Public License v3.0

Very low estimated censoring probabilities and extreme values of treatment propensities #1334

Closed BorjaGIH closed 6 months ago

BorjaGIH commented 10 months ago

Hello,

Thanks for this great method and the associated package.

I am trying to use it on my dataset, and I am encountering two warnings that I believe are worrying:

1) "Estimated censoring probabilities go as low as: 1e-04 - forest estimates will likely be very unstable, a larger target horizon is recommended"
2) "Estimated treatment propensities take values very close to 0 or 1. The estimated propensities are between 0.01 and 0.985, meaning some estimates may not be well identified."

For the first warning, I already tried increasing the "horizon" parameter as the message suggests, but the values I tried already exceed the "administrative" censoring time of my dataset (~160 months), so, if I understood the method and code correctly, increasing it further won't change anything.

For the second warning, I tried changing the estimand from survival probability to RMST, but I get an equivalent message.

Next, I attach some summary values/plots of my dataset, which I believe may be related to the described issues (or may shed light on why the warnings appear):

[Histograms: distributions of treated/untreated and of censored/observed individuals]

As can be seen, the dataset is very imbalanced with regard to the number of treated/untreated individuals, as well as the number of censored/observed individuals. Do you believe this kind of dataset disqualifies it from being used with the current method, or is there some part of the process I could tweak to make it work? (I'm thinking of the parameters of the forests that model the censoring process and/or the treatment propensity estimation.)

The employed code was:

horizon <- 160
failure.time <- seq(0, horizon, length.out = horizon)
cs.forest <- causal_survival_forest(X = covariates, Y = event_time, W = treatment,
                                    D = event_type, target = "survival.probability", # or "RMST"
                                    failure.times = failure.time, horizon = horizon,
                                    alpha = 0.01, num.trees = 2000)

I have tried very small values of "horizon", such as 10, and the warning goes away, but the results are not useful to me at that value.

Thank you very much in advance.

erikcs commented 10 months ago

Hi @BorjaGIH, 1 is more of a heuristic warning; based on the histogram, it seems a horizon truncation of around 40 or 50 could perhaps be reasonable for targeting the RMST? The first section in this vignette tries to give some intuition behind choosing horizon.
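One way to make that intuition concrete is a small sketch (illustrative only, using the variable names from the original snippet and an arbitrary floor of 0.05): estimate the censoring distribution with a Kaplan-Meier curve on the flipped event indicator, and take the largest horizon at which the censoring survival probability stays above the floor.

```r
library(survival)

# Largest horizon at which the Kaplan-Meier estimate of the
# *censoring* survival function stays above `floor`.
choose_horizon <- function(event_time, event_type, floor = 0.05) {
  # Flip the indicator: a "censoring event" is event_type == 0.
  km <- survfit(Surv(event_time, 1 - event_type) ~ 1)
  ok <- km$surv >= floor
  if (!any(ok)) return(min(km$time))
  max(km$time[ok])
}

# Toy data with heavy censoring, just to exercise the helper.
set.seed(1)
t <- rexp(500, rate = 0.02)   # latent event times
c <- rexp(500, rate = 0.05)   # censoring times
event_time <- pmin(t, c)
event_type <- as.integer(t <= c)
choose_horizon(event_time, event_type)
```

The returned value is a data-driven starting point for the horizon truncation, not a substitute for subject-matter judgment.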

2 is not necessarily an issue for fitting a CSF; rather, it tells you that estimating an ATE for the entire sample could be problematic. One way to address the low/high propensities is to estimate the ATE only for the units with an estimated propensity score in some range, for example


average_treatment_effect(
  forest,
  subset = forest$W.hat > 0.1 & forest$W.hat < 0.9
)

(more details on what this approach quantifies are here)

BorjaGIH commented 10 months ago

Hi @erikcs,

Thank you very much for the quick response. I will take a look at the suggestions and come back with my conclusions.

Borja

BorjaGIH commented 10 months ago

Hi @erikcs ,

I took a look at the suggested materials, and I do understand the ideas behind them, but I cannot find a solution to my problem. The biggest horizon value for the RMST estimand that does not throw warning 1 is 25 months, which I consider too low for a study spanning 10+ years. Also, the estimand I would prefer is survival.probability, where I can only work with a horizon of 18. Do you have any other suggestions?

Possible strategies I have thought about:

1) I am pretty sure this is not the case, but: if the censoring probabilities act as "weights", and the stability problems of the forests derive from extreme values of those weights, would it be possible to stabilize them in a similar fashion to the stabilized weights of IPTW/IPCW?
2) Would it be possible, and would it make sense, to add regularization to the forest that models the censoring process, so that the probabilities are not so extreme? My first idea would be to accomplish that via hyperparameter tuning of that forest, although I am not sure whether that is possible.
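For concreteness, the stabilization idea in 1) can be sketched in base R (a toy illustration with made-up numbers, not anything CSF exposes): divide the marginal censoring survival S_C(t) by the conditional one S_C(t | X), so the resulting weights hover around 1 instead of exploding when S_C(t | X) is tiny.

```r
# Toy sketch of stabilized IPCW-style weights. `trunc` additionally
# truncates very small conditional censoring probabilities.
stabilize_weights <- function(sc_marginal, sc_conditional, trunc = 0.01) {
  sc_conditional <- pmax(sc_conditional, trunc)
  sc_marginal / sc_conditional
}

# Illustrative values: the last unit has an extreme conditional
# censoring probability like the 1e-04 from warning 1.
sc_cond <- c(0.9, 0.5, 1e-4)
sc_marg <- c(0.8, 0.8, 0.8)

raw  <- 1 / pmax(sc_cond, 0.01)               # raw inverse weights
stab <- stabilize_weights(sc_marg, sc_cond)    # stabilized weights
```

Here each stabilized weight is the raw weight scaled down by the marginal survival, so extreme units are less dominant while the relative ordering is preserved.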

I think I will try anyway to explore the second option, but nevertheless I would appreciate some reflections from your side about it.

Thank you very much.

Borja

erikcs commented 10 months ago

Hi Borja,

The warnings above are only heuristics we put in place so that we can vouch for the statistical reliability of CSF: for data with too much censoring, CSF may not be a good fit. You could still proceed with a higher horizon; it is just that we can't guarantee the reliability of the results. It would be interesting to learn from your experience if you do so, i.e., whether your estimates still make sense.

I'm not too familiar with stabilized IPCW, but if it amounts to employing a sample-weighted estimator in the same way as IPCW, then this article https://arxiv.org/abs/2207.07758 walks through how you could use GRF to build your own IPCW-weighted causal forest, if that is something you wish to try. (There is a division by the estimated censoring probabilities in CSF's estimating equation, but these are not readily modifiable from the user side.)
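A rough sketch of what such an IPCW-weighted causal forest could look like, under the assumption of grf >= 2.0 and using the variable names from the original snippet (the horizon of 40 and the details of the weighting scheme are illustrative, not a prescription; see the linked article for the actual construction):

```r
library(grf)

horizon <- 40
Y <- pmin(event_time, horizon)
# Units whose event (or follow-up) passes the horizon count as observed.
D <- ifelse(event_time >= horizon, 1, event_type)
complete <- D == 1

# Model the censoring process with a survival forest on the
# flipped indicator, then evaluate each unit's censoring survival
# probability at its own observed time.
cens.forest <- survival_forest(covariates, Y, 1 - D)
C.hat <- predict(cens.forest, failure.times = Y,
                 prediction.times = "time")$predictions

# Inverse-censoring weights for the complete cases only.
sample.weights <- 1 / C.hat[complete]

cf <- causal_forest(covariates[complete, ], Y[complete],
                    treatment[complete],
                    sample.weights = sample.weights)
average_treatment_effect(cf)
```

The weights here are the raw inverse censoring probabilities; a stabilized variant would multiply them by a marginal censoring survival estimate, as discussed above.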

BorjaGIH commented 10 months ago

Hi @erikcs ,

Thank you for your comments and suggestions. I will try some of the mentioned ideas and come back to report the results.

Borja

BorjaGIH commented 6 months ago

Hi @erikcs,

Finally, I found a solution for this. It consisted of setting the mtry parameter to a much smaller value (in particular, 2). This way, I no longer get such extreme values of the censoring probabilities or of the treatment propensities. I still need to check whether this value is so low that it could somehow invalidate my results, but in principle I don't think so. I thought it was worth mentioning for your awareness and for future occasions.
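For future readers, the fix presumably amounts to adding mtry to the original call (the other argument values are those from the first post):

```r
# Same call as before, but restricting each split to consider
# only 2 candidate variables (mtry = 2).
cs.forest <- causal_survival_forest(X = covariates, Y = event_time, W = treatment,
                                    D = event_type, target = "survival.probability",
                                    failure.times = failure.time, horizon = horizon,
                                    alpha = 0.01, num.trees = 2000, mtry = 2)
```

A small mtry acts as a form of regularization on the component forests, which plausibly explains why the censoring and propensity estimates become less extreme.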

Thanks!

Borja