grf-labs / grf

Generalized Random Forests
https://grf-labs.github.io/grf/
GNU General Public License v3.0

Questions re: causal forest with small sample size #1442

Open micakt opened 1 week ago

micakt commented 1 week ago

Apologies if I am writing this under the wrong heading. This is not a bug report. I have a few questions regarding running a causal forest, particularly with a very small sample size (n less than 300). I am a novice in machine learning and causal forests, and I appreciate any insight you can provide.

  1. I am using code from various sources to run my causal forest, and I noticed that some sources set aside a portion of the data as a test set before fitting the causal forest, then fit the forest on the training sample only (e.g., train_fraction <- 0.70). Other sources do not hold out a test set at all. Should I be separating out a test set before fitting the causal forest, or does grf do this automatically?

  2. When running the test_calibration() command, if differential.forest.prediction is non-significant, does this suggest that the predictions from the forest are inaccurate? Specifically, do I need to further tune my causal forest to ensure it is making accurate predictions?

Thank you!

erikcs commented 3 days ago

Hi @micakt, 1) The answer depends on the question you want to answer. If someone wants to estimate, for example, an average treatment effect using forest-based methods to control for confounding, then they typically fit the forest on all the data and get an estimate via the average_treatment_effect function. If, on the other hand, someone wants to fit a causal forest to find treatment heterogeneity, then forming CATE predictions on a held-out test set protects against overfitting and spurious discoveries (the general problem you avoid by doing this is "looking at / using the data" to form some kind of hypothesis test). 2) It suggests the forest did not detect HTE; an alternative, and perhaps easier-to-understand, test for the same purpose is the RATE.
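The two workflows in 1) can be sketched as follows (illustrative only; X, Y, and W stand in for your covariate matrix, outcome vector, and treatment indicator):

```r
library(grf)

# (a) Average treatment effect: fit the forest on all the data,
# no train/test split needed.
forest.full <- causal_forest(X, Y, W)
average_treatment_effect(forest.full)

# (b) Treatment heterogeneity: fit on a training sample, then form
# CATE predictions on a held-out test set to guard against
# overfitting and spurious discoveries.
train <- sample(nrow(X), floor(0.7 * nrow(X)))
forest.train <- causal_forest(X[train, ], Y[train], W[train])
tau.hat.test <- predict(forest.train, X[-train, ])$predictions
```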

You might find these general resources helpful:

https://www.youtube.com/playlist?list=PLxq_lXOUlvQAoWZEqhRqHNezS30lI49G-

https://www.youtube.com/watch?v=YBbnCDRCcAI

micakt commented 2 days ago

Hi @erikcs, Thank you for your insight and for sharing the additional resources. We are using the causal forest to examine heterogeneous treatment effects, so it sounds like we are correct to have a held-out test set. I was a bit concerned about partitioning our data into training and test sets because our sample size is so small (n=263). We are using 70% for training and 30% for testing. Given your expertise in this field, does this seem reasonable, particularly given our sample size?

Regarding the test_calibration() command, I apologize, I meant to ask about the estimate for mean.forest.prediction (not differential.forest.prediction). My understanding is that a coefficient of 1 suggests the mean forest prediction is correct. If mean.forest.prediction is non-significant, does this suggest that the predictions from the forest are inaccurate? Would I then need to further tune my causal forest to ensure it is making accurate predictions? I will also try RATE to assess heterogeneity as you have suggested.
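For reference, a minimal sketch of the calibration check being discussed (X, Y, W are placeholders for our data):

```r
library(grf)

forest <- causal_forest(X, Y, W)

# Best linear fit of the outcome on the forest's own out-of-bag
# predictions. A mean.forest.prediction coefficient near 1 indicates
# the average prediction is well calibrated; a significantly positive
# differential.forest.prediction coefficient indicates the forest
# captured heterogeneity.
test_calibration(forest)
```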

Thank you again!

erikcs commented 2 days ago

Hi @micakt, n=263 is quite small. This RATE tutorial gives an overview of strategies to increase power to detect HTEs without doing a single train/test split, but even so, n=263 may be asking a lot if the signal is low.
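A minimal sketch of the basic RATE workflow with a single split (illustrative; X, Y, W stand in for your data, and the tutorial linked above covers higher-powered variants):

```r
library(grf)

# Split the data: train a CATE model on one half, and fit a separate
# evaluation forest on the other half.
train <- sample(nrow(X), floor(nrow(X) / 2))
cate.forest <- causal_forest(X[train, ], Y[train], W[train])
eval.forest <- causal_forest(X[-train, ], Y[-train], W[-train])

# Rank the held-out units by predicted CATE and compute the RATE
# (AUTOC by default). A 95% confidence interval excluding zero is
# evidence of detectable treatment heterogeneity.
priorities <- predict(cate.forest, X[-train, ])$predictions
rate <- rank_average_treatment_effect(eval.forest, priorities)
rate$estimate + c(-1.96, 1.96) * rate$std.err
```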