mwelz / GenericML

R implementation of Generic Machine Learning Inference (Chernozhukov, Demirer, Duflo and Fernández-Val, 2020).
GNU General Public License v3.0

Returning the group membership of the individuals across the splits #11

Open SZegota opened 2 years ago

SZegota commented 2 years ago

First of all, thanks for the package; it's great.

While applying this package to a randomized experiment, I came across a possible enhancement:

Using get_CLAN() on a "GenericML" object is only possible for variables that have been specified as covariates (if I am not mistaken). However, extending the quantile groups formed on one outcome to another outcome is also an interesting application. For example, Bryan et al. (2021, p. 21) describe a method called CGATES, "Conditional Sorted Group Average Treatment Effects", in which the GATES quantile groups based on one outcome ("profit") also show significant differences in other outcomes ("revenue", "expenses", etc.). Currently, as the quantile groups in each split are not accessible, the procedure cannot be replicated with GenericML, at least to my knowledge. I think this would be an interesting extension.

Literature:

Bryan, G. T., Karlan, D., & Osman, A. (2021). Big loans to small businesses: Predicting winners and losers in an entrepreneurial lending experiment (No. w29311). National Bureau of Economic Research.

mwelz commented 2 years ago

Dear @SZegota,

Thank you for your interest in our package. CLAN can be performed on variables that are not used in fitting the proxy estimators. This is the purpose of the argument Z_CLAN of GenericML(): CLAN will be performed on every variable in Z_CLAN and the final estimates can be accessed via get_CLAN(). However, I see that it can be useful in some situations to perform CLAN on variables that have neither been passed with Z nor Z_CLAN to GenericML(). One would need the quantile grouping per split for doing so, which is currently not explicitly returned by GenericML(). There is a way to obtain the quantile grouping per split from a GenericML object, but this is quite cumbersome. We will add the quantile grouping per split to the output of GenericML() in a future release.
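To illustrate what "quantile grouping per split" means, here is a minimal base-R sketch. It does not use GenericML's internals: the proxy predictions, the variable `extra_var`, the cutoffs, and the `membership` matrix are all hypothetical stand-ins, and the actual procedure refits the proxy learners on the auxiliary sample in every split. The point is only that, once the group membership per split is stored, CLAN can be run on any variable afterwards.

```r
set.seed(1)
n          <- 200
num_splits <- 5
cutoffs    <- c(0.25, 0.5, 0.75)      # assumed quantile cutoffs, giving K = 4 groups
K          <- length(cutoffs) + 1L

# Hypothetical stand-ins (not produced by GenericML):
proxy_cate <- rnorm(n)                    # proxy CATE prediction
extra_var  <- 2 * proxy_cate + rnorm(n)   # variable passed neither via Z nor Z_CLAN

# membership[i, s] = group of observation i in split s;
# NA if observation i is in the auxiliary sample of split s
membership <- matrix(NA_integer_, nrow = n, ncol = num_splits)

for (s in seq_len(num_splits)) {
  main_idx <- sample(n, size = n / 2)     # main sample of this split
  # Mimic split-specific proxy estimates by adding a little noise
  score  <- proxy_cate[main_idx] + rnorm(length(main_idx), sd = 0.1)
  breaks <- c(-Inf, quantile(score, probs = cutoffs), Inf)
  membership[main_idx, s] <- cut(score, breaks = breaks, labels = FALSE)
}

# CLAN-style contrast per split: mean of extra_var in the most affected
# group (K) minus its mean in the least affected group (1)
clan_diff <- vapply(seq_len(num_splits), function(s) {
  g <- membership[, s]
  mean(extra_var[which(g == K)]) - mean(extra_var[which(g == 1L)])
}, numeric(1))

# Aggregate over splits with the median
median(clan_diff)
```

Storing the membership as an n-by-splits matrix with NA for auxiliary-sample observations is one possible design; a list of index vectors per group and split would work equally well.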

Concerning GATES on multiple dependent variables: You would also need the quantile grouping and the proxy learners per split for this purpose. I haven't yet read the paper you have linked, but I will do so to check if we can incorporate this procedure in the package.
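As a rough sketch of the idea, and not of the estimator in Bryan et al. (2021) or of anything GenericML currently does, the following base-R code forms quantile groups from a proxy for one outcome and then compares treated and control means of a second outcome within each group. All data are simulated, and the names `proxy_profit`, `revenue`, and `gates_revenue` are hypothetical. It covers a single split and uses plain differences in means, which is only reasonable here because treatment is randomized with a constant propensity score; the full procedure would repeat this over many splits and aggregate with medians.

```r
set.seed(2)
n <- 400

D            <- rbinom(n, 1, 0.5)               # randomized treatment assignment
tau          <- rnorm(n)                        # simulated heterogeneous effect
proxy_profit <- tau + rnorm(n, sd = 0.5)        # hypothetical proxy CATE for "profit"
revenue      <- 1 + 2 * tau * D + rnorm(n)      # second outcome with a related effect

# Quantile groups based on the proxy for the first outcome
breaks <- c(-Inf, quantile(proxy_profit, probs = c(0.25, 0.5, 0.75)), Inf)
group  <- cut(proxy_profit, breaks = breaks, labels = paste0("G", 1:4))

# Group-wise treatment effect on the second outcome:
# difference in mean revenue between treated and control units in each group
gates_revenue <- sapply(levels(group), function(g) {
  in_g <- group == g
  mean(revenue[in_g & D == 1]) - mean(revenue[in_g & D == 0])
})

gates_revenue
```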

Hope this helps!

All the best, Max

SZegota commented 2 years ago

Thank you for the answer. I did not realize that the Z_CLAN argument of GenericML() handles this; it works quite well for me. This also solves the problem of multiple outcomes, as you can just pass them in via that argument.

mwelz commented 2 years ago

You're welcome! However, I think that we should at least make it optional that GenericML() returns the group membership for each split. We'll implement this in a future release.