worldbank / ietoolkit

Stata commands designed for Impact Evaluations in particular, but also data work in general
https://worldbank.github.io/ietoolkit/
MIT License
213 stars, 76 forks

new command - ieddtable - neat tables for diff-in-diff regressions #135

Closed. kbjarkefur closed this issue 5 years ago.

kbjarkefur commented 6 years ago

This new command was suggested by Esteban J. Quiñones (@estebanjq).

There are many commands (estout, outreg, etc.) that can create neat tables from regressions. This command will target the diff-in-diff (or double difference) regression model only and create tables tailored for exactly this model specification.

We want the table to follow the format shown in the mock-up below:

[image: mock-up table]

The command is expected to be specified in the following format:

ieddtable varlist, dummies(D T DT)

Here varlist is a list of outcome variables, D is the treatment dummy, T is the time dummy, and DT is the interaction of the two. The command will test that the regression is valid in the sense that there are at least some observations in each group, that D * T actually equals DT, and so forth.
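The validity checks described above could be sketched roughly as follows. This is only an illustration on simulated data, not the code of the command itself; the variable names treat, post and treat_post are hypothetical stand-ins for D, T and DT.

```stata
* Simulated data; treat, post and treat_post are hypothetical names
clear
set seed 12345
set obs 200
gen byte treat      = mod(_n, 2)      // D: treatment dummy
gen byte post       = _n > 100        // T: time dummy
gen byte treat_post = treat * post    // DT: interaction

* The interaction dummy must equal the product of the two dummies
assert treat_post == treat * post

* Each of the four groups must contain at least one observation
forvalues d = 0/1 {
    forvalues t = 0/1 {
        quietly count if treat == `d' & post == `t'
        assert r(N) > 0
    }
}
```

If either assert fails, Stata stops with an error, which is roughly the behavior one would want from the command when the dummies are inconsistent.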

Only the last column, showing the double difference, will be taken from the diff-in-diff regression. The other four means will be regular means calculated separately. The reason is that we want to allow the user to include fixed effects, control variables, etc., and when those are used, the intercept and the coefficients on D, T and D*T can no longer be combined by themselves to calculate the means in the four groups. We do not want the means in the four leftmost columns of the mock-up to be affected by fixed effects and control variables, as that can create odd values such as a negative harvest. When control variables or fixed effects are used, a note at the bottom of the table will explain why the value in the fifth column cannot be calculated from the first four.
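A small simulated example may make this point concrete. It is a sketch under assumed variable names (treat, post, treat_post, x, y), not part of the command: without controls, the coefficient on the interaction reproduces the double difference of the four raw means exactly, but once a control variable is added it generally no longer does.

```stata
* Simulated data; all variable names are hypothetical
clear
set seed 12345
set obs 400
gen byte treat      = mod(_n, 2)
gen byte post       = _n > 200
gen byte treat_post = treat * post
gen x = rnormal()
gen y = 10 + 2*treat + post + 3*treat_post + 0.5*x + rnormal()

* Raw means in the four groups, stored in locals m00, m01, m10, m11
forvalues d = 0/1 {
    forvalues t = 0/1 {
        quietly summarize y if treat == `d' & post == `t'
        local m`d'`t' = r(mean)
    }
}
local dd_raw = (`m11' - `m10') - (`m01' - `m00')
display "DD from the four raw means: " %6.3f `dd_raw'

* Saturated model without controls: identical to the raw double difference
quietly regress y treat post treat_post
display "DD without controls:        " %6.3f _b[treat_post]

* With a control variable the coefficient usually differs from the raw value
quietly regress y treat post treat_post x
display "DD with control x:          " %6.3f _b[treat_post]
```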

In addition to the mean shown in the mock-up, the command should be able to display the number of observations and the variation for each group and each outcome variable. We do not know yet what the default should be. It should also be possible to display the number of observations for each group at the bottom of the table, in which case the command will test that the number is the same for that group across all outcome variables.
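As a rough sketch of the statistics discussed above, the following uses built-in Stata commands on simulated data. The variable names are hypothetical, and the standard deviation is used here only as one possible measure of variation.

```stata
* Simulated data with two hypothetical outcome variables
clear
set seed 2018
set obs 200
gen byte treat = mod(_n, 2)
gen byte post  = _n > 100
gen y1 = rnormal()
gen y2 = rnormal()

* One group identifier for the four treatment-by-time cells
egen group = group(treat post), label

* N, mean and standard deviation per group for each outcome
tabstat y1 y2, by(group) statistics(n mean sd)

* Check that each group has the same N for both outcomes
forvalues g = 1/4 {
    quietly count if group == `g' & !missing(y1)
    local n1 = r(N)
    quietly count if group == `g' & !missing(y2)
    assert r(N) == `n1'
}
```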

This is just a first draft of the specifications for this command. Please comment below if you have any additional options you want us to include.

bbdaniels commented 6 years ago

For first differences I have previously written a command with a similar reporting layout – you can see it at https://github.com/worldbank/stata/tree/development/dev/Statistics/RandomTrialRegression and the corresponding formatted Table 1 of http://science.sciencemag.org/content/354/6308/aaf7384/tab-figures-data

kbjarkefur commented 6 years ago

That's a really cool command. Can you post a picture of Table 1 from the Science link here in the thread? It requires a login to view (it might log in automatically when browsing from a WB IP address).

It is in many senses similar to what we want to do, but I think we should write our own for the following reasons (this is not a list against your command; it is just my reflections from comparing your implementation to the one I had envisioned for ieddtable, which I wanted to document somewhere):

@luizaandrade , let me know what you think!

We will let you know if we intend to borrow something from your code.

kbjarkefur commented 6 years ago

In commit f26dd5c I have made a quick but documented draft of what I meant by the stats section being agnostic to the output format: it creates a matrix of all stats that can then be passed to a sub-command that creates the outputs.
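For readers who cannot look up the commit, the general idea might look something like the sketch below. This is not the code from that commit; the program name export_stats, the matrix name stats and the column names are all made up for illustration.

```stata
* Hypothetical output sub-command: receives only the name of a stats matrix
capture program drop export_stats
program define export_stats
    args matname
    matrix list `matname', format(%9.3f)
end

* Simulated data; variable names are hypothetical
clear
set seed 12345
set obs 200
gen byte treat = mod(_n, 2)
gen byte post  = _n > 100
gen y = 5 + treat + post + 2*treat*post + rnormal()

* Stats section: fill one row per outcome variable with the four group means
matrix stats = J(1, 4, .)
matrix rownames stats = y
matrix colnames stats = mean_C0 mean_C1 mean_T0 mean_T1
local col = 0
forvalues d = 0/1 {
    forvalues t = 0/1 {
        local ++col
        quietly summarize y if treat == `d' & post == `t'
        matrix stats[1, `col'] = r(mean)
    }
}

* Output section: any sub-command (screen, CSV, TeX) can consume the same matrix
export_stats stats
```

Keeping the statistics in a matrix means that a new output format only needs a new sub-command, while the calculations stay untouched.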

bbdaniels commented 6 years ago

Totally agree with all of the above! The reason I did this one using putexcel is that I wanted to write confidence intervals and CIs with ( ), so it couldn't go in a matrix. I have since decided that it is a terrible idea, especially since putexcel has major backwards compatibility issues even between Stata 13 and 14.

You may also be interested to look at the regression output handling commands I wrote recently for working with CSV tables in TeX if you haven't already (mat2csv and reg2csv here). These leave all the line styling out currently but have the useful convention of building two underlying matrices: results and results_STARS, which can be sensibly looped over to add non-numeric characters to a table like this before exporting to CSV.

[image: screenshot 2018-05-03 11 27 02]

estebanjq commented 6 years ago

Wow @bbdaniels, rctreg looks like a great command, thanks for sharing it. Hopefully, ietoolkit can further generalize it across input and output formats, as well as provide additional flexibility.

The option of being able to present SEs or CIs in an appropriate format would certainly be appreciated.

One thing I mentioned in a previous (off thread) conversation with @luizaandrade and @kbjarkefur is that it would be great if a single command could handle and present the relevant information for single differences, single differences controlling for group means at baseline (i.e., ancova), and difference in differences (aka, double difference).
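For reference, the three specifications mentioned above can be written as plain regressions. The sketch below uses simulated two-period data in wide form with hypothetical variable names (y0 for the baseline outcome, y1 for the endline outcome), and it takes ANCOVA to mean controlling for each unit's baseline outcome, which is an assumption about what is intended here.

```stata
* Simulated two-period data in wide form; variable names are hypothetical
clear
set seed 12345
set obs 300
gen byte treat = mod(_n, 2)
gen y0 = 10 + rnormal()                      // baseline outcome
gen y1 = y0 + 1 + 2*treat + rnormal()        // endline outcome

* Single difference: compare endline outcomes only
regress y1 treat

* ANCOVA: single difference controlling for the baseline outcome
regress y1 treat y0

* Difference in differences: with two periods per unit, regressing the
* change in the outcome on treatment gives the double difference
gen dy = y1 - y0
regress dy treat
```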

Looking forward to seeing the fruits of this labor!

kbjarkefur commented 6 years ago

Showing the first difference instead of simple means for all groups was also the main feedback when we showed this to some of the economists at our unit, so that will definitely be included. We have not decided yet whether it will be the default or an option.

luizaandrade commented 6 years ago

I presented the idea for this command in our lightning seminar. Here is some of the feedback:

estebanjq commented 6 years ago

Sounds good @luizaandrade. I find the means more informative and intuitive than the differences, but I can imagine how others would feel differently. It is fair to say that the option to show either (or both, i.e., means followed by the differences) may be quite useful regardless of the default that is chosen.

kbjarkefur commented 5 years ago

This command was merged into the development branch in merge #159. We will finalize this version of ietoolkit and submit it to SSC.

kbjarkefur commented 5 years ago

@estebanjq , thanks again for suggesting this command!

Please let us know if you do NOT want to be mentioned in the help file where we currently give you credit for suggesting this command.

We are looking forward to any feedback you might have once this is published, unless you want to sync the files from this repository and try out the command before it is online on SSC. Let us know if you want any advice on how to do that.

We have not yet implemented any more advanced estimation models, such as ANCOVA, as you suggested. We might do that later, but we will first collect feedback on the first version before deciding what to do next with this command.

Thanks again!

estebanjq commented 5 years ago

@kbjarkefur

It is great to hear that this idea has come to fruition.

  1. Please feel free to mention me as you see fit. FYI, my affiliation is the University of Wisconsin-Maryland.

  2. I can wait for it to be available via SSC, unless you think that will take a long time. If so, let me know the best way to sync the files.

  3. It makes sense to start with the most straightforward approach. Additional capabilities can be added later on.

Thanks again for creating this public good!!!

kbjarkefur commented 5 years ago

Great! Thanks!

I am spelling your name Esteban J. Quinnones as the ñ does not display properly in earlier versions of Stata. I hope that is OK. When I went to your GitHub profile page to get the spelling of your name I saw that your affiliation was listed there as University of Wisconsin-Madison, and unless you are doing some cross program with University of Maryland then I think that is what you meant to write.

We intend to submit the new version next week, and then it usually takes a day or two, or at least not more than a week.

estebanjq commented 5 years ago

Kristoffer,

Yes, I think we can thank autocorrect for Maryland. It is definitely University of Wisconsin - Madison.

If you can’t spell my surname as Quiñones, then just leave it as Quinones (replace the ñ with an n).

No hurry on access to the package.

Thanks again, Esteban

(Sent by email in reply to Kristoffer Bjärkefur's comment of Oct 20, 2018, 5:47 PM.)

luizaandrade commented 5 years ago

@estebanjq, publishing the command on SSC may take a few more days, but you can already use the version in the develop branch. You can find instructions here on how to use it.

estebanjq commented 5 years ago

Thanks Luiza!

(Sent by email in reply to Luiza's comment of Oct 20, 2018, 6:11 PM.)

kbjarkefur commented 5 years ago

ietoolkit is now updated and ieddtab is now released. Type "adoupdate, update" to install all available updates to all SSC commands you have previously installed, or type "ssc install ietoolkit, replace" to update only ietoolkit.

I will now close this issue.

bajwaih commented 5 years ago

Thank you all, it is a great help.

kbjarkefur commented 5 years ago

We are happy you found it helpful!