new command - ieddtable - neat tables for diff-in-diff regressions

kbjarkefur commented 6 years ago

This new command was suggested by Esteban J. Quiñones (@estebanjq).

There are many commands (estout, outreg, etc.) that can create neat tables from regressions. This command will target the diff-in-diff (or double difference) regression model only and create tables tailored for exactly this model specification.

We want the table to be in the format shown in the mock-up below:

The command is expected to be specified in the following format:

ieddtable varlist, dummies(D T DT)

Here, varlist is a list of outcome variables, D is the treatment dummy, T is the time dummy, and DT is the interaction of the two. The command will test that the regression is valid, in the sense that there are observations in each of the four groups, that D * T actually equals DT, and so forth.
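The validity checks described above could look roughly like the minimal Python sketch below. It only illustrates the logic and is not how ieddtab implements it in Stata; it assumes each observation is a dict holding 0/1 values under the hypothetical keys `D`, `T`, and `DT`.

```python
from itertools import product


def check_dd_dummies(rows, d="D", t="T", dt="DT"):
    """Sketch of the validity checks: raise ValueError if the dummies are
    unusable, otherwise return the number of observations per (D, T) group.

    Assumption: each observation is a dict holding 0/1 values under the
    (hypothetical) keys d, t and dt.
    """
    counts = {group: 0 for group in product((0, 1), repeat=2)}
    for i, row in enumerate(rows):
        for name in (d, t, dt):
            if row[name] not in (0, 1):
                raise ValueError(f"row {i}: {name} is not a 0/1 dummy")
        if row[dt] != row[d] * row[t]:
            raise ValueError(f"row {i}: {dt} is not equal to {d} * {t}")
        counts[(row[d], row[t])] += 1
    empty = [group for group, n in counts.items() if n == 0]
    if empty:
        raise ValueError(f"no observations in (D, T) group(s) {empty}")
    return counts


rows = [
    {"D": 0, "T": 0, "DT": 0},
    {"D": 0, "T": 1, "DT": 0},
    {"D": 1, "T": 0, "DT": 0},
    {"D": 1, "T": 1, "DT": 1},
]
print(check_dd_dummies(rows))  # {(0, 0): 1, (0, 1): 1, (1, 0): 1, (1, 1): 1}
```

Returning the group counts lets the same pass feed the optional N rows discussed below.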

Only the last column, showing the double difference, will be taken from the diff-in-diff regression. The other four means will be calculated separately as regular means. The reason is that we want to allow the user to include fixed effects, control variables, etc., and when these are used, the intercept and the D, T, and D*T coefficients can no longer be used by themselves to calculate the means of the four groups. We do not want the means in the four leftmost columns of the mock-up to be affected by fixed effects and control variables, as this can produce odd values such as negative harvests. When control variables or fixed effects are used, a note at the bottom of the table will explain why the value in the fifth column cannot be calculated from the first four.
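In the plain model y = a + b·D + c·T + d·DT with no controls or fixed effects, the OLS coefficient on DT equals the difference in differences of the four group means. That is why the fifth column can only be read off the first four in that case. The Python sketch below checks this numerically on made-up data, using a small hand-rolled OLS solver so it needs no libraries. It is only an illustration, not the ieddtab code.

```python
def group_means(y, d, t):
    """Mean of outcome y in each of the four (D, T) groups."""
    cells = {}
    for yi, di, ti in zip(y, d, t):
        cells.setdefault((di, ti), []).append(yi)
    return {group: sum(v) / len(v) for group, v in cells.items()}


def dd_from_means(m):
    """Change over time in the treatment group minus change in the control group."""
    return (m[(1, 1)] - m[(1, 0)]) - (m[(0, 1)] - m[(0, 0)])


def ols(X, y):
    """OLS coefficients from the normal equations X'X b = X'y, solved by
    Gauss-Jordan elimination with partial pivoting (fine for tiny systems)."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    a = [row + [v] for row, v in zip(xtx, xty)]  # augmented matrix [X'X | X'y]
    for col in range(k):
        pivot = max(range(col, k), key=lambda row_i: abs(a[row_i][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Made-up data: two observations in each (D, T) group, no controls.
y = [10, 12, 11, 15, 9, 14, 20, 26]
d = [0, 0, 0, 0, 1, 1, 1, 1]
t = [0, 0, 1, 1, 0, 0, 1, 1]

means = group_means(y, d, t)
beta = ols([[1, di, ti, di * ti] for di, ti in zip(d, t)], y)
print(dd_from_means(means))  # 9.5
print(round(beta[3], 6))     # 9.5, the coefficient on DT
```

Adding a control column to `X` would generally move `beta[3]` away from `dd_from_means(means)`, which is the situation the table note is meant to explain.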

The command should also be able to display the number of observations and the variation for each group for each outcome variable, in addition to the mean shown in the mock-up. We have not yet decided what the default should be. It should also be possible to display the number of observations for each group at the bottom of the table, in which case the command will test that this number is the same for that group across all outcome variables.
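The per-group N check for a bottom-of-table row could work roughly as in this Python sketch. It assumes missing values are stored as `None`, and the outcome names `yield` and `income` are hypothetical. It illustrates the idea only.

```python
from itertools import product

GROUPS = list(product((0, 1), repeat=2))  # (D, T) groups


def n_per_group(data, outcomes, d="D", t="T"):
    """Non-missing observations of each outcome in each (D, T) group.
    Assumption: a missing value is stored as None."""
    table = {}
    for var in outcomes:
        counts = {group: 0 for group in GROUPS}
        for row in data:
            if row[var] is not None:
                counts[(row[d], row[t])] += 1
        table[var] = counts
    return table


def common_n(table):
    """Group sizes for a single N row at the bottom of the table; raise if
    any group's N differs between outcome variables."""
    first = next(iter(table.values()))
    for var, counts in table.items():
        if counts != first:
            raise ValueError(f"N differs across outcomes (first mismatch: {var})")
    return first


# Hypothetical outcomes: income is missing for one control endline observation.
data = [
    {"D": 0, "T": 0, "yield": 1.2, "income": 300},
    {"D": 0, "T": 1, "yield": 1.5, "income": None},
    {"D": 1, "T": 0, "yield": 1.1, "income": 280},
    {"D": 1, "T": 1, "yield": 2.0, "income": 410},
]
table = n_per_group(data, ["yield", "income"])
print(table["income"])  # {(0, 0): 1, (0, 1): 0, (1, 0): 1, (1, 1): 1}
try:
    common_n(table)
except ValueError as err:
    print(err)  # N differs across outcomes (first mismatch: income)
```

When `common_n` raises, the per-outcome N would have to be shown next to each variable instead of in a single bottom row.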

It will be possible to export the table to CSV or TeX.
The default labels in the table will be those in the mock-up, but it should be possible to set each of them manually.
It should be possible to label the outcome variables with the variable name, the variable label, or a manually specified label.
We have not yet decided whether we want stars in a separate column.
It should be possible to set the star significance levels manually.

This is just a first draft of the specifications for this command. Please comment below if you have any additional options you want us to include.

bbdaniels commented 6 years ago

For first differences I have previously written a command with a similar reporting layout – you can see it at https://github.com/worldbank/stata/tree/development/dev/Statistics/RandomTrialRegression and the corresponding formatted Table 1 of http://science.sciencemag.org/content/354/6308/aaf7384/tab-figures-data

kbjarkefur commented 6 years ago

That's a really cool command. Can you post a picture of Table 1 from the Science link here in the thread? It requires a login to view (it might log in automatically when browsing from a WB IP address).

In many ways it is similar to what we want to do, but I think we should write our own for the following reasons (this is not a list against your command; these are just my reflections from comparing your implementation to the one I had envisioned for ieddtable, which I wanted to document somewhere):

We want something that outputs to LaTeX and Excel as well as to Stata's Results window. Your command would need some work to write to anything other than Excel.
The way you write to Excel requires putexcel. That would require us to raise the minimum Stata version required by ietoolkit, which we do not currently intend to do. (Everyone at well-funded institutions has a newer version of Stata, but that's not the only audience we are targeting.)
We want to use this command to test an approach that we intend to use in a rewrite of iebaltab. That rewrite would make the section where the stats are generated agnostic to the output type: that section only creates a matrix with all output values, and separate sections for each output type (Excel, LaTeX, etc.) then read that matrix. The iebaltab code is becoming very difficult to follow because we write the output in between the code that generates the stats.
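The output-agnostic design in the last point can be sketched in Python as follows: a stats step returns a plain matrix of numbers, and separate writer functions turn that same matrix into CSV or LaTeX. This is a hypothetical illustration only; the function names, column labels, and the `D`/`T`/`y` keys are assumptions, and it is not the iebaltab or ieddtab code.

```python
import csv
import io

GROUPS = [(0, 0), (0, 1), (1, 0), (1, 1)]  # (D, T)
COLS = ["Control BL", "Control EL", "Treat BL", "Treat EL", "Diff-in-diff"]


def compute_stats(data, outcomes):
    """Stats section: numbers only, no formatting. One row per outcome:
    the four group means followed by the difference of differences."""
    matrix = []
    for var in outcomes:
        means = []
        for group in GROUPS:
            vals = [r[var] for r in data if (r["D"], r["T"]) == group]
            means.append(sum(vals) / len(vals))
        dd = (means[3] - means[2]) - (means[1] - means[0])
        matrix.append(means + [dd])
    return matrix


def to_csv(matrix, row_names, col_names=COLS):
    """One output writer: reads the matrix, knows nothing about the stats."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([""] + col_names)
    for name, row in zip(row_names, matrix):
        writer.writerow([name] + [f"{v:.3f}" for v in row])
    return buf.getvalue()


def to_latex(matrix, row_names, col_names=COLS):
    """Another writer reading the same matrix."""
    lines = [r"\begin{tabular}{l" + "c" * len(col_names) + "}",
             " & ".join([""] + col_names) + r" \\ \hline"]
    for name, row in zip(row_names, matrix):
        lines.append(" & ".join([name] + [f"{v:.3f}" for v in row]) + r" \\")
    lines.append(r"\end{tabular}")
    return "\n".join(lines)


data = [
    {"D": 0, "T": 0, "y": 10.0},
    {"D": 0, "T": 1, "y": 12.0},
    {"D": 1, "T": 0, "y": 11.0},
    {"D": 1, "T": 1, "y": 17.0},
]
matrix = compute_stats(data, ["y"])
print(to_csv(matrix, ["y"]))
print(to_latex(matrix, ["y"]))
```

Adding a new output type then means adding one more writer, without touching `compute_stats`.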

@luizaandrade, let me know what you think!

We will let you know if we intend to borrow something from your code.

kbjarkefur commented 6 years ago

In commit f26dd5c, I made a quick but documented draft of what I meant by making the stats section agnostic to the output format: it creates a matrix of all stats that can then be passed to the sub-commands that create the outputs.

bbdaniels commented 6 years ago

Totally agree with all of the above! The reason I used putexcel for this one is that I wanted to write confidence intervals with ( ), so they couldn't go in a matrix. I have since decided that this is a terrible idea, especially since putexcel has major backward-compatibility issues, even between Stata 13 and 14.

You may also be interested in the regression output handling commands I wrote recently for working with CSV tables in TeX, if you haven't seen them already (mat2csv and reg2csv here). These currently leave out all line styling, but they have the useful convention of building two underlying matrices, results and results_STARS, which can be looped over to add non-numeric characters to a table like this before exporting to CSV.
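A rough Python sketch of that two-matrix convention, for illustration only: a numeric results matrix plus a parallel results_STARS matrix of star counts, looped over together to add stars and parentheses before exporting to CSV. The star cut-offs (0.10/0.05/0.01), the function names, and the numbers are assumptions; this does not reproduce mat2csv or reg2csv.

```python
import csv
import io


def stars(p, levels=(0.10, 0.05, 0.01)):
    """Number of stars: how many of the cut-offs in `levels` p falls below."""
    return sum(p < cut for cut in levels)


def format_cells(coefs, ses, pvalues, levels=(0.10, 0.05, 0.01)):
    """Build results_STARS next to the numeric results, then loop over both
    to produce text cells: coefficient plus stars, SE in parentheses."""
    results_stars = [[stars(p, levels) for p in row] for row in pvalues]
    cells = []
    for coef_row, se_row, star_row in zip(coefs, ses, results_stars):
        cells.append([f"{b:.3f}" + "*" * s for b, s in zip(coef_row, star_row)])
        cells.append([f"({se:.3f})" for se in se_row])
    return results_stars, cells


# Hypothetical numbers: one outcome row, two estimates.
results_stars, cells = format_cells(
    coefs=[[0.512, 1.204]], ses=[[0.210, 0.950]], pvalues=[[0.015, 0.205]]
)
buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerows(cells)
print(results_stars)  # [[2, 0]]
print(buf.getvalue())
```

Keeping the star counts in their own numeric matrix means the cut-offs can be changed by the user without recomputing any estimates.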

estebanjq commented 6 years ago

Wow, @bbdaniels, rctreg looks like a great command, thanks for sharing it. Hopefully, ietoolkit can further generalize it across input and output formats, as well as provide additional flexibility.

The option of being able to present SEs or CIs in an appropriate format would certainly be appreciated.

One thing I mentioned in a previous (off-thread) conversation with @luizaandrade and @kbjarkefur is that it would be great if a single command could handle and present the relevant information for single differences, single differences controlling for group means at baseline (i.e., ANCOVA), and difference-in-differences (a.k.a. double difference).

Looking forward to seeing the fruits of this labor!

kbjarkefur commented 6 years ago

Showing the first differences instead of simple means for all groups was also the main feedback when we showed this to some of the economists in our unit, so that will definitely be included. It will be either the default or an option; we have not yet decided which.

luizaandrade commented 6 years ago

I presented the idea for this command at our lightning seminar; here is some of the feedback:

When there's attrition, we should only include complete observations, i.e., those in the double difference regression, in the table. I think we can also add an option to include all observations, as long as the complete observations are the default.
It was suggested that it would be more intuitive to display the baseline levels, the baseline-to-endline change, and then the double difference. That would look something like the figure below. The argument for this is that a less technical audience may be confused if the DD coefficient is not the difference of the means displayed. This could be either an option or the default, and we would need to think about whether the two main columns should be the rounds or the treatment arms (i.e., Control and Treatment with subcolumns Baseline and Endline, or Baseline and Endline with subcolumns Control and Treatment).
People like both the single difference and the ANCOVA options. Single difference would be something like the image below, and ANCOVA would be similar to diff-in-diff, but with a different title for the regression coefficient in the last column.
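A small Python sketch of the first two suggestions, assuming a long panel where each row has a unit identifier (hypothetical key `id`), the D and T dummies, and an outcome `y` stored as `None` when missing. Units not observed in both rounds are dropped first, and the result is laid out as baseline level, baseline-to-endline change, and the double difference. It is an illustration of the idea, not the ieddtab implementation.

```python
def complete_cases(data, outcome, unit="id", t="T"):
    """Keep only units with a non-missing outcome in both rounds (T=0 and T=1).
    Assumption: missing values are stored as None."""
    rounds = {}
    for row in data:
        if row[outcome] is not None:
            rounds.setdefault(row[unit], set()).add(row[t])
    keep = {u for u, ts in rounds.items() if ts == {0, 1}}
    return [r for r in data if r[unit] in keep and r[outcome] is not None]


def baseline_change_dd(data, outcome, d="D", t="T"):
    """Baseline mean and baseline-to-endline change per arm, then the DD."""
    def mean_of(di, ti):
        vals = [r[outcome] for r in data if r[d] == di and r[t] == ti]
        return sum(vals) / len(vals)

    table = {}
    for arm, di in (("Control", 0), ("Treatment", 1)):
        base = mean_of(di, 0)
        table[arm] = {"baseline": base, "change": mean_of(di, 1) - base}
    table["DD"] = table["Treatment"]["change"] - table["Control"]["change"]
    return table


# Made-up panel: unit 5 attrits (missing outcome at endline).
data = [
    {"id": 1, "D": 0, "T": 0, "y": 10.0}, {"id": 1, "D": 0, "T": 1, "y": 12.0},
    {"id": 2, "D": 0, "T": 0, "y": 14.0}, {"id": 2, "D": 0, "T": 1, "y": 15.0},
    {"id": 3, "D": 1, "T": 0, "y": 11.0}, {"id": 3, "D": 1, "T": 1, "y": 18.0},
    {"id": 4, "D": 1, "T": 0, "y": 13.0}, {"id": 4, "D": 1, "T": 1, "y": 19.0},
    {"id": 5, "D": 1, "T": 0, "y": 40.0}, {"id": 5, "D": 1, "T": 1, "y": None},
]
print(baseline_change_dd(complete_cases(data, "y"), "y"))
# {'Control': {'baseline': 12.0, 'change': 1.5},
#  'Treatment': {'baseline': 12.0, 'change': 6.5}, 'DD': 5.0}
```

Without the complete-case step, unit 5's baseline value of 40.0 would enter the treatment baseline mean but not the endline mean, so the displayed change would no longer match the double difference.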

estebanjq commented 6 years ago

Sounds good, @luizaandrade. I find the means more informative and intuitive than the differences, but I can imagine that others feel differently. It is fair to say that the option to show either (or both, i.e., means followed by the differences) may be quite useful regardless of which default is chosen.

kbjarkefur commented 5 years ago

This command was merged into the development branch in #159. We will finalize this version of ietoolkit and submit it to SSC.

kbjarkefur commented 5 years ago

@estebanjq, thanks again for suggesting this command!

Please let us know if you do NOT want to be mentioned in the help file where we currently give you credit for suggesting this command.

We look forward to any feedback you might have once this is published. If you want to try out the command before it is available on SSC, you can sync the files from this repository; let us know if you want any advice on how to do that.

We have not yet implemented any more advanced estimation models, such as the ANCOVA you suggested. We might do that later, but we will first collect feedback on the first version before deciding what to do next with this command.

Thanks again!

estebanjq commented 5 years ago

@kbjarkefur

It is great to hear that this idea has come to fruition.

Please feel free to mention me as you see fit. FYI, my affiliation is the University of Wisconsin-Maryland.
I can wait for it to be available via SSC, unless you think that will take a long time. If so, let me know the best way to sync the files.
It makes sense to start with the most straightforward approach. Additional capabilities can be added later on.

Thanks again for creating this public good!!!

kbjarkefur commented 5 years ago

Great! Thanks!

I am spelling your name Esteban J. Quinnones, as the ñ does not display properly in earlier versions of Stata. I hope that is OK. When I went to your GitHub profile page to check the spelling of your name, I saw that your affiliation is listed there as University of Wisconsin-Madison, so unless you are doing some cross-program with the University of Maryland, I think that is what you meant to write.

We intend to submit the new version next week, and it usually takes a day or two, or at least no more than a week, to be published.

estebanjq commented 5 years ago

Kristoffer,

Yes, I think we can thank autocorrect for Maryland. It is definitely University of Wisconsin - Madison.

If you can’t spell my surname as Quiñones, then just leave it as Quinones (replace the ñ with an n).

No hurry on access to the package.

Thanks again, Esteban

(Sent by email in reply to Kristoffer Bjärkefur's comment of Oct 20, 2018, 5:47 PM.)

luizaandrade commented 5 years ago

@estebanjq, publishing the command on SSC may take a few more days, but you can already use the version in the develop branch. You can find instructions on how to use it here.

estebanjq commented 5 years ago

Thanks Luiza!

(Sent by email in reply to Luiza's comment of Oct 20, 2018, 6:11 PM.)

kbjarkefur commented 5 years ago

ietoolkit is now updated and ieddtab is now released. Type adoupdate, update to install all available updates to all SSC commands you have previously installed, or type ssc install ietoolkit, replace to update only ietoolkit.

I will now close this issue.

bajwaih commented 5 years ago

Thank you all, it is a great help.

kbjarkefur commented 5 years ago

We are happy you found it helpful!

worldbank / ietoolkit

new command - ieddtable - neat tables for diff-in-diff regressions #135