insongkim / wfe


Using weights #5

Open francobeltran opened 5 years ago

francobeltran commented 5 years ago

Hi, I have been trying to use the weights option with no success. I get the following error:

Error in wfe(y ~ tr + x1+x 2 + : 'C.it' must be a numeric vector with length equal to number of observations

Basically what I did was to rename my weight variable as C.it in the main dataset. I also tried defining another dataset named C.it that includes only this variable. My weights are integers: the number of observations behind each of the averages I am using at the unit-by-time level. I also tried defining these weights as proportions (the ratio of this number to the total number of observations). Do you have an example of how to work with weights? Or could you please tell me how I can incorporate them?

Another thing I realized is that even without weights I need to convert all columns into integers (except for time and year which I set up as factors) for the code to work, else I get the following error:

Error in $<-.data.frame(*tmp*, "W.it", value = numeric(0)) : replacement has 0 rows, data has 17122

Thank you very much,

insongkim commented 5 years ago

Did you set C.it = "varname", where varname is a character string corresponding to the variable name in the data frame? Could you use one of the quantities of interest, e.g., qoi = "ate", rather than using your own weights? Note that different weights correspond to a different quantity of interest, and so your quantity of interest might not be clear with arbitrary weights. Thank you very much.
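A minimal sketch of the two options mentioned above, for readers who land on this thread. The data frame `panel`, its column names, and the helper `fit_wfe_sketch` are hypothetical. The arguments `C.it` and `qoi` come from this thread; `treat`, `unit.index`, `time.index` and `method` are written as I recall them from the package help, so please check them against `?wfe` before relying on this.

```r
# Toy panel: 10 units observed over 5 periods (all names are illustrative).
set.seed(1)
panel <- data.frame(
  unit = factor(rep(1:10, each = 5)),
  time = factor(rep(1:5, times = 10)),
  tr   = rbinom(50, 1, 0.5),   # binary treatment indicator
  x1   = rnorm(50)             # one control variable
)
panel$y     <- 1 + 0.5 * panel$tr + 0.3 * panel$x1 + rnorm(50)
panel$n_obs <- sample(5:20, 50, replace = TRUE)  # hypothetical custom weights

# Hypothetical helper: defined here but not called, so the snippet runs
# even when the wfe package is not installed.
fit_wfe_sketch <- function(data) {
  # Option 1: let the package derive weights for a stated quantity of interest.
  fit_qoi <- wfe::wfe(y ~ tr + x1, data = data, treat = "tr",
                      unit.index = "unit", time.index = "time",
                      method = "unit", qoi = "ate")
  # Option 2: pass the NAME of the weight column as a character string,
  # not the numeric vector or a separate data frame.
  fit_cit <- wfe::wfe(y ~ tr + x1, data = data, treat = "tr",
                      unit.index = "unit", time.index = "time",
                      method = "unit", C.it = "n_obs")
  list(qoi = fit_qoi, custom = fit_cit)
}
```

With the package installed, `fits <- fit_wfe_sketch(panel)` would attempt both fits. As noted above, custom weights change the quantity being estimated, so option 1 is usually the clearer starting point.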

francobeltran commented 5 years ago

I understand, I was mistakenly thinking that these weights referred to sampling weights. Apologies for the confusion. Thank you very much for the explanation.

On my other question: is it fine that I need to define "tr", "y" and all controls as integers, and only "unit" and "time" as factors, for the code to work? If some of these are numeric the code breaks. My interpretation is that zeros in my binary outcomes or control variables are not properly captured by the model when they are in numeric form.

Thank you very much again, Francisco


insongkim commented 5 years ago

The treat variable should be binary. The outcome variable y can be numeric, and so can the control variables. The unit and time indices should be factors, as you noted. I recommend that you start with a simple model with treatment and control, and then add control variables one by one to identify what is causing the error. We also include a few examples: please try > example(wfe). Thank you very much.
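The type advice above can be sketched in base R as follows. The data frame `raw` and its columns are hypothetical, and reading "a simple model" as treatment only is my interpretation rather than something stated in the thread.

```r
# Hypothetical raw data where columns arrived with mixed types.
raw <- data.frame(
  unit = c("a", "a", "b", "b"),
  time = c(2001L, 2002L, 2001L, 2002L),
  tr   = c("0", "1", "0", "0"),
  y    = c("1.5", "2.0", "0.7", "0.9"),
  x1   = c(3L, 4L, 5L, 6L),
  stringsAsFactors = FALSE
)

# Coerce each column to the type recommended above.
prepared <- within(raw, {
  unit <- factor(unit)      # unit index as factor
  time <- factor(time)      # time index as factor
  tr   <- as.integer(tr)    # binary treatment coded 0/1
  y    <- as.numeric(y)     # outcome may be numeric
  x1   <- as.numeric(x1)    # controls may be numeric
})

# Sanity check: the treatment takes only the values 0 and 1.
stopifnot(all(prepared$tr %in% c(0L, 1L)))

# Grow the model one control at a time to isolate the term that fails.
base_formula <- y ~ tr
with_x1      <- update(base_formula, . ~ . + x1)
```

Fitting `base_formula` first and then `with_x1` makes it easier to see which added variable triggers an error.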

HaoShiming commented 5 years ago

Dear Professor Kim, I'd like to know whether the package "wfe" is also suitable for other data types, such as cross-section data or data that doesn't belong to panel, time series or cross-section.

Thanks so much !

insongkim commented 5 years ago

@HaoShiming Thanks for using the package. You may use it on cross-sectional data in which you have a distinct group structure (using one-way wfe), although the discussion about dynamics in the following paper may not apply in that case: http://web.mit.edu/insong/www/pdf/FEmatch.pdf

HaoShiming commented 5 years ago

@insongkim Thanks so much for your reply! But (i) I'm still wondering whether wfe is sufficient for handling endogeneity problems (omitted variables, measurement error, etc.) when the data are cross-sectional and no other covariates are included; (ii) the reason for not including covariates is that they make the estimated ATE unreasonable and hard to explain; (iii) neither empirical nor methodological studies have paid enough attention to the use of covariates in causal analysis. I have seen some papers suggest that including covariates is not necessary, such as in Synthetic Control Methods (SCM, Abadie, Diamond & Hainmueller, 2010; HCW, Hsiao, Ching & Wan, 2012), while others suggest that we must include covariates. In my experience, not including covariates sometimes gives better ATE estimates in Monte Carlo simulations.
So I'm wondering when we should include covariates, and what the criterion for choosing covariates is.

Thanks again and sorry for interrupting.

insongkim commented 5 years ago

An important identification assumption for causal inference is conditional ignorability. You want to adjust for the pre-treatment covariates (confounders) such that the potential outcome is independent of the treatment conditional on pre-treatment covariates. I don't think that including fixed effects is sufficient for solving any endogeneity problem if there are other confounders.
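For reference, conditional ignorability as described above is commonly written as follows. The notation is an assumption on my part and not taken from the thread: $Y_i(1)$ and $Y_i(0)$ are the potential outcomes of unit $i$, $T_i$ is the binary treatment, and $X_i$ collects the pre-treatment covariates.

```latex
\{Y_i(1),\, Y_i(0)\} \;\perp\!\!\!\perp\; T_i \;\mid\; X_i
```

Fixed effects adjust only for confounders that are constant within a unit (or within a time period), so any confounder in $X_i$ that varies within units still has to be adjusted for explicitly.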