kylebutts / did2s_stata

Two-Stage Difference-in-Differences following Gardner (2021)

Dichotomous/polychotomous dependent variable #14

Closed saharnazb closed 1 year ago

saharnazb commented 1 year ago

Can this method (either in stata or R) be applied when the dependent variable is a factor variable (dichotomous/polychotomous)?

kylebutts commented 1 year ago

The method "works" whenever the first-stage model is correctly specified for the outcome variable; for example, when the outcome is an indicator variable and you think you've correctly specified the (linear) propensity score model. In general, though, the first-stage model is unlikely to hold.
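A minimal sketch of the two-stage idea with a binary outcome, assuming simulated repeated cross-sections in which the linear first-stage model holds by construction. All names and numbers (`adopt`, `group_fe`, `time_fe`, `tau`, the cell sizes) are hypothetical, and this is plain NumPy rather than the did2s implementation; in particular it does not compute the standard-error adjustment that a two-stage estimator needs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design: 4 groups, 6 periods, n_per_cell people sampled per cell.
n_groups, n_periods, n_per_cell = 4, 6, 5000
adopt = np.array([np.inf, np.inf, 3, 4])       # first treated period (inf = never treated)
group_fe = np.array([0.10, 0.20, 0.15, 0.25])  # assumed group effects
time_fe = np.array([0.00, 0.02, 0.04, 0.06, 0.08, 0.10])  # assumed period effects
tau = 0.20                                     # true effect on P(y = 1)

g, t = np.meshgrid(np.arange(n_groups), np.arange(n_periods), indexing="ij")
g = np.repeat(g.ravel(), n_per_cell)
t = np.repeat(t.ravel(), n_per_cell)
d = (t >= adopt[g]).astype(float)              # staggered treatment indicator

# Binary outcome whose conditional mean is linear in group, period, and treatment.
p = 0.2 + group_fe[g] + time_fe[t] + tau * d
y = rng.binomial(1, p).astype(float)

def dummies(codes, n_levels):
    """One indicator column per level of an integer-coded variable."""
    return (codes[:, None] == np.arange(n_levels)).astype(float)

# Group dummies plus period dummies (first period dropped to avoid collinearity).
X = np.hstack([dummies(g, n_groups), dummies(t, n_periods)[:, 1:]])

# Stage 1: estimate group and period effects by OLS on untreated observations only.
untreated = d == 0
beta, *_ = np.linalg.lstsq(X[untreated], y[untreated], rcond=None)

# Stage 2: remove the estimated effects from every observation, then regress
# the adjusted outcome on treatment (without a constant, this is the treated mean).
y_adj = y - X @ beta
tau_hat = y_adj[d == 1].mean()
print(f"true effect: {tau:.2f}, two-stage estimate: {tau_hat:.3f}")
```

The estimate lands near the true effect here only because the simulated probabilities are linear in the fixed effects. If the true model were, say, a logit in those effects, the first stage would be misspecified and the same code would generally not recover the effect.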

saharnazb commented 1 year ago

Right. So the first stage is estimated using OLS and not GMM, correct? I assumed the first stage was also estimated using GMM, in which case no distributional assumption is imposed on the errors. But that's not the case. Thanks for your response.

kylebutts commented 1 year ago

I'm not sure I understand the question. OLS is a form of a GMM estimator with moments given by: [image: OLS moment conditions]

Additionally, OLS never requires a distributional assumption on the errors. It only needs the conditional mean of the error to be zero. The distributional assumption is an additional assumption used to prove efficiency of the OLS estimator.
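For reference, the textbook way to write OLS as a method-of-moments estimator is sketched below. The notation is assumed here rather than taken from the screenshot: $y_i$ is the outcome, $x_i$ the regressor vector, and $\beta$ the coefficient vector.

```latex
% Population moment condition: regressors are uncorrelated with the error.
\mathbb{E}\left[ x_i \left( y_i - x_i'\beta \right) \right] = 0

% Sample analogue, solved exactly because there are as many moments as parameters:
\frac{1}{n} \sum_{i=1}^{n} x_i \left( y_i - x_i'\hat{\beta} \right) = 0
\quad\Longrightarrow\quad
\hat{\beta} = \left( \sum_{i=1}^{n} x_i x_i' \right)^{-1} \sum_{i=1}^{n} x_i y_i
```

Nothing in these conditions refers to the shape of the error distribution; only the orthogonality between $x_i$ and the error is used.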

saharnazb commented 1 year ago

Well, OLS imposes the assumption of a normal distribution on the error terms, which helps us make statistical inference. In OLS, if the errors are normally distributed with mean zero and constant variance, then the OLS estimator is consistent (and efficient). Estimating a specification with a binary dependent variable leads to predicted values less than 0 and more than 1. OLS assumes that the outcome variable is continuous and normally distributed, whereas binary variables are inherently dichotomous and take only two values. Also, with a binary outcome, the variance of the errors depends on the values of the independent variables, which violates the constant-variance assumption. So logistic regression is suggested. But GMM does not impose the normal distribution assumption.

I am sorry if my question was confusing. I am looking for a way to test pre-trends and do an event study for my case, where the outcome is polychotomous, the data are repeated cross-sections, and treatment is staggered. That is why I am searching the DID literature for something consistent with my scenario. I am more of an applied economist and have not yet managed to master the DID literature. After looking at the csdid, jwdid, and did2s commands in Stata, I am trying to find out which would be best for me. csdid is not suitable for a binary outcome.

friosavila commented 1 year ago

Hi Saharnaz, I think you have some concepts confused here.

  1. OLS consistency does not depend on normality of the errors. It does depend on the zero conditional mean.
  2. OLS does not assume variables are continuous. That is why we can use it for almost all kinds of models. Standard errors can always be corrected if one believes the errors are not homoskedastic (robust standard errors, for one).
  3. MLE does impose distributional assumptions. Otherwise you cannot use it. GMM, like OLS, does not impose distributional assumptions, just conditions on the moments.
  4. All the commands you mention can actually handle binary dependent variables, but you need to acknowledge that they use an LPM (or something similar to it).
  5. If you want to use something that handles binary variables explicitly, you can use jwdid: jwdid y x1 x2 x3, .... method(logit). Here, however, the parallel trends assumption is not on the observed probability, but on the latent variable.

F
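Points 1 and 2 can be checked with a small simulation. The sketch below assumes a hypothetical data-generating process in which P(y = 1 | x) = 0.3 + 0.4x, so the errors are two-valued and heteroskedastic, and it computes HC1 robust standard errors by hand in NumPy; the variable names and numbers are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical linear probability model: P(y = 1 | x) = 0.3 + 0.4 * x.
n = 50_000
x = rng.uniform(0, 1, n)
y = rng.binomial(1, 0.3 + 0.4 * x).astype(float)  # binary outcome, non-normal errors

# OLS via the normal equations.
X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta
k = X.shape[1]

# Classical variance (assumes constant error variance).
classical_se = np.sqrt(np.diag(XtX_inv * (resid @ resid) / (n - k)))

# HC1 heteroskedasticity-robust "sandwich" variance.
meat = (X * resid[:, None] ** 2).T @ X
robust_se = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv * n / (n - k)))

print("coefficients:", beta)
print("classical SE:", classical_se)
print("robust SE:   ", robust_se)
```

The coefficient estimates settle close to 0.3 and 0.4 even though the errors take only two values for each x. The robust formula is what keeps the inference valid when the error variance changes with x.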

saharnazb commented 1 year ago

Thank you, F, for your time. Maybe I am making a mistake. I will refer to my textbooks regarding how a violation of the normality assumption relates to inference, hypothesis testing, and the consistency and efficiency of the estimators. I am probably confused. Thank you for your explanations; I will check the details. But a binary variable does not have a continuous distribution. We can only treat it as continuous for the LPM.
jwdid is not converging for some reason, and I have not yet found the cause of the error (possibly something in the way I set it up). While working on the error, I tried to check whether other options are available.

friosavila commented 1 year ago

If jwdid is not converging, it is probably because of:

  a) Not enough data: you have very few observations per cohort per year.
  b) Too many controls, which will indirectly affect a).

I would definitely need more information to say something about why jwdid is not working for you.
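One quick way to check the first possibility is to count observations in each cohort-by-year cell. The sketch below uses made-up (cohort, year) pairs and an arbitrary threshold `min_cell`; neither comes from jwdid.

```python
from collections import Counter

# Hypothetical data: one (cohort, year) pair per observation.
obs = [
    (2010, 2009), (2010, 2009), (2010, 2010),
    (2012, 2009), (2012, 2011), (2012, 2011), (2012, 2011),
]
min_cell = 3  # illustrative cutoff for a "thin" cell, not a rule from jwdid

cell_counts = Counter(obs)
thin_cells = sorted(cell for cell, count in cell_counts.items() if count < min_cell)

for cohort, year in thin_cells:
    print(f"cohort {cohort}, year {year}: {cell_counts[(cohort, year)]} observation(s)")
```

Cells that turn up here are candidates for the missing standard errors, since a nonlinear model has little information with which to estimate a separate effect for them.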

saharnazb commented 1 year ago

Thank you. I am not sure if this (an issue for another package) is the right place to send details. Can I email you? I believe the problem is the number of controls. I am controlling for 35-40 variables with 180,000 observations. However, even without controls, I get a lot of dots in the results table instead of t-stats, variances, and CIs.

kylebutts commented 1 year ago

Agreed on all fronts with @friosavila!