Closed: colinvance closed this issue 1 year ago
@colinvance Have you tried the default mvec? (I.e., without passing the mvec() option at all.)
Thanks for the tip. I just gave it a try, but I continue to get missing values:

. honestdid, pre(1/5) post(7/8)
(note: running execution using -parallel-; see help parallel for details)
     M |     lb |     ub |
-------|--------|--------|------------
     . | -0.322 | -0.320 | (Original)
0.2222 |      . |      . |
0.4444 |      . |      . |
0.6667 |      . |      . |
0.8889 |      . |      . |
1.1111 |      . |      . |
1.3333 |      . |      . |
1.5556 |      . |      . |
1.7778 |      . |      . |
2.0000 |      . |      . |
(method = C-LF, Delta = DeltaRM, alpha = 0.050)
@colinvance Can you post the output of

matrix list e(b)
matrix list e(V)

after you run reghdfe?
Gladly, please see output below
. reghdfe e10 b22796.Dyear, absorb(intid date1) cluster(intid) noconstant
(MWFE estimator converged in 2 iterations)

HDFE Linear regression                   Number of obs              =  111,112
Absorbing 2 HDFE groups                  F(7, 13888)                = 57597.40
Statistics robust to heteroskedasticity  Prob > F                   =   0.0000
                                         R-squared                  =   0.9673
                                         Adj R-squared              =   0.9626
                                         Within R-sq.               =   0.7847
Number of clusters (intid) = 13,889      Root MSE                   =   0.0217

(Std. Err. adjusted for 13,889 clusters in intid)

             |               Robust
         e10 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories - Redundant = Num. Coefs   |
-------------+---------------------------------------|
       intid |      13889       13889            0  *|
       date1 |          8           0            8   |
-----------------------------------------------------+
. matrix list e(b)
e(b)[1,8]
. matrix list e(V)
symmetric e(V)[8,8]
@colinvance You should expand the grid search in this case; the default values are +/- 20 times the SE of the first post-period coefficient. Here you have 20 × 0.0006, which gives +/- 0.012. However, the CI is around -0.3. So this works, for example:
honestdid, pre(1/5) post(7/8) grid_lb(-0.5) grid_ub(0.5)
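For intuition, the numbers in this thread show directly why the default grid comes back empty. A minimal numeric sketch (values copied from the reghdfe output above; this is an illustration of the heuristic, not the package's code):

```python
# Why the default DeltaRM grid misses: the default bounds are
# +/- 20 times the SE of the first post-period coefficient,
# centered at zero, but the CI sits near the point estimate.
se_first_post = 0.000594      # SE of the 22797 coefficient
point_estimate = -0.3207717   # coefficient on 22797

grid_lb_default = -20 * se_first_post   # about -0.0119
grid_ub_default = +20 * se_first_post   # about +0.0119

# The original CI (around -0.32) lies far outside the default grid,
# so every grid point is rejected and the bounds come back missing.
print(grid_lb_default <= point_estimate <= grid_ub_default)  # False
```

Widening the grid with grid_lb(-0.5) grid_ub(0.5) puts the CI inside the searched range, which is exactly the fix suggested above.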
@jonathandroth Btw I know for DeltaSD the default grid is more sophisticated than the +/- 20 SE; is there a way to do a possibly more sensible default for DeltaRM as well?
Worked like a charm - thank you!

. honestdid, pre(1/5) post(7/8) grid_lb(-0.5) grid_ub(0.5)
(note: running execution using -parallel-; see help parallel for details)
     M |     lb |     ub |
-------|--------|--------|------------
     . | -0.322 | -0.320 | (Original)
0.2222 | -0.325 | -0.317 |
0.4444 | -0.328 | -0.314 |
0.6667 | -0.331 | -0.311 |
0.8889 | -0.334 | -0.307 |
1.1111 | -0.338 | -0.304 |
1.3333 | -0.341 | -0.301 |
1.5556 | -0.344 | -0.297 |
1.7778 | -0.348 | -0.294 |
2.0000 | -0.351 | -0.290 |
(method = C-LF, Delta = DeltaRM, alpha = 0.050)
Great, glad the issue was resolved in this use-case! Thanks for reaching out, Colin!
Two thoughts, Mauricio:
1) In the short run, it seems like it would be useful to add a warning along the lines of "Could not find upper and lower bounds using the default grid. Try expanding the grid using the grid_lb() and grid_ub() options".
2) In terms of longer-run solutions: the C-LF confidence set is always contained within the LF confidence interval using size kappa (= 0.05/10 in our implementation). The bounds of the LF confidence interval can actually be computed via linear programming -- see Sec. 5.4.4 here https://www.jonathandroth.com/assets/files/arp-draft.pdf, which is implemented in Matlab here https://github.com/jonathandroth/LinearMomentInequalities/blob/main/lf_ci_for_linear_params.m. So we could (i) first compute the LF bounds using linear programming, and then (ii) only do the C-LF test inside those endpoints. Let me know if it'd be helpful to discuss or if you think this would be difficult to implement. Given that a few people have asked about the grid, it seems worth implementing if the cost is not too high.
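To make step (i) concrete: when a confidence set can be written as { theta : exists delta with A_theta*theta + A_delta @ delta <= b }, its endpoints are the optimal values of two linear programs over (theta, delta). The sketch below is a hedged illustration of that generic reduction, not the HonestDiD implementation; all matrices are toy placeholders.

```python
# Hedged sketch: endpoints of a set defined by affine inequalities,
# found by minimizing/maximizing theta via linear programming.
import numpy as np
from scipy.optimize import linprog

def lf_bounds(A_theta, A_delta, b):
    """Endpoints of { theta : exists delta with
    A_theta*theta + A_delta @ delta <= b }, via two LPs."""
    n_delta = A_delta.shape[1]
    A_ub = np.column_stack([A_theta, A_delta])  # constraints over (theta, delta)
    bounds = [(None, None)] * (1 + n_delta)     # all variables free
    out = []
    for sign in (+1.0, -1.0):                   # +1 -> lower bound, -1 -> upper
        c = np.zeros(1 + n_delta)
        c[0] = sign
        res = linprog(c, A_ub=A_ub, b_ub=b, bounds=bounds, method="highs")
        out.append(sign * res.fun if res.success else None)
    return tuple(out)

# Toy check: constraints theta <= delta, theta >= -1, delta <= 2
# describe the set { theta : -1 <= theta <= 2 }.
A_theta = np.array([1.0, -1.0, 0.0])
A_delta = np.array([[-1.0], [0.0], [1.0]])
b = np.array([0.0, 1.0, 2.0])
print(lf_bounds(A_theta, A_delta, b))  # lower ~ -1, upper ~ 2
```

In the actual LF application, A_theta, A_delta, and b would come from the moment inequalities and the kappa-level critical value; that mapping is what Sec. 5.4.4 of the linked draft works out.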
Thanks!
Best, Jon
PS: Did I read correctly that the default is just +/- 20 × SE? Seems like a more sensible "heuristic" default would be [point estimate] +/- (max pre-period violation + 20 × SE) or something like that? If option 2) above is costly to implement, a simple change in the default like this might work well.
I'm delighted the case was resolved - it was not my expectation that the matter would be wrapped up in an hour! Colin
Mauricio is a machine ;-)
@jonathandroth Yes; see for instance here. It's trivial to offset the grid by the point estimate, but less sure where to get the max pre-period violation.
I had in mind, in R notation:

max(abs(diff(c(betapre, 0))))

I.e., the maximum absolute value of the first-differenced pre-treatment coefficients (including the one implicitly normalized to zero).
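For readers working in Python rather than R, the same quantity is a direct translation (the example coefficients are the pre-period estimates from the reghdfe output in this thread):

```python
# Python/numpy analog of max(abs(diff(c(betapre, 0)))):
# max absolute first difference of the pre-period coefficients,
# appending the implicitly normalized zero for the reference period.
import numpy as np

def max_pre_violation(betapre):
    padded = np.append(np.asarray(betapre, dtype=float), 0.0)
    return np.max(np.abs(np.diff(padded)))

betapre = [0.0070532, 0.0168851, 0.0033554, 0.010514, 0.0143172]
print(max_pre_violation(betapre))  # -> 0.0143172
```

Note this matches the "offset of around 0.015" mentioned later in the thread.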
@jonathandroth That wouldn't offset the range by much if the pretrend is somewhat flat, right? In this case, for instance, I get an offset of around 0.015, so the result is the same. (We also need to sign the magnitude since the CI is signed, but I can do something like sign(betapost %*% l_vec) * max(abs(diff(c(betapre, 0)))).)
Yes, if pre-trends are flat then it wouldn't matter much.
I didn't quite understand the point about the signs. With two coefficients (one before treatment and one after), the RM restriction is |delta_1| <= |delta_{-1}|, so the bounds are somewhat agnostic about the sign (i.e. the bias could be positive or negative, with magnitude no larger than |delta_{-1}|).
If l_vec is a basis vector for the k'th post-treatment period, the identified set for RM looks like:
betapost_k +/- [k * Mbar * max(abs(diff(c(betapre, 0))))]
So I was thinking that you would use the sample analog to that (replacing beta's with betahat's) plus some multiple of the standard error to account for sampling error.
Does that make sense? Happy to talk in person if easier.
@jonathandroth
Happy to talk in person if easier.
For sure; I didn't quite follow the above. LMK when might be a good time.
@jonathandroth I was thinking about this some more, and I think my only remaining confusion is that l_vec can give you any linear combination of betapost, so what to do in that case?
If l_vec > 0, then l_vec' taupost = l_1 tau_1 + ... + l_K tau_K is bounded above by l_1 ub_1 + ... + l_K ub_K, where ub_k is the upper bound for tau_k (i.e. when l_vec is the kth basis vector).
Similarly, if l_vec < 0, then you get an upper bound of l_1 lb_1 + ... + l_K lb_K, where the lb_k are the lower bounds.
More generally, we can set upperbound_k = l_k * ub_k if l_k > 0 and upperbound_k = l_k * lb_k if l_k < 0, and then set our upper bound to sum_k upperbound_k.
Does that make sense?
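This sign-split rule can be sketched in a few lines (the l_vec, lb, and ub arrays below are purely illustrative placeholders, not real estimates):

```python
# Sketch of the sign-split upper bound: each term uses ub_k when
# l_k > 0 and lb_k when l_k < 0 (a zero l_k contributes nothing).
import numpy as np

def combined_upper_bound(l_vec, lb, ub):
    l_vec, lb, ub = map(np.asarray, (l_vec, lb, ub))
    per_k = np.where(l_vec > 0, l_vec * ub, l_vec * lb)
    return per_k.sum()

# Toy numbers: 0.5 * ub_1 + (-0.5) * lb_2 = 0.5*1.0 + (-0.5)*(-2.0)
l_vec = np.array([0.5, -0.5])
lb = np.array([-1.0, -2.0])
ub = np.array([1.0, 3.0])
print(combined_upper_bound(l_vec, lb, ub))  # -> 1.5
```

The lower bound is the mirror image: lb_k when l_k > 0 and ub_k when l_k < 0.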
@jonathandroth Is this sum_k k * abs(l_k)?
I don't think so. Does this clarify:
[image: image.png]
@jonathandroth Just FYI, the image didn't upload.
@jonathandroth This seemed to imply I ought to use something like
A = max(abs(diff(c(betapre, 0))))
lb_k = tau_k - k * Mbar * A
ub_k = tau_k + k * Mbar * A
for the case where l_vec is the kth basis?
Yes, if tau_k is the k'th element of betapost, then that looks right. And then you want to add some multiple of the standard error to those lb_k and ub_k.
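A runnable sketch of these basis-vector bounds, using the point estimates from the reghdfe output earlier in the thread (Mbar = 1 is an arbitrary illustrative choice, and sampling error is not yet added):

```python
# lb_k = tau_k - k*Mbar*A and ub_k = tau_k + k*Mbar*A, where
# A = max(abs(diff(c(betapre, 0)))) and k indexes post periods from 1.
import numpy as np

def rm_bounds_basis(betapre, betapost, k, Mbar):
    A = np.max(np.abs(np.diff(np.append(betapre, 0.0))))
    tau_k = betapost[k - 1]
    return tau_k - k * Mbar * A, tau_k + k * Mbar * A

betapre = np.array([0.0070532, 0.0168851, 0.0033554, 0.010514, 0.0143172])
betapost = np.array([-0.3207717, -0.3188196])
lb1, ub1 = rm_bounds_basis(betapre, betapost, k=1, Mbar=1.0)
print(lb1, ub1)  # roughly -0.335 and -0.306
```

Adding a multiple of the standard error to each endpoint, as suggested above, widens these to account for sampling variation.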
Yes, if tau_k is the k'th element of betapost, then that looks right. And then you want to add some multiple of the standard error to those lb_k and ub_k.
@jonathandroth Ok, then I think my simplification does work. For a given k, if l_k > 0 we have
lb_k = l_k * tau_k - l_k * k * Mbar * A
= l_k * tau_k - abs(l_k) * k * Mbar * A
ub_k = l_k * tau_k + l_k * k * Mbar * A
= l_k * tau_k + abs(l_k) * k * Mbar * A
and if l_k < 0 we have
lb_k = l_k * tau_k + l_k * k * Mbar * A
     = l_k * tau_k - abs(l_k) * k * Mbar * A
ub_k = l_k * tau_k - l_k * k * Mbar * A
     = l_k * tau_k + abs(l_k) * k * Mbar * A

Either way,

lb_k = l_k * tau_k - abs(l_k) * k * Mbar * A
ub_k = l_k * tau_k + abs(l_k) * k * Mbar * A

Hence

sum_k lb_k = l_vec' tau_post - Mbar * A * (sum_k abs(l_k) * k)
sum_k ub_k = l_vec' tau_post + Mbar * A * (sum_k abs(l_k) * k)

So this is symmetric, and the grid half length is Mbar * A * (sum_k abs(l_k) * k) plus some multiple of SE(l_vec' tau_post) (currently 20 times). Is this right?
Looks good to me!
@jonathandroth Bet, this is what I'd implemented. In the test script there are fewer instances where the function gives an open-ended CI.
Merged to main in #13.
Great. One slight modification that might work even better is instead of using the max pre-treatment diff, you use the max absolute value of the CI endpoints for the pre-treatment diffs, which also accounts for sampling error in betapre. But if the current approach seems to work well, happy to stick with it.
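That refinement might be sketched like this; se_diff below is a hypothetical vector of standard errors for the first-differenced pre coefficients, and z = 1.96 is an assumed two-sided critical value:

```python
# Variant of A that uses the largest |CI endpoint| of the
# first-differenced pre-period coefficients, so sampling error in
# betapre is absorbed into the grid half length.
import numpy as np

def max_pre_violation_ci(betapre, se_diff, z=1.96):
    d = np.diff(np.append(np.asarray(betapre, dtype=float), 0.0))
    endpoints = np.concatenate([d - z * se_diff, d + z * se_diff])
    return np.max(np.abs(endpoints))

betapre = np.array([0.0070532, 0.0168851, 0.0033554, 0.010514, 0.0143172])
se_diff = np.full(5, 0.001)           # hypothetical SEs for the diffs
print(max_pre_violation_ci(betapre, se_diff))  # -> 0.0162772
```

By construction this is at least as large as the point-estimate version, so the resulting grid can only get wider.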
Dear Authors,

Thanks for the excellent scholarship on DiD and the accompanying packages. I had a quick question concerning the output of honestdid; specifically, on what might be the source of missing values in the output. When I run the event-study model with reghdfe on a perfectly balanced panel, the estimates seem reasonable and are what I expect:
gen byte D = (yrtreat == 22797)
gen `:type year' Dyear = cond(D, date1, 22796)
reghdfe e10 b22796.Dyear, absorb(intid date1) cluster(intid) noconstant
(MWFE estimator converged in 2 iterations)

HDFE Linear regression                   Number of obs              =  111,112
Absorbing 2 HDFE groups                  F(7, 13888)                = 57597.40
Statistics robust to heteroskedasticity  Prob > F                   =   0.0000
                                         R-squared                  =   0.9673
                                         Adj R-squared              =   0.9626
                                         Within R-sq.               =   0.7847
Number of clusters (intid) = 13,889      Root MSE                   =   0.0217

-------------+----------------------------------------------------------------
       Dyear |
       22791 |   .0070532   .0008249     8.55   0.000     .0054363    .0086701
       22792 |   .0168851   .0007493    22.53   0.000     .0154164    .0183538
       22793 |   .0033554   .0006626     5.06   0.000     .0020567    .0046542
       22794 |    .010514   .0006635    15.85   0.000     .0092135    .0118145
       22795 |   .0143172   .0005688    25.17   0.000     .0132022    .0154322
       22797 |  -.3207717    .000594  -539.98   0.000    -.3219361   -.3196073
       22798 |  -.3188196   .0006578  -484.69   0.000     -.320109   -.3175303

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories - Redundant = Num. Coefs   |
-------------+---------------------------------------|
       intid |      13889       13889            0  *|
       date1 |          8           0            8   |
-----------------------------------------------------+
But missing values are seen in the following output (this also happens in R):
. honestdid, pre(1/5) post(7/8) mvec(0.5(0.5)2)
(note: running execution using -parallel-; see help parallel for details)
(method = C-LF, Delta = DeltaRM, alpha = 0.050)
I realize this is not much to go on, but was just wondering if you had any insights.
Many thanks, Colin