Closed mcaceresb closed 2 years ago
Thanks @jerrayc and @mcaceresb for the great update! Two quick questions/comments going through it:
What does this message mean: (warning: off-diagonal entries of Vcov matrix hard-coded as 0)
? I get it running the example
I think the "control(varlist)" option is required (i.e. I can't run multe Y T, vce(oracle)
without error). Yet as the help file shows I can just run multe
without any arguments or options, after a previous command. Is the latter standard for stata commands? If not I might not allow it to avoid confusion (i.e. force full specification every time).
I will follow-up with Jerray with a few smaller edits to the help file (which looks great!) and some ideas for how to format the decomposition
@peterdhull
- What does this message mean:
(warning: off-diagonal entries of Vcov matrix hard-coded as 0)
? I get it running the example
Stata's standard output has no way to specify a vector of SEs; it wants a Vcov matrix. The note just means we made a diagonal matrix of SEs rather than a Vcov matrix.
- I think the "control(varlist)" option is required (i.e. I can't run
multe Y T, vce(oracle)
without error). Yet as the help file shows I can just runmulte
without any arguments or options, after a previous command. Is the latter standard for stata commands? If not I might not allow it to avoid confusion (i.e. force full specification every time).
There's two Qs here:
[
should be after the control()
argument.r
results). You can try this with reg
, ivreg
, reghdfe
, and so on.Makes sense on the second thing, thanks
I'm not sure I follow the first thing. We should be able to make a Vcov matrix as usual here, no?
I'm not sure I follow the first thing. We should be able to make a Vcov matrix as usual here, no?
@peterdhull The SEs come from a loop that only computes the diagonal terms, so we'd have to change the code.
While it's probably a straightforward modification (@jerrayc also made the comment that it shouldn't be hard to do), I figured I ought to take a beat to have a closer look at the linear algebra before changing the code beyond adapting it (since I don't yet fully understand all the moving parts).
Ofc if @jerrayc already sees how to do it he can go ahead and take a stab at it!
@mcaceresb @jerrayc I'm not sure exactly how you're currently doing the SE calculations, but if it comes from computing the variances of psi_k,i and psihat_k,i in Michal's notes in a loop over k note that you can get the full variance-covariance matrix in one go with something like corr psi_1 psi_2 psi_3 psi_4, cov
Jerray we can discuss this more on Thursday if its not clear or if I'm misunderstanding how we're currently doing it!
@peterdhull Sounds good, Peter! I have a vague sense of how to compute the Vcov (I'll need to remind myself what Michal's code does), but let's talk through it tomorrow.
I revised the helpfile and added the covariance computation.
We still need to add STAR as an example to the helpfile and include some cleaned STAR data somewhere on git to reference in the helpfile. Not sure what the best way to do that is.
Thanks @jerrayc
For the help file example, check out what we did with the "ssaggreagte" command for SSIV (install with "ssc install ssaggreagte" then "help ssaggreagte", if you don't already have it). We can probably do something similar here
@peterdhull @jerrayc I added the star file to the package. FYI this is not installed by default; you need to add the all
option to download .dta
files that are included as part of packages (e.g. ssc install ssaggregate, all
)
@jerrayc
po_vcov
and or_vcov
are used in MulTE_Estimates::post
. You can see this example:mata
class Testing
{
real matrix test
void hello()
void bye()
}
void function Testing::hello()
{
printf("hello, world!")
test
}
void function Testing::bye()
{
printf("bye, world!")
this.test
}
foo = Testing()
foo.hello()
foo.bye()
foo.test = (1, 2) \ (3, 4)
foo.hello()
foo.bye()
end
You can see in the first case, test
is missing. This is because objects are initialized to missing in a class. You can also see that test
used within the class is really short-hand for this.test
, which makes it explicit that the object being used is the one stored in "this" instance of the class. Once test
is assigned to a matrix all is well. The same is the case with po_vcov
and or_vcov
.
@mcaceresb Great, that all sounds good. I added some example code to the helpfile for the STAR example. Now I think we should be good to PR (we can always iterate on the helpfile in the follow up if need be).
Re: mata classes-- interesting! I had thought it was the plain "po_vcov" and "or_vcov" objects inside the MulTE() function that were being used in MulTE_Estimates::post(), but it was actually the class objects "results.estimates.po_vcov" and "results.estimates.or_vcov". I had defined those class objects to try and print them easier for debugging; I didn't realize they were necessary to call the objects in MulTE_Estimates::post(). Good to know, thanks!
for those of us who are less cool, what's PR?
On Thu, Apr 21, 2022, 21:11 jerrayc @.***> wrote:
@mcaceresb https://github.com/mcaceresb Great, that all sounds good. I added some example code to the helpfile for the STAR example. Now I think we should be good to PR (we can always iterate on the helpfile in the follow up if need be).
Re: mata classes-- interesting! I had thought it was the plain "po_vcov" and "or_vcov" objects inside the MulTE() function that were being used in MulTE_Estimates::post(), but it was actually the class objects "results.estimates.po_vcov" and "results.estimates.or_vcov". I had defined those class objects to try and print them easier for debugging; I didn't realize they were necessary to call the objects in MulTE_Estimates::post(). Good to know, thanks!
— Reply to this email directly, view it on GitHub https://github.com/gphk-metrics/stata-multe/issues/4#issuecomment-1105906261, or unsubscribe https://github.com/notifications/unsubscribe-auth/AFFQXLS2VVQQA6A2PZDU6ALVGH4D3ANCNFSM5RSIXEKA . You are receiving this because you were mentioned.Message ID: @.***>
PR = Pull Request
(AKA we will review our changes and if everything looks good we will merge and squash this branch, issue4_follow_ups
, into the main branch. That's what we've done for the previous iterations of the package to keep changes nice and organized.)
I hope this provides a positive treatment effect on your coolness!
thanks! Doubtful but you never know
On Thu, Apr 21, 2022, 21:24 jerrayc @.***> wrote:
PR = Pull Request
(AKA we will review our changes and if everything looks good we will merge and squash this branch, issue4_follow_ups, into the main branch. That's what we've done for the previous iterations of the package to keep changes nice and organized.)
I hope this provides a positive treatment effect on your coolness!
— Reply to this email directly, view it on GitHub https://github.com/gphk-metrics/stata-multe/issues/4#issuecomment-1105911899, or unsubscribe https://github.com/notifications/unsubscribe-auth/AFFQXLRTZCG4OJZXKIWIIN3VGH5S7ANCNFSM5RSIXEKA . You are receiving this because you were mentioned.Message ID: @.***>
@peterdhull Before we publish the package on the SSC archive, @mcaceresb and I wanted to check the following with you:
Yes, let me try it a few times — I was playing around with it early this week and definitely ran into some slowness on big datasets (ran it on the ACS for 2015, for eg, which is many millions of observations).
@mkolesar and @peterdhull, it would be straightforward to accommodate weights, right? This came up when I was doing ACS stuff, and could show up in other settings. is that functionality we could add?
On Fri, Apr 22, 2022 at 1:31 PM, jerrayc @.***> wrote:
@peterdhull https://github.com/peterdhull Before we publish the package on the SSC archive, @mcaceresb https://github.com/mcaceresb and I wanted to check the following with you:
- Do you want any post-estimation commands?
- The package has not been optimized and may be inefficient in moderately-sized data. Do you want us to work on improving this before publishing?
- Once we merge issue4_follow_ups into main, do you/Michal/Paul want to take it for a spin again and weigh in one more time before we publish it on SSC?
— Reply to this email directly, view it on GitHub https://github.com/gphk-metrics/stata-multe/issues/4#issuecomment-1106720346, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABDTDFUID7F2DJVHOCSXFWDVGLPATANCNFSM5RSIXEKA . You are receiving this because you are subscribed to this thread.Message ID: @.***>
It does seem to me like the theory should all go through easily with weights but I defer to @kolesarm. Would be good to add if straightforward, yes
@mcaceresb did you mean optimizing for a large number of observations or optimizing for a larger number of treatments/controls? The former seems good but I'm less sure about the latter -- my guess is overlap is going to be a problem as the number of treatments/controls grows and that's a bigger problem than speed. But I guess if it's easy to address both we should.
Can you let me know when you've merged into main?
On Fri, Apr 22, 2022 at 1:35 PM, Paul Goldsmith-Pinkham @.***> wrote:
Yes, let me try it a few times — I was playing around with it early this week and definitely ran into some slowness on big datasets (ran it on the ACS for 2015, for eg, which is many millions of observations).
@mkolesar and @peterdhull, it would be straightforward to accommodate weights, right? This came up when I was doing ACS stuff, and could show up in other settings. is that functionality we could add?
On Fri, Apr 22, 2022 at 1:31 PM, jerrayc @.***> wrote:
@peterdhull https://github.com/peterdhull Before we publish the package on the SSC archive, @mcaceresb https://github.com/mcaceresb and I wanted to check the following with you:
- Do you want any post-estimation commands?
- The package has not been optimized and may be inefficient in moderately-sized data. Do you want us to work on improving this before publishing?
- Once we merge issue4_follow_ups into main, do you/Michal/Paul want to take it for a spin again and weigh in one more time before we publish it on SSC?
— Reply to this email directly, view it on GitHub https://github.com/gphk-metrics/stata-multe/issues/4#issuecomment-1106720346, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABDTDFUID7F2DJVHOCSXFWDVGLPATANCNFSM5RSIXEKA . You are receiving this because you are subscribed to this thread.Message ID: @.***>
oh and I don't think there are any obvious post-estimation commands right now so seems fine to not have those. We can add them if they become clearer later.
@peterdhull Re:
Star data: Could you clarify how you want to include it?
I know I said I had done it, but I might not have understood what you meant. I was checking ssc install ssaggregate, all
and it doesn't install location_level.dta
. ssc describe ssaggregate
doesn't list any .dta
files either. So I am actually not sure how the data was included with ssaggregate
. LMK and sorry for the oversight.
I don't think we included the data in the ssaggregate command on ssc, we just linked to the repo in the help file, which contained data. That seems fine to me
On Fri, Apr 22, 2022, 17:32 Mauricio Caceres Bravo @.***> wrote:
@peterdhull https://github.com/peterdhull Re:
-
Speed. I mean for many observations. I noticed the command was slow and made a note to try to optimize after the follow-ups point. It seems there is agreement this could be valuable, so I will do this once we wrap up this issue.
Post-estimation. Roger; will leave aside for now.
Star data: Could you clarify how you want to include it?
I know I said I had done it, but I might not have understood what you meant. I was checking ssc install ssaggregate, all and it doesn't install location_level.dta. ssc describe ssaggregate doesn't list any .dta files either. So I am actually not sure how the data was included with ssaggregate. LMK and sorry for the oversight.
— Reply to this email directly, view it on GitHub https://github.com/gphk-metrics/stata-multe/issues/4#issuecomment-1106885530, or unsubscribe https://github.com/notifications/unsubscribe-auth/AFFQXLRIQTLVULS33URS2N3VGMLFXANCNFSM5RSIXEKA . You are receiving this because you were mentioned.Message ID: @.***>
I don't see an issue with allowing for weights
If the dataset is small, which it is, I why not add include it directly? It'll make it marginally easier if people want to replicate what we do
agreed! We can just add the dataset directly, unlike what we did with ssaggregate. Sound good @mcaceresb?
Great. @mcaceresb added the dataset directly and refers to it in the helpfile. In addition, he also provided a link to download the data from the online repository in the helpfile.
What's left to do are the following:
Would it be helpful for you to start playing around with the package given the updates we've made in this issue, or would you rather just wait until the speed improvements and weight options are added?
For the former, we can squash and merge this issue first and work on the speed and weight improvements in a subsequent issue.
For the latter, we can wait until after we've implemented the speed and weight improvements to squash and merge this issue.
I vote for squashing + merging this issue first and working on the speed and weight improvements in a subsequent issue. @jerrayc we can chat about weights in our meeting tomorrow
The changes from issue4_follow_ups
have been squashed and merged into main
. All tasks outlined here were completed. We will open a new issue for adding weights and improving speed.
@mcaceresb and I have added text that we plan to submit to SSC, located at "doc/multe-ssc.txt"; please let us know if you have any feedback on the text in this file, particularly:
Feel free to take it for a spin and let us know if you have any feedback!
main
.