gphk-metrics / stata-multe

Multiple Treatment Effects
21 stars 5 forks source link

Refinement follow-ups #4

Closed mcaceresb closed 2 years ago

mcaceresb commented 2 years ago
peterdhull commented 2 years ago

Thanks @jerrayc and @mcaceresb for the great update! Two quick questions/comments going through it:

I will follow-up with Jerray with a few smaller edits to the help file (which looks great!) and some ideas for how to format the decomposition

mcaceresb commented 2 years ago

@peterdhull

  • What does this message mean: (warning: off-diagonal entries of Vcov matrix hard-coded as 0)? I get it running the example

Stata's standard output has no way to specify a vector of SEs; it wants a Vcov matrix. The note just means we made a diagonal matrix of SEs rather than a Vcov matrix.

  • I think the "control(varlist)" option is required (i.e. I can't run multe Y T, vce(oracle) without error). Yet as the help file shows I can just run multe without any arguments or options, after a previous command. Is the latter standard for stata commands? If not I might not allow it to avoid confusion (i.e. force full specification every time).

There's two Qs here:

peterdhull commented 2 years ago

Makes sense on the second thing, thanks

I'm not sure I follow the first thing. We should be able to make a Vcov matrix as usual here, no?

mcaceresb commented 2 years ago

I'm not sure I follow the first thing. We should be able to make a Vcov matrix as usual here, no?

@peterdhull The SEs come from a loop that only computes the diagonal terms, so we'd have to change the code.

While it's probably a straightforward modification (@jerrayc also made the comment that it shouldn't be hard to do), I figured I ought to take a beat to have a closer look at the linear algebra before changing the code beyond adapting it (since I don't yet fully understand all the moving parts).

Ofc if @jerrayc already sees how to do it he can go ahead and take a stab at it!

peterdhull commented 2 years ago

@mcaceresb @jerrayc I'm not sure exactly how you're currently doing the SE calculations, but if it comes from computing the variances of psi_k,i and psihat_k,i in Michal's notes in a loop over k note that you can get the full variance-covariance matrix in one go with something like corr psi_1 psi_2 psi_3 psi_4, cov

Jerray we can discuss this more on Thursday if its not clear or if I'm misunderstanding how we're currently doing it!

jerrayc commented 2 years ago

@peterdhull Sounds good, Peter! I have a vague sense of how to compute the Vcov (I'll need to remind myself what Michal's code does), but let's talk through it tomorrow.

jerrayc commented 2 years ago

I revised the helpfile and added the covariance computation.

We still need to add STAR as an example to the helpfile and include some cleaned STAR data somewhere on git to reference in the helpfile. Not sure what the best way to do that is.

peterdhull commented 2 years ago

Thanks @jerrayc

For the help file example, check out what we did with the "ssaggreagte" command for SSIV (install with "ssc install ssaggreagte" then "help ssaggreagte", if you don't already have it). We can probably do something similar here

mcaceresb commented 2 years ago

@peterdhull @jerrayc I added the star file to the package. FYI this is not installed by default; you need to add the all option to download .dta files that are included as part of packages (e.g. ssc install ssaggregate, all)

@jerrayc

mata
class Testing
{
    real matrix test
    void hello()
    void bye()
}
void function Testing::hello()
{
    printf("hello, world!")
    test
}
void function Testing::bye()
{
    printf("bye, world!")
    this.test
}
foo = Testing()
foo.hello()
foo.bye()
foo.test = (1, 2) \ (3, 4)
foo.hello()
foo.bye()
end

You can see in the first case, test is missing. This is because objects are initialized to missing in a class. You can also see that test used within the class is really short-hand for this.test, which makes it explicit that the object being used is the one stored in "this" instance of the class. Once test is assigned to a matrix all is well. The same is the case with po_vcov and or_vcov.

jerrayc commented 2 years ago

@mcaceresb Great, that all sounds good. I added some example code to the helpfile for the STAR example. Now I think we should be good to PR (we can always iterate on the helpfile in the follow up if need be).

Re: mata classes-- interesting! I had thought it was the plain "po_vcov" and "or_vcov" objects inside the MulTE() function that were being used in MulTE_Estimates::post(), but it was actually the class objects "results.estimates.po_vcov" and "results.estimates.or_vcov". I had defined those class objects to try and print them easier for debugging; I didn't realize they were necessary to call the objects in MulTE_Estimates::post(). Good to know, thanks!

peterdhull commented 2 years ago

for those of us who are less cool, what's PR?

On Thu, Apr 21, 2022, 21:11 jerrayc @.***> wrote:

@mcaceresb https://github.com/mcaceresb Great, that all sounds good. I added some example code to the helpfile for the STAR example. Now I think we should be good to PR (we can always iterate on the helpfile in the follow up if need be).

Re: mata classes-- interesting! I had thought it was the plain "po_vcov" and "or_vcov" objects inside the MulTE() function that were being used in MulTE_Estimates::post(), but it was actually the class objects "results.estimates.po_vcov" and "results.estimates.or_vcov". I had defined those class objects to try and print them easier for debugging; I didn't realize they were necessary to call the objects in MulTE_Estimates::post(). Good to know, thanks!

— Reply to this email directly, view it on GitHub https://github.com/gphk-metrics/stata-multe/issues/4#issuecomment-1105906261, or unsubscribe https://github.com/notifications/unsubscribe-auth/AFFQXLS2VVQQA6A2PZDU6ALVGH4D3ANCNFSM5RSIXEKA . You are receiving this because you were mentioned.Message ID: @.***>

jerrayc commented 2 years ago

PR = Pull Request

(AKA we will review our changes and if everything looks good we will merge and squash this branch, issue4_follow_ups, into the main branch. That's what we've done for the previous iterations of the package to keep changes nice and organized.)

I hope this provides a positive treatment effect on your coolness!

peterdhull commented 2 years ago

thanks! Doubtful but you never know

On Thu, Apr 21, 2022, 21:24 jerrayc @.***> wrote:

PR = Pull Request

(AKA we will review our changes and if everything looks good we will merge and squash this branch, issue4_follow_ups, into the main branch. That's what we've done for the previous iterations of the package to keep changes nice and organized.)

I hope this provides a positive treatment effect on your coolness!

— Reply to this email directly, view it on GitHub https://github.com/gphk-metrics/stata-multe/issues/4#issuecomment-1105911899, or unsubscribe https://github.com/notifications/unsubscribe-auth/AFFQXLRTZCG4OJZXKIWIIN3VGH5S7ANCNFSM5RSIXEKA . You are receiving this because you were mentioned.Message ID: @.***>

jerrayc commented 2 years ago

@peterdhull Before we publish the package on the SSC archive, @mcaceresb and I wanted to check the following with you:

  1. Do you want any post-estimation commands?
  2. The package has not been optimized and may be inefficient in moderately-sized data. Do you want us to work on improving this before publishing?
  3. Once we merge issue4_follow_ups into main, do you/Michal/Paul want to take it for a spin again and weigh in one more time before we publish it on SSC?
paulgp commented 2 years ago

Yes, let me try it a few times — I was playing around with it early this week and definitely ran into some slowness on big datasets (ran it on the ACS for 2015, for eg, which is many millions of observations).

@mkolesar and @peterdhull, it would be straightforward to accommodate weights, right? This came up when I was doing ACS stuff, and could show up in other settings. is that functionality we could add?

On Fri, Apr 22, 2022 at 1:31 PM, jerrayc @.***> wrote:

@peterdhull https://github.com/peterdhull Before we publish the package on the SSC archive, @mcaceresb https://github.com/mcaceresb and I wanted to check the following with you:

  1. Do you want any post-estimation commands?
  2. The package has not been optimized and may be inefficient in moderately-sized data. Do you want us to work on improving this before publishing?
  3. Once we merge issue4_follow_ups into main, do you/Michal/Paul want to take it for a spin again and weigh in one more time before we publish it on SSC?

— Reply to this email directly, view it on GitHub https://github.com/gphk-metrics/stata-multe/issues/4#issuecomment-1106720346, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABDTDFUID7F2DJVHOCSXFWDVGLPATANCNFSM5RSIXEKA . You are receiving this because you are subscribed to this thread.Message ID: @.***>

peterdhull commented 2 years ago

It does seem to me like the theory should all go through easily with weights but I defer to @kolesarm. Would be good to add if straightforward, yes

@mcaceresb did you mean optimizing for a large number of observations or optimizing for a larger number of treatments/controls? The former seems good but I'm less sure about the latter -- my guess is overlap is going to be a problem as the number of treatments/controls grows and that's a bigger problem than speed. But I guess if it's easy to address both we should.

paulgp commented 2 years ago

Can you let me know when you've merged into main?

On Fri, Apr 22, 2022 at 1:35 PM, Paul Goldsmith-Pinkham @.***> wrote:

Yes, let me try it a few times — I was playing around with it early this week and definitely ran into some slowness on big datasets (ran it on the ACS for 2015, for eg, which is many millions of observations).

@mkolesar and @peterdhull, it would be straightforward to accommodate weights, right? This came up when I was doing ACS stuff, and could show up in other settings. is that functionality we could add?

On Fri, Apr 22, 2022 at 1:31 PM, jerrayc @.***> wrote:

@peterdhull https://github.com/peterdhull Before we publish the package on the SSC archive, @mcaceresb https://github.com/mcaceresb and I wanted to check the following with you:

  1. Do you want any post-estimation commands?
  2. The package has not been optimized and may be inefficient in moderately-sized data. Do you want us to work on improving this before publishing?
  3. Once we merge issue4_follow_ups into main, do you/Michal/Paul want to take it for a spin again and weigh in one more time before we publish it on SSC?

— Reply to this email directly, view it on GitHub https://github.com/gphk-metrics/stata-multe/issues/4#issuecomment-1106720346, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABDTDFUID7F2DJVHOCSXFWDVGLPATANCNFSM5RSIXEKA . You are receiving this because you are subscribed to this thread.Message ID: @.***>

peterdhull commented 2 years ago

oh and I don't think there are any obvious post-estimation commands right now so seems fine to not have those. We can add them if they become clearer later.

mcaceresb commented 2 years ago

@peterdhull Re:

peterdhull commented 2 years ago

I don't think we included the data in the ssaggregate command on ssc, we just linked to the repo in the help file, which contained data. That seems fine to me

On Fri, Apr 22, 2022, 17:32 Mauricio Caceres Bravo @.***> wrote:

@peterdhull https://github.com/peterdhull Re:

-

Speed. I mean for many observations. I noticed the command was slow and made a note to try to optimize after the follow-ups point. It seems there is agreement this could be valuable, so I will do this once we wrap up this issue.

Post-estimation. Roger; will leave aside for now.

Star data: Could you clarify how you want to include it?

I know I said I had done it, but I might not have understood what you meant. I was checking ssc install ssaggregate, all and it doesn't install location_level.dta. ssc describe ssaggregate doesn't list any .dta files either. So I am actually not sure how the data was included with ssaggregate. LMK and sorry for the oversight.

— Reply to this email directly, view it on GitHub https://github.com/gphk-metrics/stata-multe/issues/4#issuecomment-1106885530, or unsubscribe https://github.com/notifications/unsubscribe-auth/AFFQXLRIQTLVULS33URS2N3VGMLFXANCNFSM5RSIXEKA . You are receiving this because you were mentioned.Message ID: @.***>

kolesarm commented 2 years ago

I don't see an issue with allowing for weights

If the dataset is small, which it is, I why not add include it directly? It'll make it marginally easier if people want to replicate what we do

peterdhull commented 2 years ago

agreed! We can just add the dataset directly, unlike what we did with ssaggregate. Sound good @mcaceresb?

jerrayc commented 2 years ago

Great. @mcaceresb added the dataset directly and refers to it in the helpfile. In addition, he also provided a link to download the data from the online repository in the helpfile.

What's left to do are the following:

Would it be helpful for you to start playing around with the package given the updates we've made in this issue, or would you rather just wait until the speed improvements and weight options are added?

For the former, we can squash and merge this issue first and work on the speed and weight improvements in a subsequent issue.

For the latter, we can wait until after we've implemented the speed and weight improvements to squash and merge this issue.

peterdhull commented 2 years ago

I vote for squashing + merging this issue first and working on the speed and weight improvements in a subsequent issue. @jerrayc we can chat about weights in our meeting tomorrow

jerrayc commented 2 years ago

The changes from issue4_follow_ups have been squashed and merged into main. All tasks outlined here were completed. We will open a new issue for adding weights and improving speed.

@mcaceresb and I have added text that we plan to submit to SSC, located at "doc/multe-ssc.txt"; please let us know if you have any feedback on the text in this file, particularly:

Feel free to take it for a spin and let us know if you have any feedback!