Adapt MulTE from R

mcaceresb commented 9 months ago

[x] Copy conventions and functionality from kolesarm/multe
[x] Add unit tests for new behavior and switches
[x] Change README to reflect current status (current function I/O and current paper citation)
[x] Update help file to reflect current options.
[x] Re-implement interacted regression so it fails if there are collinear covariates (Stata insists on dropping collinear covariates by default).

mcaceresb commented 9 months ago

EDIT: I mean this branch: https://github.com/gphk-metrics/stata-multe/tree/issue14_copyR

@peterdhull First pass in issue branch. Example:

. use test/example_fryer_levitt.dta, clear

. multe std_iq_24 i.age_24 female [w=W2C0], treat(race) strat(SES_quintile)
(analytic weights assumed)

             |      PL      OWN      ATE       EW       CW 
-------------+---------------------------------------------
       Black |  -.2574   -.2482   -.2655    -.255   -.2604 
          SE |  .02812   .02906   .02983   .02888   .02925 
    Hispanic |  -.2931   -.2829   -.2992   -.2862   -.2944 
          SE |  .02596   .02673   .02988    .0268   .02792 
       Asian |  -.2621   -.2609   -.2599   -.2611   -.2694 
          SE |  .03426   .03432   .04177   .03433   .04751 
       Other |  -.1563   -.1448   -.1503   -.1447   -.1522 
          SE |  .03691   .03696   .03594   .03684   .03698

Some Qs for you and @kolesarm:

1. If there's a control with no variation within a treatment level (check here) then R gives an NA for those estimates in the interacted model. However, it still gives a result for the multinomial logit, which is wrong, right? In that case the coefficient is not identified, but nnet::multinom gives an answer. What should I do in this instance? (Stata loops without finding an answer.)
2. If there's a real zero here, shouldn't it set the result to missing?
3. There isn't a reason not to return the full vcov too, right? It looks straightforward, but if there's a nuance I might be missing LMK.

peterdhull commented 9 months ago

Thanks @mcaceresb! I don't see a reason why we shouldn't return the full vcov. I'll let @kolesarm weigh in on the other two questions.

kolesarm commented 9 months ago

1. If there is a control with no variation within a treatment level, then the multinomial logit fitted probabilities are generally still identified; it's just the multinomial logit coefficients that are not. We only need the fitted probabilities for the CW estimator (ml$fitted.values here), not the coefficients. Any value of the coefficients that maximizes the likelihood should be fine. But if it's hard to get Stata to find coefficient values that maximize the likelihood, it is probably OK to issue a warning and not report the CW estimator.
2. If there is a real zero, lam should be zero, so any value of ipi should give the same answer, since ipi gets multiplied by lam. I set it to something arbitrary to make sure the calculation goes through.
3. Not sure which vcov you mean; there are 5 estimators. But it's OK to return the vcov associated with the PL estimator.
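
A minimal Python sketch of the real-zero point above, assuming (hypothetically) that the quantity in question is a product lam * ipi with ipi = 1/pi. Only the names lam and ipi come from the thread; the function name and the placeholder argument are illustrative and not taken from either package. It shows why an arbitrary finite stand-in is harmless when lam is zero, whereas a literal infinity would turn the product into NaN.

```python
import math

def weighted_inverse(lam, pi, placeholder=1.0):
    """Return lam * (1 / pi), using an arbitrary finite stand-in when pi == 0.

    Hypothetical helper: assumes lam is exactly zero whenever pi is zero.
    """
    ipi = 1.0 / pi if pi > 0 else placeholder
    return lam * ipi

# With lam == 0 the stand-in never affects the result.
results = [weighted_inverse(0.0, 0.0, placeholder=p) for p in (1.0, 42.0, 1e6)]

# A literal infinity would instead propagate NaN through the product.
naive = 0.0 * math.inf
```

If lam could be nonzero where pi is zero, the stand-in would no longer be innocuous, so this only makes sense under the stated assumption.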

mcaceresb commented 8 months ago

@kolesarm

1. Right, that sounds good. If there are omitted covariates I could also check for convergence in the fitted values? I'd have to introduce some options or code it manually.
2. Yes, I see this. Thanks for clarifying.
3. I thought all of them? I was thinking of making it so the user could request a standard Stata regression-style table for any of the estimators, so they could alternate as they liked. Would there be an issue with doing this?

kolesarm commented 8 months ago

1. If you can check for convergence of fitted values, that'd be excellent.
2. Great.
3. Sounds great.
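
To illustrate what checking convergence of the fitted values could look like, here is a rough Python sketch. It is not the multe implementation: it uses a binary logit fit by Newton-Raphson rather than a multinomial logit, and the toy data, function names, and tolerances are all assumptions. The control x is always 0 whenever d = 1, so the slope drifts toward minus infinity and a coefficient-based stopping rule never triggers, while the fitted probabilities settle quickly.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def fit_logit(x, d, tol=1e-8, max_iter=50, criterion="fitted"):
    """Newton-Raphson for P(d=1|x) = sigmoid(a + b*x).

    criterion="fitted" stops when the fitted probabilities stop changing;
    criterion="coef" stops when the coefficient step is small.
    Returns (a, b, fitted, converged).
    """
    a, b = 0.0, 0.0
    fitted = [sigmoid(a + b * xi) for xi in x]
    for _ in range(max_iter):
        # Score vector and negative Hessian of the log-likelihood.
        g_a = sum(di - pi for di, pi in zip(d, fitted))
        g_b = sum((di - pi) * xi for di, pi, xi in zip(d, fitted, x))
        w = [pi * (1.0 - pi) for pi in fitted]
        h_aa = sum(w)
        h_ab = sum(wi * xi for wi, xi in zip(w, x))
        h_bb = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve the 2x2 system explicitly.
        step_a = (h_bb * g_a - h_ab * g_b) / det
        step_b = (h_aa * g_b - h_ab * g_a) / det
        a, b = a + step_a, b + step_b
        new_fitted = [sigmoid(a + b * xi) for xi in x]
        if criterion == "fitted":
            change = max(abs(p1 - p0) for p1, p0 in zip(new_fitted, fitted))
        else:
            change = max(abs(step_a), abs(step_b))
        fitted = new_fitted
        if change < tol:
            return a, b, fitted, True
    return a, b, fitted, False

# Hypothetical toy data: x has no variation (always 0) among units with d = 1.
x = [0, 0, 0, 0, 1, 1, 1, 1]
d = [1, 0, 0, 0, 0, 0, 0, 0]

a_f, b_f, fitted_f, ok_fitted = fit_logit(x, d, criterion="fitted")
a_c, b_c, fitted_c, ok_coef = fit_logit(x, d, max_iter=30, criterion="coef")
```

In this toy case the fitted-value rule stops with probabilities near 0.25 (for x = 0) and near 0 (for x = 1), while the coefficient rule exhausts its iterations because the slope keeps moving by roughly one unit per step.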

mcaceresb commented 8 months ago

@peterdhull This should be ready to go functionality-wise. What's left is updating the docs and adding some (basic) tests. Current output:

. use test/example_fryer_levitt.dta, clear

. multe std_iq_24 i.age_24 female [w=W2C0], treat(race) strat(SES_quintile)
(analytic weights assumed)

PL Estimates (full sample)                      Number of obs     =      8,806

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        race |
      Black  |  -.2574154   .0281244    -9.15   0.000    -.3125383   -.2022925
   Hispanic  |  -.2931465    .025962   -11.29   0.000    -.3440311    -.242262
      Asian  |   -.262108    .034262    -7.65   0.000    -.3292603   -.1949557
      Other  |  -.1563373   .0369127    -4.24   0.000    -.2286848   -.0839898
------------------------------------------------------------------------------

Alternative Estimates on Full Sample:

             |      PL      OWN      ATE       EW       CW 
-------------+---------------------------------------------
       Black |  -.2574   -.2482   -.2655    -.255   -.2604 
          SE |  .02812   .02906   .02983   .00667   .02925 
    Hispanic |  -.2931   -.2829   -.2992   -.2862   -.2944 
          SE |  .02596   .02673   .02988   .00483   .02792 
       Asian |  -.2621   -.2609   -.2599   -.2611   -.2694 
          SE |  .03426   .03432   .04177   .00331   .04751 
       Other |  -.1563   -.1448   -.1503   -.1447   -.1522 
          SE |  .03691   .03696   .03594   .00494   .03698 

P-values for null hypothesis of no propensity score variation:
Wald test:  2.1e-188
  LM test:  8.8e-197

Note: You can post any combination of results from the table to Stata:

    multe, est(estimate) [{full|overlap} diff oracle]

Examples:

    multe, est(ATE) full oracle
    multe, est(CW)  overlap diff

mcaceresb commented 8 months ago

@kolesarm I copied the examples from the vignette into the README here, LMK if the plagiarism is fine (I do say I copy it from your vignette).

@peterdhull Sorry I actually still have a bit of work to do here; I just realized that Stata is dropping collinear covariate levels from the interacted specification, meaning it's the same in the full sample and overlap samples. I can just implement it myself to force it to fail in that case.

kolesarm commented 8 months ago

Plagiarism is fine, it's actually better to have the same examples (and say that).

Output looks good, but I agree, you need to force it to fail rather than silently drop collinear covariate levels for the ATE.
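
One way to picture "fail rather than silently drop" is a rank check on the interacted design matrix before estimation. The Python sketch below is only an illustration under assumed names (matrix_rank, interacted_design, check_full_rank) and made-up data; it is not how the Stata command does it.

```python
def matrix_rank(rows, tol=1e-10):
    """Rank via Gaussian elimination with partial pivoting."""
    m = [[float(v) for v in r] for r in rows]
    n_cols = len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == len(m):
            break
        # Largest remaining entry in this column serves as the pivot.
        pivot = max(range(rank, len(m)), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) <= tol:
            continue  # no usable pivot: this column adds no rank
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank

def interacted_design(treat, x, levels):
    """Columns: one indicator per treatment level, then x times each indicator."""
    return [
        [1.0 if t == k else 0.0 for k in levels]
        + [xi if t == k else 0.0 for k in levels]
        for t, xi in zip(treat, x)
    ]

def check_full_rank(design):
    """Raise instead of dropping columns when the design is rank deficient."""
    n_cols = len(design[0])
    rank = matrix_rank(design)
    if rank < n_cols:
        raise ValueError(f"collinear covariates: rank {rank} < {n_cols} columns")
    return rank

treat = [0, 0, 0, 1, 1, 1]
ok_rank = check_full_rank(interacted_design(treat, [0, 1, 1, 0, 1, 1], [0, 1]))

# Here x is constant within treatment level 1, so its interaction is collinear.
try:
    check_full_rank(interacted_design(treat, [0, 1, 1, 1, 1, 1], [0, 1]))
    failed = False
except ValueError:
    failed = True
```

The second call raises because the covariate has no variation within treatment level 1, which is the case where default collinearity handling would otherwise drop a column without warning.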

mcaceresb commented 8 months ago

Continued in #15

gphk-metrics / stata-multe

Adapt MulTE from R #14