gphk-metrics / stata-multe

Multiple Treatment Effects
Add weights #7

Closed peterdhull closed 1 year ago

peterdhull commented 2 years ago

Per #4, we would like to include an option for weights (I think all of aw, fw, and pw).

@jerrayc and I have started discussing this, but we think we would need to sit down with the code and go through which calculations need to be weighted and how. So we plan to return to this after v1 of the package is posted with the paper, unless @paulgp / @kolesarm have objections or suggestions for how to do the weighting more automatically/quickly in v1.

kolesarm commented 2 years ago

Sounds good

paulgp commented 2 years ago

Sounds good to me.

On Thu, Apr 28, 2022 at 2:54 PM, Michal Kolesar @.***> wrote:

> Sounds good

mcaceresb commented 1 year ago

@peterdhull I've added weights in the issue7_weights branch. This didn't occur to me when we last chatted, but there can be a substantive difference in weight type, at least one that didn't sound unreasonable to me.

Frequency weights are usually taken to mean there are really that many copies of each observation; other weight types are not always given this interpretation. I've coded all weights to be the same, and the number of observations is the sum of the weights. However, other Stata functions handle this differently depending on the application. For instance:

sysuse auto, clear
gen w = _n
qui reg price mpg [fw = w]
disp e(N)
qui reg price mpg [aw = w]
disp e(N)

In the first case, Stata reports 2775 observations (same with iw), but in the second only 74 (same with pw). LMK whether this matters. In the meantime, you can try out weights via

cap noi net uninstall multe
local github "https://raw.githubusercontent.com"
net install multe, from(`github'/gphk-metrics/stata-multe/issue7_weights/) replace

local nobs   1000
local ktreat 5
clear
set seed 1729
set obs `nobs'
gen T = ceil(runiform() * `ktreat')
gen W = mod(_n, 10)
gen Y = T + runiform()
gen w = ceil(runiform() * 20)
multe Y T  [w = w], control(W)
expand w
multe Y T, control(W)

peterdhull commented 1 year ago

Thanks @mcaceresb! Good point on the weights. I think for fw the SEs take into account the actual number of observations (so they're much smaller than the ones with the usual aw's). We could do this too, but I still think aw should be the default.

Will play around with the weights soon. Were you able to get everything to match the canned commands when they were available?

kolesarm commented 1 year ago

I always get confused by the multitude of weight options in Stata, since as far as I can tell, the only difference in regression settings is in how the sample size gets computed (and so any finite-sample corrections are slightly different). We can follow the Stata practice if the user selects frequency weights, but have aw be the default (if I remember correctly, this normalizes the weights to sum to n, after which fweight is applied).
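
One way to see the normalization at work is a quick check with the built-in regress. This is only a sketch on the auto data used earlier in the thread; it exercises Stata's own behavior, not multe's, and the variable name w10 is made up for illustration. Multiplying the weights by a constant should leave the aw results unchanged, while for fw it changes the implied sample size and hence the SEs.

```stata
sysuse auto, clear
gen w   = _n
gen w10 = 10 * w          // hypothetical: same weights, scaled by 10

* aweights are rescaled internally, so the scale of the weights should not matter
qui reg price mpg [aw = w]
disp e(N) "  " _se[mpg]
qui reg price mpg [aw = w10]
disp e(N) "  " _se[mpg]

* fweights are counts, so ten times the weight means ten times the observations
qui reg price mpg [fw = w]
disp e(N) "  " _se[mpg]
qui reg price mpg [fw = w10]
disp e(N) "  " _se[mpg]
```

The two aw runs should print the same N (74) and the same SE; the fw runs should print N = 2775 and N = 27750, with a smaller SE in the second.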

mcaceresb commented 1 year ago

@kolesarm @peterdhull I left aw as the default and took out pweights (mainly because I wasn't entirely sure how it should be different from aweight; happy to put back if you know). So at the moment aw is allowed (weights are rescaled) and fw is allowed (weights are taken to represent a number of observations).

The point estimates are the same, and I did check the variance was off by exactly the square root of the ratio between the sum of the weights and the number of observations. After you've tested it out I think we'd be ready to merge this branch, unless you want to review the code or run any particular tests (in which case LMK).
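
For comparison, the analogous relationship for the built-in regress can be checked directly. This is a sketch under stated assumptions: it uses the auto data from the earlier example, it tests regress rather than multe, and the scalar names are made up for illustration. With the same weights, regress uses residual degrees of freedom of (sum of weights - k) under fw and (N - k) under aw, so the SEs should differ by the square root of that ratio, which is close to the square root of (sum of weights / N) when k is small.

```stata
sysuse auto, clear
gen w = _n

* frequency weights: N is the sum of the weights
qui reg price mpg [fw = w]
scalar se_fw = _se[mpg]
scalar N_fw  = e(N)
scalar k     = e(df_m) + 1      // slope plus constant

* analytic weights: N is the number of rows
qui reg price mpg [aw = w]
scalar se_aw = _se[mpg]
scalar N_aw  = e(N)

* the two numbers below should agree
disp se_aw / se_fw
disp sqrt((N_fw - k) / (N_aw - k))
```

The same comparison with multe in place of reg would show whether its fw and aw SEs follow this pattern.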

peterdhull commented 1 year ago

> The point estimates are the same, and I did check the variance was off by exactly the square root of the ratio between the sum of the weights and the number of observations. After you've tested it out I think we'd be ready to merge this branch, unless you want to review the code or run any particular tests (in which case LMK).

For this you mean the aw vs fw variance, right? And variance or SE?

Have you checked for aw whether you get the same SEs on the one-at-a-time estimates as, say, using Stata's built-in teffects ipw command? That might also be a good check, as well as whether our regression SEs match the built-in `reg' ones.

mcaceresb commented 1 year ago

@peterdhull Yes I meant the SE.

I hadn't tried any external checks, actually. I just tried the oaat vs reg and I get the same everywhere (up to df adjustment) except for the aw SEs. Sadness.

Will investigate.

PS: Sorry if obvious but how would I use teffects here?

peterdhull commented 1 year ago

`teffects ipw' should give ATE estimates that are equivalent (I believe) to the ones multe produces -- because the covariates are saturated group dummies. Maybe good to first check that before checking the SEs.

Let me know if helpful to chat about the discrepancy in reg SEs

mcaceresb commented 1 year ago

Continued in #13