josherrickson / pbph

R package implementing Peters-Belson with Prognostic Heterogeneity

document, potentially refine how `epb::sandwich` sets `n` #3

Open benthestatistician opened 8 years ago

benthestatistician commented 8 years ago

With cluster random assignment, the effective sample size is more nearly determined by the number of clusters than by the number of elements within clusters. Statisticians who are aware of this should expect to see it reflected in the calculations. However, in `R/sandwich.R` as it stands now (f7fb70b on Mar 22), the clusters don't appear to inform the scaling constants: `n` is defined as `n <- NROW(sandwich::estfun(x))`, just as if there were no clusters.

Perhaps this is as it should be: assuming that `bread()` will have scaled its $A$ matrices by the reciprocal of the number of elements, this mischief is undone by premultiplying both the bread matrix and the $A^{-1} B A^{-\top}$ sandwich itself by that same factor; and for software purposes it's best to stick closely to the sandwich package's API. It would be helpful to the end user to leave a trail of breadcrumbs (so to speak) leading to this conclusion.
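A minimal numeric sketch of the cancellation argument above, in Python rather than R, with a plain OLS fit and its HC0 sandwich standing in for the package's models (an assumption for illustration only): assembling the variance the sandwich-package way, with bread = $n A^{-1}$ and meat = $(1/n)\sum_i \psi_i \psi_i^\top$, and then premultiplying by $1/n$, gives back the directly assembled $A^{-1} B A^{-\top}$ sandwich because the factors of $n$ cancel.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 3
X = rng.normal(size=(n, k))
y = X @ np.array([1.0, -0.5, 2.0]) + rng.normal(size=n)
beta = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta  # residuals

# Directly assembled HC0 sandwich: (X'X)^{-1} X' diag(e^2) X (X'X)^{-1}
XtX_inv = np.linalg.inv(X.T @ X)
meat_raw = X.T @ (X * (e**2)[:, None])
vcov_direct = XtX_inv @ meat_raw @ XtX_inv

# sandwich-package style: bread scaled up by n, meat scaled down by n,
# and the whole product premultiplied by 1/n -- the n's cancel.
bread = n * XtX_inv
meat = meat_raw / n
vcov_pkg_style = (bread @ meat @ bread) / n

assert np.allclose(vcov_direct, vcov_pkg_style)
```

The point is only that the scaling convention is internally consistent: documenting it spares the reader from re-deriving the cancellation.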

Relatedly, I don't imagine that the `n/(n - k)` degrees-of-freedom adjustment to the bread is the correct one when there are clusters and `n` refers to the number of elements. The simplest or most accepted cluster-aware alternative that presents itself in (your nonrandom sample of) the cluster-robust standard error literature would be an improvement here, if only for the purpose of emphasizing to the reader of the code that the code really is cluster-aware. (If there's no single proposal for the simplest or most accepted cluster-aware d.f. adjustment, pick one.)

josherrickson commented 8 years ago

The `1/NROW(sandwich::estfun(x))` scaling factor works as you describe in your second paragraph. It is also the default in `sandwich::sandwich`. I will work on documenting that.

The `n/(n - k)` adjustment is indeed not the best choice. I had played around with several variations of adjustments earlier and saw no difference, so I stuck with the sandwich default. Now that I've fixed up the cluster simulations (namely by using a large enough number of clusters), I see improvements in coverage using `C/(C - 1) * (n - 1)/(n - k)`, where `C` is the number of clusters. Note that this simplifies to `n/(n - k)` if there are no clusters, since then `C = n` (a programmatic definition, not a statistical one).
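For concreteness, a small sketch of that adjustment factor (in Python rather than R; the function name `df_adjust` is hypothetical, not from the package), including a check of the claimed `C = n` simplification. This factor matches the small-sample cluster correction commonly seen in the cluster-robust literature.

```python
def df_adjust(n, k, C):
    """Cluster-aware degrees-of-freedom adjustment to the variance estimate.

    n: number of elements, k: number of estimated parameters,
    C: number of clusters (C == n when every element is its own cluster).
    """
    return C / (C - 1) * (n - 1) / (n - k)

# With no clustering, each element is its own "cluster" (C = n), and the
# factor reduces algebraically: n/(n-1) * (n-1)/(n-k) = n/(n-k).
n, k = 100, 4
assert abs(df_adjust(n, k, C=n) - n / (n - k)) < 1e-12
```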

benthestatistician commented 8 years ago

The documentation I'm suggesting needn't be extensive, particularly if you're also adjusting that constant to reflect clustering. In fact, if you're making the clustering adjustment in accord with recommendations from a published paper, a citation to that paper would do fine.