jwbowers / TestStatRIInterference

Test statistic selection for randomization inference with interference

Response to Reviewers: Simulation to enhance intuition about the SSR versus KS test #4

Closed jwbowers closed 8 years ago

jwbowers commented 8 years ago

I ran a quick simulation to respond to the request for a simulation and/or intuition about the KS versus SSR comparison. I compared how the KS test and the simple SSR test statistic performed with a Normal outcome versus a Zero Inflated outcome, under a constant additive effects model versus a constant multiplicative effects model. In this case I did not use any network information: the idea was to discuss these two test statistics under different models of effects and outcome distributions in the no-interference case, in response to the reviewers and editor.

Here is just a quick overview of the results. I'm tempted to just discuss them and note that the files are on GitHub if people are interested in pursuing the question. Basically they show what we already said: the SSR is more powerful at detecting effects that shift the mean. They also show that the KS test is (slightly) more powerful when the outcomes are zero-inflated and long-tailed. When the actual model of effects targets the shape of the distribution more than the mean we see, again, that the KS test has (much) more power (although the alternatives need to be changed in that simulation, I think).

Details of the simulations

In all simulations the truth is no effects. The outcomes are standardized to have sd=1 and centered at mean=0. N=246, with N/2 assigned to treatment via complete randomization (i.e. basically the original setup, but with no network information and different outcomes). All p-values arise from 1000 draws from the randomization distribution. The power calculations show the proportion of p<=.05 over 1000 repetitions of the procedure (i.e. draw Z from the randomization distribution, then draw 1000 times to get a p-value, and repeat).
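
The procedure above can be sketched in Python (the original code is in R; the statistic and helper names here are mine for illustration, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(20151130)
n = 246  # N from the setup; n/2 treated under complete randomization

def ks_stat(y, z):
    # Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    # the treated and control empirical CDFs.
    yt, yc = np.sort(y[z == 1]), np.sort(y[z == 0])
    grid = np.concatenate([yt, yc])
    Ft = np.searchsorted(yt, grid, side="right") / yt.size
    Fc = np.searchsorted(yc, grid, side="right") / yc.size
    return np.abs(Ft - Fc).max()

def ssr_stat(y, z):
    # Sum of squared residuals around the group means; small values
    # mean treatment assignment explains variation in the outcome.
    fitted = np.where(z == 1, y[z == 1].mean(), y[z == 0].mean())
    return np.sum((y - fitted) ** 2)

def rand_p(y, z, stat, larger_is_evidence, draws=1000):
    # Randomization p-value: re-draw the assignment, holding y fixed.
    obs = stat(y, z)
    sims = np.array([stat(y, rng.permutation(z)) for _ in range(draws)])
    return np.mean(sims >= obs) if larger_is_evidence else np.mean(sims <= obs)

def power(tau, y0, stat, larger_is_evidence, reps=1000, draws=1000, alpha=0.05):
    # Proportion of rejections over repeated experiments at effect tau.
    z0 = np.array([1] * (n // 2) + [0] * (n - n // 2))
    hits = 0
    for _ in range(reps):
        z = rng.permutation(z0)
        y = y0 + z * tau  # constant additive effects
        hits += rand_p(y, z, stat, larger_is_evidence, draws) <= alpha
    return hits / reps
```

For the SSR the evidence is in the lower tail (assignment explains variation, so residuals shrink); for KS it is in the upper tail.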

## Make two outcomes in control that both have 0 mean and 1 sd but with different distributions
set.seed(20151130)
n <- 246
tmpzif <- rgeom(n, prob=.7)  # zero-inflated: roughly 70% zeros
y0zif <- as.vector(scale(tmpzif))
y0norm <- rnorm(n)

Note that all procedures are unbiased tests in Rosenbaum's sense (i.e. the size of the test at the truth is less than or equal to the level of the test, .05).

Constant Additive Effects, Normal Outcomes

constant.additive.model <- UniformityModel(
    function(y, z, tau) {
        y - z * tau   # remove the hypothesized effect to recover y0
    },
    function(y_0, z, tau) {
        y_0 + z * tau # add the hypothesized effect back
    })
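
In words, the first function removes a hypothesized effect tau from the observed outcomes to recover the uniformity-trial outcomes, and the second adds it back. A minimal Python sketch of that inverse pair (the function names are mine, not the package's; `UniformityModel` itself lives in the R code):

```python
import numpy as np

# Hypothetical names for the two maps a uniformity model pairs together:
def remove_effect(y, z, tau):
    return y - z * tau   # observed outcomes -> uniformity-trial outcomes

def add_effect(y0, z, tau):
    return y0 + z * tau  # uniformity-trial outcomes -> observed outcomes

rng = np.random.default_rng(1)
y0 = rng.normal(size=8)
z = np.array([1, 0, 1, 0, 1, 0, 1, 0])
y = add_effect(y0, z, tau=0.3)

# The maps are inverses: removing the same tau recovers y0 exactly.
assert np.allclose(remove_effect(y, z, tau=0.3), y0)
```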

The SSR has more power at all alternatives.

> t(powComparison[powComparison[,"resultsCOANormKS"]<1,])
                  -0.6566 -0.5556 -0.4545 -0.3535 -0.2525 -0.1515 -0.05051     0 0.05051 0.1515 0.2525 0.3535 0.4545 0.5556 0.6566
resultsCOANormKS    0.997   0.964   0.842   0.588   0.300   0.098    0.037 0.028   0.024  0.098  0.307  0.572  0.843  0.962  0.994
resultsCOANormSSR   0.999   0.996   0.943   0.755   0.437   0.169    0.060 0.052   0.058  0.172  0.444  0.758  0.926  0.990  0.999

Constant Additive Effects, Zero Inflated Outcomes

KS has just a bit more power here, although it is more conservative at the truth than the SSR. It is also worth mentioning the simulation error arising from using 1000 draws for the p-values and 1000 repetitions for the power estimates.

resultsCOAZifKS     1.000   1.000   1.000   1.000   1.000   1.000    1.000 0.016   1.000  1.000  1.000  1.000  1.000  1.000  1.000
resultsCOAZifSSR    0.998   0.978   0.924   0.732   0.412   0.152    0.093 0.045   0.096  0.137  0.407  0.730  0.921  0.989  0.998
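
Some intuition for why KS gains here: the zero-inflated outcome puts a point mass (about 70% of observations) at its minimum, and a constant additive shift moves that whole atom at once, opening a large gap between the treated and control ECDFs. A rough Python check (not the paper's code; note numpy's geometric counts trials where R's rgeom counts failures, hence the -1):

```python
import numpy as np

rng = np.random.default_rng(20151130)
n = 246

# Zero-inflated-style outcome mirroring rgeom(n, prob=.7): ~70% zeros.
tmp = rng.geometric(0.7, size=n) - 1
yzif = (tmp - tmp.mean()) / tmp.std()

def ecdf_gap(a, b):
    # Largest vertical distance between the two empirical CDFs.
    grid = np.concatenate([a, b])
    Fa = np.searchsorted(np.sort(a), grid, side="right") / a.size
    Fb = np.searchsorted(np.sort(b), grid, side="right") / b.size
    return np.abs(Fa - Fb).max()

tau = 0.35
gap_zif = ecdf_gap(yzif, yzif + tau)   # the atom of zeros moves: gap near 0.7
gap_norm = ecdf_gap(rng.normal(size=n), rng.normal(size=n) + tau)
print(gap_zif, gap_norm)               # gap_zif is much larger
```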

Constant Multiplicative Effects, Normal Outcomes

constant.multiplicative.model <- UniformityModel(
    function(y, z, tau) {
        if (tau == 0) {
            # guard against division by zero when 1 + z * (tau - 1) = 0
            return(rep(0, length(y)))
        } else {
            y / (1 + z * (tau - 1))
        }
    },
    function(y_0, z, tau) {
        y_0 * (1 + z * (tau - 1))  # note: tau = 1 means no effect here
    })
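
Part of what is going on: because y0 is standardized to mean 0, multiplying the treated outcomes by tau rescales their spread but leaves the treated-group mean near zero, so a statistic that tracks mean differences has little to find. A quick Python check (a sketch under the setup above, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 246
y0 = rng.normal(size=n)
y0 = (y0 - y0.mean()) / y0.std()  # standardized as in the setup: mean 0, sd 1

z = rng.permutation(np.array([1] * (n // 2) + [0] * (n - n // 2)))
tau = 3.0                         # a large multiplicative effect
y = y0 * (1 + z * (tau - 1))      # treated outcomes scaled by tau

# The treated mean is tau times a near-zero quantity, so it barely moves,
# while the treated standard deviation is inflated by roughly tau.
print(y[z == 1].mean())  # near 0
print(y[z == 1].std())   # near 3
```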

Here KS has more power; the SSR seems to have essentially no power. This particular simulation needs to be changed because tau means something different in the multiplicative effects model (tau=1 is no effect), so this is an extreme set of results right now. I leave it here, though, since the SSR test does have poor power in this case (which I aimed to generate by using a model that does not directly target location), but I am still surprised at these results and wonder whether the constant.multiplicative.model should be changed. I had thought about trying a quantile displacement effect, too, but multiplicative seemed faster to code up.

resultsCOMNormKS    1.000   1.000   1.000   1.000   1.000   1.000    1.000 0.000   1.000  1.000  1.000  1.000  1.000  1.000  1.000
resultsCOMNormSSR   0.042   0.042   0.042   0.042   0.042   0.042    0.042 0.000   0.042  0.042  0.042  0.042  0.042  0.042  0.042

Constant Multiplicative Model, Zero Inflated Outcomes

Same as above.

resultsCOMZifKS     1.000   1.000   1.000   1.000   1.000   1.000    1.000 0.000   1.000  1.000  1.000  1.000  1.000  1.000  1.000
resultsCOMZifSSR    0.007   0.007   0.007   0.007   0.007   0.007    0.007 0.000   0.007  0.007  0.007  0.007  0.007  0.007  0.007
pmaronow commented 8 years ago

"When the actual model of effects does not target the mean" -- it's not just not targeting it: the CEF is totally flat under your multiplicative scaling, since y0's distribution is symmetric about zero.

Can you add 10 to y0? I am curious about power there. My guess is that SSR will have very good power.

And yes, SSR will only be sensitive to variation induced in the CEF (given that it operates on the L2 norm). Hence our proposal of something like an E-statistic if you're interested in other distributional shifts (it's essentially a multidimensional generalization of the KS statistic).

Peter M. Aronow Assistant Professor, Departments of Political Science and Biostatistics, Yale University http://aronow.research.yale.edu

jwbowers commented 8 years ago

The results now make a bit more sense: all tests have correct size at the truth. SSR has more power than KS for Normal outcomes under both the additive and multiplicative effects conditions. SSR has less power than KS when the outcome is zero-inflated (mainly because the KS test appears to have amazing power there). I'm closing this issue and will just discuss these results in the paper without presenting them.

[Figure: ksvsssrsimplepowplot — power comparison of the KS versus SSR test statistics]