markmfredrickson / RItools

Randomization inference tools for R
GNU General Public License v2.0

multiplicity adjusted univariate p-values via `p.adjust` #34

Open benthestatistician opened 9 years ago

benthestatistician commented 9 years ago

The univariate p-values that we currently provide are not adjusted for multiplicity, which we should fix. Let's do this by passing the current univariate p-values through `stats::p.adjust`.
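A minimal sketch of what that adjustment might look like, using made-up p-values. The covariate names and numbers are purely illustrative, and nothing here reflects how RItools stores its results internally; only `stats::p.adjust` itself is taken from the proposal.

```r
# Hypothetical unadjusted univariate p-values, one per covariate
# (illustrative numbers, not RItools output).
p_raw <- c(age = 0.004, income = 0.03, educ = 0.20, region = 0.65)

# Holm controls the family-wise error rate; "BH" controls the
# false discovery rate instead. Names are carried through.
p_holm <- stats::p.adjust(p_raw, method = "holm")
p_bh   <- stats::p.adjust(p_raw, method = "BH")

# Side-by-side table of raw and adjusted values.
print(cbind(raw = p_raw, holm = p_holm, BH = p_bh))
```

Which `method` to use by default, and whether to report adjusted values alongside or instead of the raw ones, is left open here.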

benthestatistician commented 9 years ago

Separable from, but related to, the broader proposal described/discussed in this wiki

benthestatistician commented 8 years ago

Noting a tentative resolution (from offline discussion with @markmfredrickson) to fold this in with current developments on the clusters branch.

benthestatistician commented 6 years ago

It's a little odd that balT() reports both an overall p-value and univariate p-values, without coordination between the two. Jotting down notes on 2 potential ways the one could inform the other.

  1. The overall test might act as a gateway for univariate tests
  2. When the overall test is requested, don't report univariate p-values for original variables; provide some other sort of follow-up detail

Re 1, I have a conjecture: The multivariate test, giving p-value q0, can be combined with univariate z-tests giving p-values q1, ..., qk to give a Holm-like, FWER-controlling, step-down procedure furnishing p-values p1, ..., pk that are uniformly no larger than p.adjust(c(q1, <...>, qk), method="holm").

If the conjecture turns out to be correct, we might consider offering that procedure as an alternative to the other p.adjust options. Of course verifying the conjecture may also suggest elaborations giving still better power.
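To make the "gateway" idea concrete, here is a rough sketch of one standard construction, closed testing, in which the multivariate p-value tests the intersection of all k hypotheses and Bonferroni handles every smaller intersection. This is not the conjectured procedure, and it is not uniformly no larger than Holm, as the second call shows. The function name `closed_gateway_adjust` and all numbers are hypothetical, and the sketch assumes the multivariate test is a valid test of the global null.

```r
# Closed-testing sketch (NOT the conjectured procedure).
# q0: p-value of the overall (multivariate) test.
# q:  vector of univariate p-values q1, ..., qk.
closed_gateway_adjust <- function(q0, q) {
  k <- length(q)
  stopifnot(k >= 1, k <= 15)  # brute force over all 2^k - 1 subsets
  adj <- numeric(k)
  for (mask in seq_len(2^k - 1)) {
    # Decode the subset S from the binary digits of `mask`.
    in_S <- (mask %/% 2^(seq_len(k) - 1)) %% 2 == 1
    # Full set: use the multivariate test; otherwise Bonferroni within S.
    local_p <- if (all(in_S)) q0 else min(1, sum(in_S) * min(q[in_S]))
    # Adjusted p-value for i: largest local p-value over all S containing i.
    adj[in_S] <- pmax(adj[in_S], local_p)
  }
  stats::setNames(adj, names(q))
}

q    <- c(x1 = 0.004, x2 = 0.03, x3 = 0.20, x4 = 0.65)  # hypothetical
holm <- stats::p.adjust(q, method = "holm")             # benchmark

gate_small_q0 <- closed_gateway_adjust(q0 = 0.01, q)  # strong overall evidence
gate_large_q0 <- closed_gateway_adjust(q0 = 0.50, q)  # weak overall evidence

# Element-wise check of "no larger than Holm" for each case.
print(all(gate_small_q0 <= holm + 1e-12))
print(all(gate_large_q0 <= holm + 1e-12))
```

With a small overall p-value the smallest adjusted value improves on Holm (multiplier k - 1 rather than k); with a large one every adjusted value is pulled up to at least q0. A procedure satisfying the conjecture would have to avoid that second behavior.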

Re 2, one might follow up on a significant global test by reporting the significance of balance tests performed on principal components of the x-matrix, rather than on the x-es themselves. These correspond more closely to the global test statistic, so this approach could facilitate coherence among the tests.
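A rough sketch of what such a follow-up could look like, on simulated data. Everything here is an assumption for illustration: the covariates and treatment vector are made up, `stats::prcomp` on the scaled x-matrix stands in for whatever decomposition would actually be used, and plain two-sample t-tests stand in for the package's own balance tests.

```r
set.seed(1)
n <- 200

# Hypothetical covariate matrix and a balanced treatment indicator.
x <- cbind(age    = rnorm(n, 40, 10),
           income = rnorm(n, 50, 15),
           educ   = rnorm(n, 12, 3))
z <- sample(rep(0:1, each = n / 2))

# Principal components of the centered, scaled x-matrix.
pc     <- stats::prcomp(x, center = TRUE, scale. = TRUE)
scores <- pc$x  # one column per component, mutually uncorrelated

# One balance test per component instead of per original variable.
pc_pvals <- apply(scores, 2, function(s) {
  stats::t.test(s[z == 1], s[z == 0])$p.value
})

print(pc_pvals)
print(round(pc$rotation, 2))  # loadings, to help interpret each component
```

Reporting the loadings next to the p-values is one way to let a reader relate an imbalanced component back to the original variables.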

#61 calls for giving the user a means of providing or designating a dispersion matrix for the X's, primarily with descriptive calculations in mind. If we use this same covariance matrix for the Mahalanobis-type test statistic that the multivariate test is to be based on, it will determine the relevant principal components. So the principal components will have been defined relative to a covariance matrix that the user is in a position to interpret, not the more obscure covariance-of-univariate-imbalances that will have to be computed under the hood. It seems a safe bet that anyone we might hope to interest in these calculations will have at least a passing familiarity with principal components; there might also be something to be said for scaring others away. (OTOH figuring out how to report all this back to the user will create more coding work.)
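The link between a user-supplied dispersion matrix and the principal components can be sketched numerically: a Mahalanobis-type quadratic form in that matrix equals the sum of squared, standardized differences along its eigenvectors. The matrix `S` and the difference vector `d` below are hypothetical, and any design-dependent scaling that a real test statistic would need is left out.

```r
# Hypothetical user-supplied dispersion matrix (positive definite).
S <- matrix(c(4.0, 1.2, 0.5,
              1.2, 9.0, 2.0,
              0.5, 2.0, 1.0), nrow = 3, byrow = TRUE)

# Hypothetical treatment-minus-control differences in covariate means.
d <- c(0.8, -1.5, 0.3)

# Principal directions and variances implied by S.
eig <- eigen(S, symmetric = TRUE)

# Differences re-expressed along each principal direction, in SD units.
pc_diffs <- drop(crossprod(eig$vectors, d)) / sqrt(eig$values)

# Mahalanobis-type quadratic form d' S^{-1} d.
m_stat <- stats::mahalanobis(d, center = rep(0, 3), cov = S)

# The quadratic form decomposes into per-component contributions.
print(all.equal(sum(pc_diffs^2), as.numeric(m_stat)))
```

Each squared entry of `pc_diffs` is that component's share of the overall statistic, which is what makes per-component follow-up tests cohere with the global test.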