(Wilcoxon)-Mann-Whitney: failure as a test of medians

chantelanuit commented 5 years ago

* Enhancement: Present appropriate statistical information about the concept of "stochastic dominance" when the Mann-Whitney (and the Wilcoxon) test is used.
* Purpose: Correct misconceptions about the WMW test: most often, the median is not relevant. So what figure is useful when running a Wilcoxon or a WMW test?
* Use-case:

**Is your feature request related to a problem? Please describe.**

"The perception that the Wilcoxon–Mann–Whitney (WMW) procedure tests equality of medians is pervasive and frequently encountered. Unfortunately, this perception is mostly wrong." (Divine et al., 2018, p. 278). Please see Divine, G. W., Norton, H. J., Barón, A. E., & Juarez-Colunga, E. (2018). The Wilcoxon–Mann–Whitney Procedure Fails as a Test of Medians. The American Statistician, 72(3), 278-286. doi:10.1080/00031305.2017.1305291 and also http://rfuncs.weebly.com/wmw.html
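A tiny hypothetical example makes the quoted point concrete. The two samples below have identical medians, yet the quantity the WMW procedure is built on, $p'' = P(X > Y) + 0.5\,P(X = Y)$, is far from 0.5, and the first sample stochastically dominates the second. This is a sketch only: `p_double_prime` and `dominates` are illustrative names, not JASP functions, and the data are made up.

```python
from bisect import bisect_right
from statistics import median


def p_double_prime(x, y):
    """Estimate p'' = P(X > Y) + 0.5 * P(X = Y) by comparing all n1 * n2 pairs."""
    score = sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0
                for xi in x for yj in y)
    return score / (len(x) * len(y))


def dominates(x, y):
    """Empirical first-order stochastic dominance: ECDF_x(t) <= ECDF_y(t) at every
    observed value t. Both ECDFs are step functions that only change at observed
    values, so checking the pooled values is enough."""
    xs, ys = sorted(x), sorted(y)
    return all(bisect_right(xs, t) / len(xs) <= bisect_right(ys, t) / len(ys)
               for t in set(xs) | set(ys))


# Hypothetical samples with identical medians.
x = [5, 5, 5, 5, 9, 9, 9]
y = [1, 1, 1, 5, 6, 6, 6]
print(median(x), median(y))   # 5 5
print(p_double_prime(x, y))   # 0.714... (= 35/49), well away from 0.5
print(dominates(x, y))        # True
```

With samples this small the WMW p-value need not be significant; the point is that the statistic tracks $p''$ and dominance, not the medians.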

"It logically follows that the imperfect connection of the WMW test to medians will imply that use of the Hodges-Lehmann confidence interval for a difference in locations (as reflected by the medians) may also perform poorly." (Divine et al., 2018, p. 285).
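The same hypothetical samples show why the Hodges-Lehmann shift (the median of all pairwise differences $x_i - y_j$) need not agree with the difference in medians. The function name is illustrative only.

```python
from statistics import median


def hodges_lehmann_shift(x, y):
    """Two-sample Hodges-Lehmann estimate: the median of all pairwise differences x_i - y_j."""
    return median(xi - yj for xi in x for yj in y)


# Hypothetical samples with identical medians.
x = [5, 5, 5, 5, 9, 9, 9]
y = [1, 1, 1, 5, 6, 6, 6]
print(median(x) - median(y))        # 0
print(hodges_lehmann_shift(x, y))   # 3
```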

"it should be noted that the Wilcoxon signed rank test has a similarly poor connection to the sample median, despite what may be asserted in textbooks" (Divine et al., 2018, p. 285).
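The signed-rank caveat can be illustrated the same way. In the hypothetical paired differences below the sample median is exactly 0, yet $W^+$ sits above its null mean $n(n+1)/4$. The location summary naturally paired with the signed-rank statistic is the median of the Walsh averages $(d_i + d_j)/2$, $i \le j$ (the one-sample Hodges-Lehmann estimate, or pseudomedian), not the sample median. Function names are illustrative, not an existing API.

```python
from statistics import median


def walsh_median(d):
    """One-sample Hodges-Lehmann estimate (pseudomedian): median of all Walsh
    averages (d_i + d_j) / 2 with i <= j."""
    return median((d[i] + d[j]) / 2 for i in range(len(d)) for j in range(i, len(d)))


def signed_rank_w_plus(d):
    """Wilcoxon signed-rank statistic W+ (sum of ranks of |d| over positive d),
    dropping zeros and using midranks for tied |d|. Returns (W+, its null mean)."""
    nz = [v for v in d if v != 0]
    order = sorted(range(len(nz)), key=lambda k: abs(nz[k]))
    ranks = [0.0] * len(nz)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(nz[order[j + 1]]) == abs(nz[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        i = j + 1
    n = len(nz)
    return sum(r for r, v in zip(ranks, nz) if v > 0), n * (n + 1) / 4


# Hypothetical paired differences whose sample median is exactly 0.
d = [-3, -2, -1, 0, 10, 20, 30]
print(median(d))               # 0
print(walsh_median(d))         # 6.75
print(signed_rank_w_plus(d))   # (15.0, 10.5)
```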

**Describe the solution you'd like**

Adding bubble plots and dominance diagrams (see Divine et al., 2018, p. 285). The authors provide the SAS code for generating these figures (see the supplementary materials: https://amstat.tandfonline.com/doi/suppl/10.1080/00031305.2017.1305291/suppl_file/utas_a_1305291_sm7602.zip).

Adding the WMWodds measure: "O’Brien and Castelloe (2006) suggest that $\hat{p}''/(1-\hat{p}'')$ [the “WMWodds”], is an ideal summary statistic for the WMW procedure." (Divine et al., 2018, p. 280). This measure is more relevant than the Hodges-Lehmann estimator. Here is the SAS syntax provided by Divine et al. (2018) to compute this measure along with its CI (extracted from https://amstat.tandfonline.com/doi/suppl/10.1080/00031305.2017.1305291/suppl_file/utas_a_1305291_sm7602.zip):

```sas
data wilcstuff;
  merge n1 n2 rawwilc wil_pvalue;
  by variable;
  U1 = SumOfScores - N*(N+1)/2;   /* Mann-Whitney U for group 1 */
  U2 = n1*n2 - U1;                /* Mann-Whitney U for group 2 */
  p1 = U1/(n1*n2);                /* estimate of P(g1 > g2) + 0.5 P(g1 = g2) */
  se_p1 = StdDevOfSum/(n1*n2);    /* SE from the null SD of the rank sum */
  p1_lower = p1-1.96*se_p1;
  if -999999 < p1_lower < 0 then p1_lower=0;   /* keep the bounds within [0, 1] */
  p1_upper = p1+1.96*se_p1;
  if p1_upper > 1 then p1_upper=1;
  if p1^=1 then WMWodds = p1/(1-p1);
  if p1_lower^=1 then WMWodds_lower = p1_lower/(1-p1_lower);
  if p1_upper^=1 then WMWodds_upper = p1_upper/(1-p1_upper);
  /* If you want 1-p1, use _n_=2 in next line? */
  if _n_=1;
  label p1="p1 = Prob(&g1 > &g2)";
run;
```

R work on the WMWodds is also available here: http://rfuncs.weebly.com/wmw.html

**Describe alternatives you've considered**

**Additional context**
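For readers without SAS, the WMWodds steps quoted above can be sketched in plain Python. This is a hypothetical port, not the authors' code and not a JASP API: `wmw_odds`, `midranks` and `odds` are illustrative names, the tie-corrected null SD of the rank sum stands in for SAS's `StdDevOfSum`, and clipping the interval to [0, 1] is an assumption about the SAS bounds handling. As in the quoted code, the standard error comes from the null distribution of the rank sum, so the interval is only a rough Wald approximation.

```python
import math


def midranks(values):
    """Ranks 1..N of `values`, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def odds(p):
    # The SAS code leaves the odds missing when p == 1; here that case maps to infinity.
    return math.inf if p == 1 else p / (1 - p)


def wmw_odds(x, y, z=1.96):
    """p1 (estimate of P(X > Y) + 0.5 P(X = Y)), the WMWodds p1 / (1 - p1), and a
    Wald-type interval mirroring the quoted SAS steps (a sketch under assumptions)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = midranks(list(x) + list(y))
    sum_of_scores = sum(ranks[:n1])             # rank sum of group 1
    u1 = sum_of_scores - n1 * (n1 + 1) / 2      # Mann-Whitney U for group 1
    p1 = u1 / (n1 * n2)
    # Null-hypothesis SD of the rank sum (tie-corrected permutation variance).
    mean_rank = (n + 1) / 2
    var_sum = n1 * n2 / (n * (n - 1)) * sum((r - mean_rank) ** 2 for r in ranks)
    se_p1 = math.sqrt(var_sum) / (n1 * n2)
    lower = max(0.0, p1 - z * se_p1)            # clip to [0, 1] (assumed)
    upper = min(1.0, p1 + z * se_p1)
    return {"p1": p1, "WMWodds": odds(p1),
            "WMWodds_lower": odds(lower), "WMWodds_upper": odds(upper)}


# Hypothetical samples with identical medians.
result = wmw_odds([5, 5, 5, 5, 9, 9, 9], [1, 1, 1, 5, 6, 6, 6])
print(result["p1"], result["WMWodds"])   # about 0.714 and 2.5 (up to float rounding)
```

Here the upper bound of $p_1$ hits 1, so the upper WMWodds limit is unbounded, which is the case the SAS code leaves missing.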

EJWagenmakers commented 5 years ago

Sounds interesting, @JohnnyDoorn

tomtomme commented 9 months ago

jasp-stats/jasp-issues #453: (Wilcoxon)-Mann-Whitney: failure as a test of medians