kassambara / rstatix

Pipe-friendly Framework for Basic Statistical Tests in R
https://rpkgs.datanovia.com/rstatix/

wilcox_test error (cause only one value in my test) #128

Open jsaintvanne opened 3 years ago

jsaintvanne commented 3 years ago

Hi,

I run a script on data that are classified by material and grouped by cluster:

          data material cluster
1  -3.00000000       Mo       2
2  -3.00000000       Sg       1
3  -3.00000000       Mo       2
4  -3.00000000       Sg       3
5  -3.00000000       Mo       2
6  -3.00000000       Sg       3
7  -3.00000000       Mo       3
8  -3.00000000       Sg       3
9  -0.18015529       Mo       2
10 -3.00000000       Mo       2
11 -3.00000000       Sg       2
12 -3.00000000       Mo       2
13 -3.00000000       Mo       1
14 -3.00000000       Sg       1
15 -3.00000000       Sg       1
16 -0.27042093       Mo       4
17 -3.00000000       Sg       1
18 -3.00000000       Mo       2
19 -3.00000000       Sg       1
20 -3.00000000       Mo       3
21 -3.00000000       Sg       2
22 -3.00000000       Mo       4
23 -3.00000000       Sg       3
24 -3.00000000       Mo       2
25 -3.00000000       Sg       3
26 -3.00000000       Sg       3
27 -3.00000000       Sg       3
28 -0.59120296       Mo       4
29 -3.00000000       Sg       2

I ran this:

subdata %>% 
group_by(cluster) %>% 
wilcox_test(data~material, p.adjust.method = "fdr")

and I get an error on my third cluster because it contains only one distinct value (spread over several rows, but always the same value).

Error: Problem with `mutate()` column `data`.
ℹ `data = map(.data$data, .f, ...)`.
✖ missing value where TRUE/FALSE needed
Run `rlang::last_error()` to see where the error occurred.

I absolutely need to get past this because I run the test in a for loop, and some clusters sometimes contain only identical values. Do you have a solution for this, please?

Thanks for your help.

PS: I tried excluding this cluster, creating a "ghost line" for it and adding that line to the test results, but add_xy_position, which I run afterwards, then gives me an NA.

benediktclaus commented 2 years ago

Hi @jsaintvanne

This does not seem to be an issue with rstatix. You can remove clusters with identical outcome values by filtering out clusters with zero variance (see below), but that introduces another error (not enough 'y' observations), most likely because cluster 4 contains only one material (Mo), so there is no second group to compare against. I don't know whether tests of statistical significance are really useful in this case; an exploratory data analysis may serve you better.

library(tidyverse)
library(rstatix)

reprex_data <- tribble(
  ~ data, ~ material, ~ cluster,
  -3.00000000,       "Mo",       2,
  -3.00000000,       "Sg",       1,
  -3.00000000,       "Mo",       2,
  -3.00000000,       "Sg",       3,
  -3.00000000,       "Mo",       2,
  -3.00000000,       "Sg",       3,
  -3.00000000,       "Mo",       3,
  -3.00000000,       "Sg",       3,
  -0.18015529,       "Mo",       2,
  -3.00000000,       "Mo",       2,
  -3.00000000,       "Sg",       2,
  -3.00000000,       "Mo",       2,
  -3.00000000,       "Mo",       1,
  -3.00000000,       "Sg",       1,
  -3.00000000,       "Sg",       1,
  -0.27042093,       "Mo",       4,
  -3.00000000,       "Sg",       1,
  -3.00000000,       "Mo",       2,
  -3.00000000,       "Sg",       1,
  -3.00000000,       "Mo",       3,
  -3.00000000,       "Sg",       2,
  -3.00000000,       "Mo",       4,
  -3.00000000,       "Sg",       3,
  -3.00000000,       "Mo",       2,
  -3.00000000,       "Sg",       3,
  -3.00000000,       "Sg",       3,
  -3.00000000,       "Sg",       3,
  -0.59120296,       "Mo",       4,
  -3.00000000,       "Sg",       2
)

reprex_data %>% 
  group_by(cluster) %>% 
  filter(var(data) != 0) %>% 
  wilcox_test(data ~ material, p.adjust.method = "fdr")
#> Error: Problem with `mutate()` column `data`.
#> i `data = map(.data$data, .f, ...)`.
#> x not enough 'y' observations

Created on 2021-10-06 by the reprex package (v2.0.1)
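
One possible way around both errors is to keep only clusters that contain both materials and more than one distinct value, and to record which clusters were skipped. This is only a minimal sketch, not something rstatix provides: the `toy` data frame and the names `testable` and `skipped` are made up for illustration, and whether a test on what remains is meaningful is a separate question.

```r
library(dplyr)

# Toy data with the same columns as in the issue (values are illustrative)
toy <- data.frame(
  data     = c(-3, -3, -3, -3, -0.18, -3, -3, -0.27, -0.59, -3),
  material = c("Sg", "Sg", "Mo", "Mo", "Mo", "Sg", "Sg", "Mo", "Mo", "Mo"),
  cluster  = c(1, 1, 1, 2, 2, 2, 2, 4, 4, 4)
)

# Keep only clusters that can be tested at all:
# both materials present and at least two distinct outcome values
testable <- toy %>%
  group_by(cluster) %>%
  filter(n_distinct(material) == 2, n_distinct(data) > 1) %>%
  ungroup()

# Clusters that were dropped, so they can be reported separately
skipped <- setdiff(unique(toy$cluster), unique(testable$cluster))

# Run the test on the remaining clusters (only if rstatix is installed)
if (requireNamespace("rstatix", quietly = TRUE)) {
  res <- testable %>%
    group_by(cluster) %>%
    rstatix::wilcox_test(data ~ material, p.adjust.method = "fdr")
  print(res)
}
```

In a for loop, `skipped` can be used to note for each iteration which clusters were left out instead of letting the whole run fail.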