kassambara / ggpubr

'ggplot2' Based Publication Ready Plots
https://rpkgs.datanovia.com/ggpubr/

Kruskal.test vs wilcox.test in compare_means #547

Open pdhrati02 opened 1 year ago

pdhrati02 commented 1 year ago

Hi, in my dataset I have 3 groups and 10 classes, and I wish to compare the abundance per class between groups. An example of the data looks like this:

Group    class  abundance
control  a      2
2        a      5
3        c      6
control  c      0
2        a      3
3        b      6
control  c      9
2        b      3
3        b      6
control  c      9
2        a      2
3        c      7

I did read online that it is possible to perform kruskal.test for multiple comparisons, but I am unable to use it.

stat_tp1 <- compare_means(data = col_tp1, abundance_log ~ Group, ref.group = "control", group.by = "class")

This works fine, as the default test is Wilcoxon: for each class, the results compare each of the other groups to control.

However, when I try the same with kruskal.test, I get an error:

Error in dplyr::filter():
ℹ In argument: group1 == ref.group | group2 == ref.group.
Caused by error:
! object 'group2' not found

I also tried dunn_test from the rstatix package, but I could not find a grouping option there, and piping the data through group_by first isn't helping.

I am able to plot boxplots using the following, but this is not what I want. I want the individual p-values so that I can make a different plot.

ggboxplot(col_tp1, x = "Group", y = "abundance_log") + facet_wrap(~class) + geom_pwc(method = "dunn_test")

Any help would be appreciated. Thank you
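A minimal base-R sketch of one way to get a table of per-class Kruskal-Wallis p-values without plotting. It assumes the toy values from the example table above and uses the raw `abundance` column rather than `abundance_log`; the names `by_class` and `kw` are made up for illustration. It is not what `compare_means` does internally, and it gives one global p-value per class rather than comparisons against `control`.

```r
# Toy data copied from the example table in this post.
col_tp1 <- data.frame(
  Group = rep(c("control", "2", "3"), times = 4),
  class = c("a", "a", "c", "c", "a", "b", "c", "b", "b", "c", "a", "c"),
  abundance = c(2, 5, 6, 0, 3, 6, 9, 3, 6, 9, 2, 7)
)

# Split the data by class and run one Kruskal-Wallis test per class.
by_class <- split(col_tp1, col_tp1$class)

kw <- do.call(rbind, lapply(names(by_class), function(cl) {
  res <- kruskal.test(abundance ~ Group, data = by_class[[cl]])
  data.frame(
    class     = cl,
    statistic = unname(res$statistic),
    df        = unname(res$parameter),
    p         = res$p.value
  )
}))

# Adjust the per-class p-values for multiple testing (Holm, as an example).
kw$p.adj <- p.adjust(kw$p, method = "holm")

print(kw)
```

The resulting data frame has one row per class and can be passed to any other plotting code. Each class needs at least two groups with observations, otherwise `kruskal.test` stops with an error.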

ADelCortona commented 1 year ago

I am having a similar issue, but with method = "anova" rather than kruskal.test.

Same data format as pdhrati02

data example: my_data.txt

ID  Pop HET_CT
A1  A   8072
A2  A   6176
A3  A   6247
B1  B   6322
B2  B   7334
B3  B   15101
C1  C   12172
C2  C   11683
C3  C   12599

code snippet

# import dataset
my_data = read.delim("my_data.txt",  header = TRUE)

# sort  Populations
my_data$Pop = factor(
  my_data$Pop,
  levels = c("A", "B", "C")
)

# compare means
tests = ggpubr::compare_means(HET_CT ~ Pop, data = my_data, ref.group = "A",
                              method = "anova")

error:

> tests = ggpubr::compare_means(HET_CT ~ Pop, data = my_data, ref.group = "A",
+                               method = "anova")
Error in `dplyr::filter()`:
i In argument: `group1 == ref.group | group2 == ref.group`.
Caused by error:
! object 'group2' not found
Run `rlang::last_trace()` to see where the error occurred.

rlang::last_trace(drop = FALSE) output:

<error/rlang_error>
Error in `dplyr::filter()`:
i In argument: `group1 == ref.group | group2 == ref.group`.
Caused by error:
! object 'group2' not found
---
Backtrace:
     x
  1. +-ggpubr::compare_means(...)
  2. | \-res %>% ...
  3. +-dplyr::filter(., group1 == ref.group | group2 == ref.group)
  4. +-dplyr:::filter.data.frame(., group1 == ref.group | group2 == ref.group)
  5. | \-dplyr:::filter_rows(.data, dots, by)
  6. |   \-dplyr:::filter_eval(...)
  7. |     +-base::withCallingHandlers(...)
  8. |     \-mask$eval_all_filter(dots, env_filter)
  9. |       \-dplyr (local) eval()
 10. \-base::.handleSimpleError(...)
 11.   \-dplyr (local) h(simpleError(msg, call))
 12.     \-rlang::abort(message, class = error_class, parent = parent, call = error_call)
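The backtrace stops at the filter on `group1` and `group2`. One possible reading, stated here as an assumption and not checked against the ggpubr source, is that a global test such as ANOVA returns a single p-value with no pairwise columns, so there is nothing for `ref.group` to filter on. The base-R sketch below uses the data from this comment to show the two kinds of result side by side; `anova_p` and `pairwise` are illustrative names, and the Welch t-test is only an example of a pairwise method.

```r
# Data copied from my_data.txt as shown in this comment.
my_data <- data.frame(
  ID     = c("A1", "A2", "A3", "B1", "B2", "B3", "C1", "C2", "C3"),
  Pop    = factor(rep(c("A", "B", "C"), each = 3), levels = c("A", "B", "C")),
  HET_CT = c(8072, 6176, 6247, 6322, 7334, 15101, 12172, 11683, 12599)
)

# Global test: one-way ANOVA gives a single p-value for all groups together.
fit <- aov(HET_CT ~ Pop, data = my_data)
anova_p <- summary(fit)[[1]][["Pr(>F)"]][1]

# Pairwise tests against a reference group: one row per comparison,
# which is where group1/group2 columns make sense.
ref <- "A"
others <- setdiff(levels(my_data$Pop), ref)

pairwise <- do.call(rbind, lapply(others, function(g) {
  sub <- droplevels(my_data[my_data$Pop %in% c(ref, g), ])
  data.frame(
    group1 = ref,
    group2 = g,
    p      = t.test(HET_CT ~ Pop, data = sub)$p.value  # Welch t-test
  )
}))

print(anova_p)
print(pairwise)
```

If this reading is right, dropping `ref.group` from the ANOVA call, or keeping `ref.group` with a two-group method, might avoid the error. That is untested here.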

The code worked as intended a few weeks ago, but since I am handling several projects I cannot pinpoint exactly what changed or when it stopped working.

I have updated R and all packages to their latest versions in an attempt to fix it (I also tried the GitHub development version of ggpubr), but the error persists.

Many thanks in advance for the help!

> sessionInfo()
R version 4.3.0 (2023-04-21)
Platform: x86_64-suse-linux-gnu (64-bit)
Running under: openSUSE Tumbleweed

Matrix products: default
BLAS:   /usr/lib64/R/lib/libRblas.so 
LAPACK: /usr/lib64/R/lib/libRlapack.so;  LAPACK version 3.11.0

locale:
[1] C

time zone: Europe/Brussels
tzcode source: system (glibc)

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] RColorBrewer_1.1-3 plyr_1.8.8         gridExtra_2.3      ggpubr_0.6.0.999   ggplot2_3.4.2     

loaded via a namespace (and not attached):
 [1] gtable_0.3.3         dplyr_1.1.2          compiler_4.3.0       ggsignif_0.6.4       tidyselect_1.2.0     Rcpp_1.0.10         
 [7] tidyr_1.3.0          scales_1.2.1         ggdist_3.2.1         R6_2.5.1             generics_0.1.3       distributional_0.3.2
[13] backports_1.4.1      tibble_3.2.1         car_3.1-2            munsell_0.5.0        pillar_1.9.0         rlang_1.1.1         
[19] utf8_1.2.3           broom_1.0.4          cli_3.6.1            withr_2.5.0          magrittr_2.0.3       rstudioapi_0.14     
[25] lifecycle_1.0.3      vctrs_0.6.2          rstatix_0.7.2        glue_1.6.2           farver_2.1.1         abind_1.4-5         
[31] carData_3.0-5        fansi_1.0.4          colorspace_2.1-0     purrr_1.0.1          tools_4.3.0          pkgconfig_2.0.3    

EDIT: reformatted code snippet to something readable :)