kassambara / rstatix

Pipe-friendly Framework for Basic Statistical Tests in R
https://rpkgs.datanovia.com/rstatix/
444 stars 50 forks source link

Rounding error when using small numbers with get_summary_stats #145

Open alapo opened 2 years ago

alapo commented 2 years ago

I have been successfully using rstatix to create demographics tables for some time now. Today I ran into an issue when using small numbers as inputs. The function get_summary_stats will round the output and give all zeros. Below is a reprex, is there a way to force the function not to round the output?

# reprex example

df <- data.frame(id = c(1,2,3,1,2,3),
                 value = c(-8.107449e-08,-6.614925e-08,-5.412777e-08,-4.470794e-08,-3.159405e-08,-1.922514e-08))

df %>% 
  rstatix::get_summary_stats(value, type = "mean_sd")

img

m-wegert commented 2 years ago

Hi. I had a similar problem using the function rstatix::cor_test to correlate gene expression of one gene of interest with gene expression of all other genes of an expression matrix. However some p-values are round to 0 . Is there an option to retain the original very small p-value? Thanks for your reply! Best regards

# example:

library(ALL)
data(ALL)
exp.all <- exprs(ALL) %>% t() %>% as.data.frame()

# which genes correlate significantly with one gene of interest?
library(rstatix)
cortestx <- exp.all %>% 
  cor_test(
    vars = colnames(exp.all)[1],
    vars2 = colnames(exp.all)[2:ncol(exp.all)], 
    method = "spearman") %>%  
  adjust_pvalue(., method = "BH") %>%
  add_significance()

# one bug of cor_test: some p-values are exactly 0
cortestx <- cortestx[order(cortestx$p.adj, decreasing = F),]
cortestx[1:5,]

image

caparks2 commented 2 years ago

I also have this problem. The rstatix::add_xy_position() function, which can be used to automatically position p values on a ggplot when used with ggpubr::stat_pvalue_manual(), depends on rstatix::get_summary_stats(). When the y values are small, incorrect rounding results in positioning the p value at y = 0. This problem was posted on stackoverflow, where the advice from the community there was to work around the issue by multiplying the y values by a large value before using rstatix, and then dividing them to convert them back to their original scale.

Here is the MRE posted at stackoverflow to illustrate the problem here.

# import libraries
library(tidyverse)
library(rstatix)
library(ggpubr)

# minimal reproducible data example
data <- tibble::tribble(
  ~Group,       ~Response,
  "Treatment",  3.210486e-06,
  "Control",    4.006825e-06,
  "Treatment",  4.350836e-06,
  "Control",    4.216934e-06,
  "Treatment",  4.415194e-06,
  "Control",    1.0260606e-05,
  "Treatment",  1.111064e-06,
  "Control",    1.0779088e-05,
  "Treatment",  3.57185e-07,
  "Control",    1.139097e-06,
  "Treatment",  0,
  "Control",    2.31074e-07,
  "Treatment",  5.78956e-07,
  "Control",    4.371157e-06,
  "Treatment",  6.5825e-08,
  "Control",    9.587202e-06,
  "Treatment",  2.65383e-07,
  "Control",    3.57337e-06,
  "Treatment",  7.14146e-07,
  "Control",    3.868605e-06,
  "Treatment",  1.2213951e-05,
  "Control",    6.936899e-06,
  "Treatment",  4.71707e-07,
  "Control",    5.5957173e-05,
  "Treatment",  1.265942e-06,
  "Control",    8.5563441e-05,
  "Treatment",  0,
  "Control",    0,
  "Treatment",  0,
  "Control",    2.1306289e-05,
  "Treatment",  5.2055e-07,
  "Control",    1.8420094e-05
)

# performing Wilcoxon test and adding 'y.position' for the p value
wilcoxon_result <- data %>%
  wilcox_test(Response ~ Group) %>%
  add_xy_position("Group")

# the 'y.position' is incorrectly defined as zero as a result of rounding error.
wilcoxon_result
#> # A tibble: 1 × 11
#>   .y.      group1  group2     n1    n2 statistic       p y.position groups  xmin
#>   <chr>    <chr>   <chr>   <int> <int>     <dbl>   <dbl>      <dbl> <name> <dbl>
#> 1 Response Control Treatm…    16    16      206. 0.00326          0 <chr>      1
#> # … with 1 more variable: xmax <dbl>

# the p value and bracket geom are plotted at zero due to the rounding issue.
ggplot(data, aes(Group, Response)) +
  stat_summary(geom = "errorbar", fun.data = mean_cl_boot, width = 0.25) +
  stat_summary(geom = "crossbar", fun = mean, fun.min = mean, fun.max = mean, fatten = FALSE, width = 0.5) +
  stat_pvalue_manual(wilcoxon_result, "P = {p}", size = 18 / ggplot2::.pt, tip.length = 0, bracket.size = 2 / ggplot2::.stroke) +
  theme_pubr(base_size = 18, x.text.angle = 45)