Tell stat_compare_means() the column to use for pairing

Hi,

First of all, I want to say that I love your packages. Thanks for your work!

I'm preparing plots to compare intron retention levels of paired normal and tumor patient samples (1 normal and 1 tumor sample per patient).

Regarding my issue: I prepare plots that combine geom_boxplot(), geom_point() and geom_path(), which is essentially what ggpaired() does. I add the P value of a paired Wilcoxon test using stat_compare_means(paired = TRUE). However, the reported P value differed from the one I get from wilcox.test(). The cause was that the data.frame I used for plotting was not sorted by patient ID, so stat_compare_means() paired the samples incorrectly. I solved the problem by sorting the data.frame by patient ID with dplyr::arrange() before preparing the plots.

My question is whether it would make sense to add a parameter to stat_compare_means() that specifies the column to use for pairing (in my case, the column containing the patient IDs), or at least to warn the user that the data.frame must be sorted.
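In the meantime, a check along the following lines could be run before plotting. This is only a sketch: arrange_for_pairing() is a hypothetical helper, not part of ggpubr, and it assumes that the pairing follows the row order within each group, which is what the behaviour described above suggests.

```r
# Hypothetical helper (NOT part of ggpubr): verify that every pairing ID occurs
# exactly once in each group, then order the rows by group and ID.
arrange_for_pairing <- function(data, group, id) {
  ids_by_group <- split(data[[id]], data[[group]])
  sorted_ids <- lapply(ids_by_group, sort)
  reference <- sorted_ids[[1]]
  same_ids <- vapply(sorted_ids, function(x) identical(x, reference), logical(1))
  if (!all(same_ids) || anyDuplicated(reference) > 0) {
    stop("Each ID must occur exactly once in every group.")
  }
  data[order(data[[group]], data[[id]]), , drop = FALSE]
}

# Small illustrative data set: the Tumor rows are in reverse ID order.
toy <- data.frame(sampleType = c(rep("Normal", 3), rep("Tumor", 3)),
                  value = c(0.10, 0.20, 0.30, 0.35, 0.25, 0.15),
                  ID = c(1:3, 3:1))
toy_sorted <- arrange_for_pairing(toy, group = "sampleType", id = "ID")
```

The helper stops with an error when a patient is missing from one group or appears twice, because in that case no row order gives a valid pairing.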

Here is a small example in which I create an unsorted dummy data.frame and a sorted copy of it, and pass each to ggpaired(). As one can see, the data.frame needs to be sorted to get the correct pairing and the correct P value.

library(ggpubr)
library(dplyr)
set.seed(123)
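# Unsorted data: the Tumor rows are in reverse ID order relative to the Normal rows
unsorted <- data.frame(sampleType = c(rep("Normal", 10), rep("Tumor", 10)),
                       value = c(runif(10, 0, 0.2), runif(10, 0, 0.4)),
                       ID = c(1:10, 10:1))

# Sorted data: rows ordered by patient ID within each sample type
sorted <- unsorted %>% arrange(sampleType, ID)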

# Unsorted input: samples are paired incorrectly, so the P value is wrong
ggpaired(unsorted, x = "sampleType", y = "value",
         color = "sampleType", line.color = "gray", line.size = 0.4,
         palette = "npg") +
  stat_compare_means(paired = TRUE)

# Sorted input: samples are paired by patient ID, so the P value is correct
ggpaired(sorted, x = "sampleType", y = "value",
         color = "sampleType", line.color = "gray", line.size = 0.4,
         palette = "npg") +
  stat_compare_means(paired = TRUE)
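
To check which of the two P values is the correct one, the test can be run directly with wilcox.test() after matching the samples by ID. The following is a minimal sketch in base R. It assumes that the unsorted plot pairs the rows by their position within each group; that is my reading of the behaviour and is not confirmed from the ggpubr source. The names normal, tumor and tumor_aligned are for illustration only.

```r
# Same dummy data as above: the Tumor rows are in reverse ID order.
set.seed(123)
unsorted <- data.frame(sampleType = c(rep("Normal", 10), rep("Tumor", 10)),
                       value = c(runif(10, 0, 0.2), runif(10, 0, 0.4)),
                       ID = c(1:10, 10:1))

# Split by sample type and align the Tumor rows to the Normal rows by patient ID.
normal <- unsorted[unsorted$sampleType == "Normal", ]
tumor <- unsorted[unsorted$sampleType == "Tumor", ]
tumor_aligned <- tumor[match(normal$ID, tumor$ID), ]

# Pairing by row position (assumed to be what happens with the unsorted data).
p_by_position <- wilcox.test(normal$value, tumor$value, paired = TRUE)$p.value

# Pairing by patient ID (the intended test).
p_by_id <- wilcox.test(normal$value, tumor_aligned$value, paired = TRUE)$p.value

print(c(by_position = p_by_position, by_id = p_by_id))
```

The value of p_by_id is the one the plot should report, since each normal sample is compared with the tumor sample from the same patient.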

Best, Mario

kassambara / ggpubr

Tell stat_compare_means() the column to use for pairing #560