kassambara / ggpubr

'ggplot2' Based Publication Ready Plots
https://rpkgs.datanovia.com/ggpubr/
1.13k stars 165 forks source link

Brackets positioning messed up #386

Closed pdeninis closed 1 year ago

pdeninis commented 3 years ago

I made this plot some time ago.

image

As you can see, the brackets were perfectly positioned.

Today, after updating some packages - among which maybe ggpubr too - I have used the same code to redo the plot, and this is the result:

image

Have you some idea about what happened? It seems to me that the position computed for the first group has been used for the bracket of the third one, the position for the second has been used for the first group and that of the third group ended up below the second one...

Thanks.

kassambara commented 3 years ago

can you please provide a reproducible code ?

pdeninis commented 3 years ago

Before sending the code I have run it on R 4.0.2. There the function works properly.

The graph above had been run in R 3.6.1, but all the packages had been updated. Maybe that was causing the issue?

Anyway, thank you for your package, and sorry for the false alarm!

EDIT Wait! I didn't realize there is a different issue

image Now the positions are correct but the asteriscs and the "ns" on the first two brackets are swapped! The first version above was that right...

Here the code. Let me know if it works. REMOVED SINCE NOT WORKING. SEE BELOW.

pdeninis commented 3 years ago

Hi Alboukadel,

I have now installed R 4.0.5 with all new package versions. Unfortunately the result is again the preceding one, i.e. correct significance marks but wrong brackets positioning: R4 0 5

pdeninis commented 3 years ago

Hi Aboukadel, I apologize since the code I posted was incomplete: I didn't realized.

I fixed it and now it 's working.

library(rstatix)
library(ggpubr)
library(tidyverse)
library(emmeans)

test <-
structure(list(TREAT = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L), .Label = c("CAF+PRF", "CAF+SCTG", "CAF"), class = "factor"),
GT_PRE = c(0.75574694999807401, 0.51377910615185396, 1.0502637416109999,
1.0529533576294099, 0.77628725655794695, 1.3024972039373,
1.29771873358056, 0.73223467651353302, 1.22548037918804,
0.70149281871692604, 1.2311467357729, 0.46592903526631801,
1.3053788021822601, 1.12747309255473, 1.0169638634654701,
1.46867879701079, 1.3887822192828201, 1.65913611414644, 0.94901409760925304,
0.68205176445635995, 0.75644708395693905, 1.3616212413887501,
0.80383327604294097, 0.83230193955718301, 0.55268821321151596,
1.0519620174778901, 0.85095516359899204, 1.4247162705459,
1.1978302782129999, 1.2008821995972401, 1.31677721147256,
1.0739940342185701, 0.56676654709428298, 0.758831881198758,
1.3437548491340201, 0.65625798126018697, 1.3642652746664301,
0.55179612606860495, 0.65226664683987401, 2.0293138768199199,
0.51765428550824999, 1.44939808663943, 1.4351805942674301,
0.79060226551773405, 1.30006971367069, 0.43285805219381801,
1.3771237596136201, 1.3424065948697199, 1.36634025195904,
0.91555239568191604, 1.3080349133981199, 0.91134124601174404,
0.95967271975461899, 0.30459336422795502, 1.2390723762771501,
1.03821439983465, 1.5266871239430899, 0.55959657184888001,
0.69628740796219102), GT_CH = c(1.5037592104009601, 0.37708778079514599,
0.45235941364484999, 1.47467219782683, 1.38937977222239,
0.38294772070726102, 0.53507913236442095, 0.51621910868277399,
1.5460292374508999, 0.57829211304672301, 0.78354210363062204,
1.27983126105837, 0.65175904047684896, 0.64853873777691096,
1.67261015217373, 0.37028803346980099, 0.80590716432980702,
0.73263456408220995, 0.70055849005411996, 1.2038454466045301,
0.85383942914639899, 1.2033768482948399, 0.78518775133726304,
1.1532594566047101, 0.95706532630658103, 1.09035994424734,
0.96203108898110901, 0.87836955861037402, 1.1076942857758001,
0.91892022371736704, 0.91944572314908601, 1.23935985446435,
0.96783791234521399, 0.864725786870432, 0.91795738647703295,
0.960009924034951, 1.1132551653164899, 1.0893876577154999,
0.81407123000060999, 0.45680795412162301, 0.36293109415511299,
0.302514131741624, 0.209648306428177, 0.33702680696005899,
0.34665042370576998, 0.24027654459536399, 0.18939783794448001,
0.29675637555508699, 0.50588365046038697, 0.38409189747421602,
0.37343044300995298, 0.215801805337029, 0.25930223824998,
0.35320250264132103, 0.25379570499944498, 0.22825343224356601,
0.31567886771034698, 0.49647580510056799, 0.172074177565914
), Subgroup = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L,
1L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L,
1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 1L,
1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 1L,
1L, 2L, 1L, 2L, 1L, 1L), .Label = c("LOW", "HIGH"), class = "factor")), row.names = c(1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L,
16L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L,
30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L,
43L, 44L, 45L, 46L, 47L, 48L, 49L, 50L, 51L, 52L, 53L, 54L, 55L,
56L, 57L, 58L, 59L, 60L), class = "data.frame")

res.aovo<-test %>% anova_test(GT_CH ~ TREAT * Subgroup)
pwco <- test %>% group_by(TREAT) %>% emmeans_test(GT_CH ~ Subgroup, p.adjust.method = "bonferroni")
pwco <- pwco %>% add_xy_position(x = "TREAT")
bxpo <- ggpubr::ggboxplot(test, x = "TREAT", y = "GT_CH", color = "Subgroup", palette = "jco" )+ labs(y = "Gingival Thickness Increase (mm)") + labs(title = "GT Increase Factorial ANOVA Treatment * Subgroup (Baseline cutoff = 1.2)", subtitle="(without case 17)")
bxpo + stat_pvalue_manual(pwco) + labs( subtitle = get_test_label(res.aovo, detailed = TRUE), caption = get_pwc_label(pwco)) + labs(y = "Gingival Thickness Increase (mm)") + labs(title = " GT Increase ~ Treatment * Subgroup (Baseline cutoff = 1.2) w/o case `17")

Unfortunately the result is still the following:

image

pdeninis commented 3 years ago

Hi Alboukadel,

I took a look into the content of the

pwco<-test %>% group_by(TREAT) %>% emmeans_test(GT_CH ~ Subgroup, p.adjust.method = "bonferroni")

assignment.

Unexpectedly, I saw:

str(pwco)
tibble [3 x 15] (S3: rstatix_test/emmeans_test/tbl_df/tbl/data.frame)
 $ TREAT       : chr [1:3] "CAF" "CAF+PRF" "CAF+SCTG"
 $ term        : chr [1:3] "Subgroup" "Subgroup" "Subgroup"
 $ .y.         : chr [1:3] "GT_CH" "GT_CH" "GT_CH"
 $ group1      : chr [1:3] "LOW" "LOW" "LOW"
 $ group2      : chr [1:3] "HIGH" "HIGH" "HIGH"
 $ df          : num [1:3] 53 53 53
 $ statistic   : num [1:3] -0.175 3.289 -0.065
 $ p           : num [1:3] 0.86202 0.00179 0.94843
 $ p.adj       : num [1:3] 0.86202 0.00179 0.94843
 $ p.adj.signif: chr [1:3] "ns" "**" "ns"
 $ y.position  : num [1:3] 1.814 1.38 0.647
 $ groups      :List of 3
  ..$ V1: chr [1:2] "LOW" "HIGH"
  ..$ V1: chr [1:2] "LOW" "HIGH"
  ..$ V1: chr [1:2] "LOW" "HIGH"
 $ x           : num [1:3] 1 2 3
 $ xmin        : num [1:3] 2.8 0.8 1.8
 $ xmax        : num [1:3] 3.2 1.2 2.2
 - attr(*, "args")=List of 10
  ..$ data           : tibble [59 x 4] (S3: grouped_df/tbl_df/tbl/data.frame)
  .. ..$ TREAT   : Factor w/ 3 levels "CAF+PRF","CAF+SCTG",..: 1 1 1 1 1 1 1 1 1 1 ...
  .. ..$ GT_PRE  : num [1:59] 0.756 0.514 1.05 1.053 0.776 ...
  .. ..$ GT_CH   : num [1:59] 1.504 0.377 0.452 1.475 1.389 ...
  .. ..$ Subgroup: Factor w/ 2 levels "LOW","HIGH": 1 1 1 1 1 2 2 1 1 1 ...
  .. ..- attr(*, "groups")= tibble [3 x 2] (S3: tbl_df/tbl/data.frame)
  .. .. ..$ TREAT: Factor w/ 3 levels "CAF+PRF","CAF+SCTG",..: 1 2 3
  .. .. ..$ .rows: list<int> [1:3] 
  .. .. .. ..$ : int [1:19] 1 2 3 4 5 6 7 8 9 10 ...
  .. .. .. ..$ : int [1:20] 20 21 22 23 24 25 26 27 28 29 ...
  .. .. .. ..$ : int [1:20] 40 41 42 43 44 45 46 47 48 49 ...
  .. .. .. ..@ ptype: int(0) 
  .. .. ..- attr(*, ".drop")= logi TRUE
  ..$ formula        :Class 'formula'  language GT_CH ~ Subgroup
  .. .. ..- attr(*, ".Environment")=<environment: 0x000000002ab63c38> 
  ..$ covariate      : NULL
  ..$ ref.group      : NULL
  ..$ comparisons    : NULL
  ..$ p.adjust.method: chr "bonferroni"
  ..$ conf.level     : num 0.95
  ..$ model          : NULL
  ..$ detailed       : logi FALSE
  ..$ method         : chr "emmeans_test"
 - attr(*, "emmeans")= tibble [6 x 8] (S3: tbl_df/tbl/data.frame)
  ..$ TREAT    : Factor w/ 3 levels "CAF+PRF","CAF+SCTG",..: 1 1 2 2 3 3
  ..$ Subgroup : Factor w/ 2 levels "LOW","HIGH": 1 2 1 2 1 2
  ..$ emmean   : num [1:6] 1.012 0.609 0.998 1.006 0.305 ...
  ..$ se       : num [1:6] 0.0743 0.0973 0.0665 0.1152 0.0814 ...
  ..$ df       : num [1:6] 53 53 53 53 53 53
  ..$ conf.low : num [1:6] 0.863 0.414 0.864 0.776 0.142 ...
  ..$ conf.high: num [1:6] 1.161 0.804 1.131 1.237 0.468 ...
  ..$ method   : chr [1:6] "Emmeans test" "Emmeans test" "Emmeans test" "Emmeans test" ...
  ..- attr(*, "estName")= chr "emmean"
  ..- attr(*, "clNames")= chr [1:2] "lower.CL" "upper.CL"
  ..- attr(*, "pri.vars")= chr [1:2] "TREAT" "Subgroup"
  ..- attr(*, "adjust")= chr "none"
  ..- attr(*, "side")= num 0
  ..- attr(*, "delta")= num 0
  ..- attr(*, "type")= chr "link"
  ..- attr(*, "mesg")= chr "Confidence level used: 0.95"

Namely, the TREAT variable first appears as a characters variable and then as a factor. I promptly checked the old output I had conserved and compared the two:

#Old output
>pwco <- pasquale17 %>% group_by(TREAT) %>% emmeans_test(GT_CH ~ Subgroup, p.adjust.method = "bonferroni")
>pwco
# A tibble: 3 x 9
  TREAT    .y.   group1 group2    df statistic       p   p.adj p.adj.signif
* <fct>    <chr> <chr>  <chr>  <dbl>     <dbl>   <dbl>   <dbl> <chr>       
1 CAF+PRF  GT_CH LOW    HIGH      53    3.29   0.00179 0.00179 **          
2 CAF+SCTG GT_CH LOW    HIGH      53   -0.0650 0.948   0.948   ns          
3 CAF      GT_CH LOW    HIGH      53   -0.175  0.862   0.862   ns   

# Current output
> pwco <- test %>% group_by(TREAT) %>% emmeans_test(GT_CH ~ Subgroup, p.adjust.method = "bonferroni")
> pwco
# A tibble: 3 x 10
  TREAT  term   .y.   group1 group2    df statistic       p   p.adj p.adj.signif
* <chr>  <chr>  <chr> <chr>  <chr>  <dbl>     <dbl>   <dbl>   <dbl> <chr>       
1 CAF    Subgr~ GT_CH LOW    HIGH      53   -0.175  0.862   0.862   ns          
2 CAF+P~ Subgr~ GT_CH LOW    HIGH      53    3.29   0.00179 0.00179 **          
3 CAF+S~ Subgr~ GT_CH LOW    HIGH      53   -0.0650 0.948   0.948   ns    

> str(test)
'data.frame':   59 obs. of  4 variables:
 $ TREAT   : Factor w/ 3 levels "CAF+PRF","CAF+SCTG",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ GT_PRE  : num  0.756 0.514 1.05 1.053 0.776 ...
 $ GT_CH   : num  1.504 0.377 0.452 1.475 1.389 ...
 $ Subgroup: Factor w/ 2 levels "LOW","HIGH": 1 1 1 1 1 2 2 1 1 1 ...

As you can see, the tibble has now 10 columns instead of 9 (there is an extra "term" column). It results even in the datanovia site, that is apparently not yet updated. In particular, in each occurence of the group_by() function, the first column, the grouping variable, is still reported as a factor.

The important difference is that the grouping column (TREAT), which was earlier reported as a <fct>, is now reported as a <chr> column, although the original TREAT variable in the data is always a factor.

Now the CAF+PRF TREAT is in the second row, so it gets the positioning of the CAF+SCTG TREAT (2nd row, according to the ordering as a factor), CAF+SCTG is in the third (and gets the positioning of CAF, which is in the 3rd row, according to the ordering as a factor) etc. This behavior perfectly matches that of the brackets.

I guess the issue might have passed unnoticed so far since in this case the factor TREAT has undergone a levels reordering, so that they do no longer match those of the character variable. I think that if using a factor whose levels have not been "reordered" the issue does not occur.

Therefore, I fear the issue was due to a change in the dplyr group_by() function. However, I believe it is worth finding a workaround to fix it in ggpubr, allowing for the use of a factor whatsoever.

The way to reorder the new character vector :

> pwco$TREAT
[1] "CAF"      "CAF+PRF"  "CAF+SCTG"

to match the factor vector, could be to add:

pwco$TREAT[order(order(levels(TREAT)))]
[1] "CAF+PRF"  "CAF+SCTG" "CAF" 

Unfortunately, while:

pwco[order(order(levels(TREAT))),]
# A tibble: 3 x 10
  TREAT  term   .y.   group1 group2    df statistic       p   p.adj p.adj.signif
  <chr>  <chr>  <chr> <chr>  <chr>  <dbl>     <dbl>   <dbl>   <dbl> <chr>       
1 CAF+P~ Subgr~ GT_CH LOW    HIGH      53    3.29   0.00179 0.00179 **          
2 CAF+S~ Subgr~ GT_CH LOW    HIGH      53   -0.0650 0.948   0.948   ns          
3 CAF    Subgr~ GT_CH LOW    HIGH      53   -0.175  0.862   0.862   ns    

works as intended, adding the code to the stat_pvalue_manual() function is not sufficient to fix it, but results in an error:

bxpo + stat_pvalue_manual(pwco[order(order(levels(TREAT))),]) + labs( subtitle = get_test_label(res.aovo, detailed = TRUE), caption = get_pwc_label(pwco)) + labs(y = "Gingival Thickness Increase (mm)") + labs(title = " GT Increase ~ Treatment * Subgroup (Baseline cutoff = 1.2) w/o case 17")
Error in .valide_y_position(y.position, data) : 
  can't find the y.position variable 'y.position' in the data

Therefore I think either a way to reorder the new output from group_by() according to that of the factor variable (in the plot) should be found or a nominal addressing to the groups should be used.

Another solution may be to warn the dplyr developers of the problem.

kassambara commented 1 year ago

The issue is caused by broom:tidy() at https://github.com/kassambara/rstatix/blob/c36a70f22a8aeac58ffc9190decd70db18d0ebd1/R/emmeans_test.R#L135, which is changing factor to character since version broom version 0.5.6.

pdeninis commented 1 year ago

Well done, the culprit was found. Have you thought of a workaround? Might a fast solution (surely not optimal, but useful in the meantime you release the fix) be to include in your package the old version of the function changed after having renamed it?

kassambara commented 1 year ago

fixed now, thanks for reporting the issue and for your reproducible scripts. Main changes done at: https://github.com/kassambara/rstatix/pull/170

You need the latest dev versions of ggpubr and rstatix

pdeninis commented 1 year ago

Hi Alboukadel, I have installed R 4.2.2 with ggpubr and rstatix. It doesn't work.

I checked the version of the installed packages and got:

> packageVersion("ggpubr")
[1] ‘0.5.0’
> packageVersion("rstatix")
[1] ‘0.7.1’

I read on your link that rstatix 0.7.1.999 is needed, but I can't find it. What exactly should I do?

kassambara commented 1 year ago

Please consider installing the latest dev version of the two packages as follow:

devtools::install_github("kassambara/rstatix")
devtools::install_github("kassambara/ggpubr")

After installation, everythyng should work. Please let me know if this is not case.

pdeninis commented 1 year ago

Thanks! Now it is OK.