wjawaid / enrichR

An R interface to enrichR

Erroneous gene ratios when pathways have similar names #54

Closed longjp closed 1 year ago

longjp commented 2 years ago

plotEnrich draws one bar per unique pathway name after truncating each name to numChar characters. However, several pathways can share the same truncated name (for example, the same leading 40 characters), in which case the bars of different pathways are stacked on top of each other (the ggplot2 default). In an application I am working on, this produced a Gene Ratio above 1 on the plot, which is impossible. I believe enforcing unique truncated pathway names would fix the issue. Below is an MWE with fake data:

library(enrichR)
df <- data.frame(Term=c("Pathway 1 with similar name",
                        "Pathway 1 with similar name but slighly different",
                        "Pathway 3",
                        "Pathway 4"),
                 Overlap=c("10/15","15/20","1/10","3/30"),
                 P.value=c(0.0001,0.0002,0.3,0.5))

# works because all pathways have unique names when truncated to 40 characters
plotEnrich(df,showTerms=3,
           numChar = 40,y="Ratio",
           orderBy="P.value",
           title="Title")

# fails because bars from two pathways with similar names are merged together
plotEnrich(df,showTerms=3,
           numChar = 20,y="Ratio",
           orderBy="P.value",
           title="Title")
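
To make the arithmetic behind the bad plot explicit, here is a rough base-R sketch using the same fake data. It assumes the labels are shortened with a plain `substr()` and that bars sharing a label are summed; the real plotEnrich internals may differ, and the variable names (`short20`, `stacked`, and so on) are only for illustration.

```r
# Same fake data as in the MWE above
df <- data.frame(Term = c("Pathway 1 with similar name",
                          "Pathway 1 with similar name but slighly different",
                          "Pathway 3",
                          "Pathway 4"),
                 Overlap = c("10/15", "15/20", "1/10", "3/30"),
                 P.value = c(0.0001, 0.0002, 0.3, 0.5),
                 stringsAsFactors = FALSE)

# Convert the "k/n" overlap strings into numeric gene ratios
parts <- strsplit(df$Overlap, "/", fixed = TRUE)
df$Ratio <- vapply(parts,
                   function(p) as.numeric(p[1]) / as.numeric(p[2]),
                   numeric(1))

# Assumption: labels are shortened by simple truncation
short40 <- substr(df$Term, 1, 40)
short20 <- substr(df$Term, 1, 20)

dup40 <- anyDuplicated(short40) > 0  # FALSE: all labels still distinct
dup20 <- anyDuplicated(short20) > 0  # TRUE: the first two terms collide

# Stacking bars with the same label amounts to summing their ratios
stacked <- tapply(df$Ratio, short20, sum)
max_stacked <- max(stacked)          # 10/15 + 15/20, about 1.42
```

With 20 characters, both "Pathway 1 ..." terms collapse to the same label, so their ratios add up to more than 1.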

On a project with real data I got this:

[Screenshot: Screen Shot 2022-09-21 at 11 32 31 AM]

longjp commented 2 years ago

For completeness, I include plots from the MWE above and the output from running the script.

Good plot: [Screenshot: Screen Shot 2022-09-21 at 12 01 39 PM]

Bad plot: [Screenshot: Screen Shot 2022-09-21 at 12 01 56 PM]

R version 4.1.1 (2021-08-10) -- "Kick Things"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin13.4.0 (64-bit)

> library(enrichR)
Welcome to enrichR
Checking connection ... 
Enrichr ... Connection is Live!
FlyEnrichr ... Connection is available!
WormEnrichr ... Connection is available!
YeastEnrichr ... Connection is available!
FishEnrichr ... Connection is available!
OxEnrichr ... Connection is available!
Warning message:
package ‘enrichR’ was built under R version 4.1.3 
> df <- data.frame(Term=c("Pathway 1 with similar name",
+                         "Pathway 1 with similar name but slighly different",
+                         "Pathway 3",
+                         "Pathway 4"),
+                  Overlap=c("10/15","15/20","1/10","3/30"),
+                  P.value=c(0.0001,0.0002,0.3,0.5))
> # works because all pathways have unique names when truncated to 40 characters
> plotEnrich(df,showTerms=3,
+            numChar = 40,y="Ratio",
+            orderBy="P.value",
+            title="Title")
> # fails because bars from two pathways with similar names are merged together
> plotEnrich(df,showTerms=3,
+            numChar = 20,y="Ratio",
+            orderBy="P.value",
+            title="Title")
> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Users/jplong/opt/miniconda3/lib/libopenblasp-r0.3.18.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] enrichR_3.1

loaded via a namespace (and not attached):
 [1] magrittr_2.0.1   tidyselect_1.1.1 munsell_0.5.0    colorspace_2.0-2 rjson_0.2.20     R6_2.5.1         rlang_0.4.12    
 [8] fansi_0.5.0      httr_1.4.2       dplyr_1.0.7      tools_4.1.1      grid_4.1.1       gtable_0.3.0     utf8_1.2.2      
[15] DBI_1.1.1        ellipsis_0.3.2   digest_0.6.28    assertthat_0.2.1 tibble_3.1.6     lifecycle_1.0.1  crayon_1.4.2    
[22] farver_2.1.0     purrr_0.3.4      ggplot2_3.3.5    vctrs_0.3.8      curl_4.3.2       glue_1.5.0       labeling_0.4.2  
[29] compiler_4.1.1   pillar_1.6.4     generics_0.1.1   scales_1.1.1     pkgconfig_2.0.3 

ycl6 commented 1 year ago

Hi @longjp, thanks for reporting this bug. I have submitted a PR that should fix it.

wjawaid commented 1 year ago

Thanks ycl6. I've accepted your pull request. Just waiting for CRAN to accept it.
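
For anyone on a version without the fix, one possible workaround is to shorten and de-duplicate the terms before plotting, along the lines of the "unique truncated names" idea in the original report. This is only a sketch and not necessarily what the PR does: `unique_short_terms` is a hypothetical helper, not part of enrichR, and the commented-out call assumes plotEnrich leaves labels alone when they are no longer than numChar.

```r
# Hypothetical helper (not part of enrichR): truncate each term, then
# append a numeric suffix to any truncated label that is repeated.
unique_short_terms <- function(terms, num_char = 20) {
  short <- substr(terms, 1, num_char)
  make.unique(short, sep = " #")
}

terms <- c("Pathway 1 with similar name",
           "Pathway 1 with similar name but slighly different",
           "Pathway 3",
           "Pathway 4")

labels <- unique_short_terms(terms, num_char = 20)
# The colliding labels become "Pathway 1 with simil" and
# "Pathway 1 with simil #1"; the other two are unchanged.

# Assumed usage with the MWE data frame (requires enrichR):
# df$Term <- labels
# plotEnrich(df, showTerms = 3, numChar = max(nchar(df$Term)),
#            y = "Ratio", orderBy = "P.value", title = "Title")
```

Setting numChar to the longest de-duplicated label is meant to keep the suffix from being cut off again.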