waldronlab / BugSigDB

A microbial signatures database
https://bugsigdb.org

Some terms in body site might need review #158

Closed: sdgamboa closed this issue 1 year ago

sdgamboa commented 1 year ago

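While using bugsigdbr, I noticed that most terms in the 'Body site' column are in lowercase, but some start with an uppercase letter. In some cases this produces duplicates that differ only in case (e.g., 'posterior fornix of vagina' and 'Posterior fornix of vagina'). Also, 'HIV' is listed as a body site, which I don't think it is. See the code below for details.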

library(bugsigdbr)
bsdb <- importBugSigDB(cache = FALSE)
body_sites <- sort(unique(bsdb$`Body site`))
terms <- c(
    ## uppercase
    'Genitals', 'Semen', 'Vaginal fluid',

    ## repetition due to uppercase
    'posterior fornix of vagina', 
    'Posterior fornix of vagina',

    ## not a body site
    'HIV'
)
body_sites[body_sites %in% terms]
#> [1] "Genitals"                   "HIV"                       
#> [3] "posterior fornix of vagina" "Posterior fornix of vagina"
#> [5] "Semen"                      "Vaginal fluid"
df <- bsdb[bsdb$`Body site` %in% terms,]
sigs <- getSignatures(df = df)
names(sigs)
#>  [1] "bsdb:270/1/1_human-papilloma-virus-infection:HPV+_vs_HPV-_UP"                                                                                        
#>  [2] "bsdb:270/1/2_human-papilloma-virus-infection:HPV+_vs_HPV-_DOWN"                                                                                      
#>  [3] "bsdb:270/2/1_human-papilloma-virus-infection:HPV+-(persistance)_vs_HPV+-(clearance)_UP"                                                              
#>  [4] "bsdb:270/2/2_human-papilloma-virus-infection:HPV+-(persistance)_vs_HPV+-(clearance)_DOWN"                                                            
#>  [5] "bsdb:280/1/1_cervical-glandular-intraepithelial-neoplasia:high-grade-squamus-intraepithelial-lesion_vs_low-grade-squamus-intraepithelial-lesion_UP"  
#>  [6] "bsdb:280/1/2_cervical-glandular-intraepithelial-neoplasia:high-grade-squamus-intraepithelial-lesion_vs_low-grade-squamus-intraepithelial-lesion_DOWN"
#>  [7] "bsdb:297/1/1_human-papilloma-virus-infection:non-pregnant-high-risk-HPV_vs_non-pregnant-no-HPV_UP"                                                   
#>  [8] "bsdb:297/1/2_human-papilloma-virus-infection:non-pregnant-high-risk-HPV_vs_non-pregnant-no-HPV_DOWN"                                                 
#>  [9] "bsdb:297/2/1_human-papilloma-virus-infection:pregnant-high-risk-HPV_vs_pregnant-no-HPV_UP"                                                           
#> [10] "bsdb:297/2/2_human-papilloma-virus-infection:pregnant-high-risk-HPV_vs_pregnant-no-HPV_DOWN"                                                         
#> [11] "bsdb:434/1/1_human-papilloma-virus-infection:HPV-+_vs_healthy-control_UP"                                                                            
#> [12] "bsdb:434/1/2_human-papilloma-virus-infection:HPV-+_vs_healthy-control_DOWN"                                                                          
#> [13] "bsdb:434/2/1_cervical-glandular-intraepithelial-neoplasia:LSIL_vs_healthy-control_UP"                                                                
#> [14] "bsdb:434/2/2_cervical-glandular-intraepithelial-neoplasia:LSIL_vs_healthy-control_DOWN"                                                              
#> [15] "bsdb:434/3/1_cervical-glandular-intraepithelial-neoplasia:HSIL_vs_healthy-control_UP"                                                                
#> [16] "bsdb:434/3/2_cervical-glandular-intraepithelial-neoplasia:HSIL_vs_healthy-control_DOWN"                                                              
#> [17] "bsdb:434/4/1_cervical-cancer:cervical-cancer_vs_healthy-control_UP"                                                                                  
#> [18] "bsdb:434/4/2_cervical-cancer:cervical-cancer_vs_healthy-control_DOWN"                                                                                
#> [19] "bsdb:436/1/1_cervical-glandular-intraepithelial-neoplasia:CIN2+/cervical-cancer_vs_healthy-control_UP"                                               
#> [20] "bsdb:522/2/1_endometriosis:Endometriosis-patients_vs_Controls-undergoing-laparoscopic-surgery-for-benign-tumors_DOWN"                                
#> [21] "bsdb:522/2/2_endometriosis:Endometriosis-patients_vs_Controls-undergoing-laparoscopic-surgery-for-benign-tumors_UP"                                  
#> [22] "bsdb:525/1/1_endometriosis:Women-with-EM-associated-CPPS_vs_Women-without-CPPS-presenting-for-routine-examinations_DOWN"                             
#> [23] "bsdb:525/1/2_endometriosis:Women-with-EM-associated-CPPS_vs_Women-without-CPPS-presenting-for-routine-examinations_UP"                               
#> [24] "bsdb:525/2/1_endometriosis:Women-with-EM-associated-CPPS_vs_Women-without-CPPS-presenting-for-routine-examinations_UP"                               
#> [25] "bsdb:525/2/2_endometriosis:Women-with-EM-associated-CPPS_vs_Women-without-CPPS-presenting-for-routine-examinations_DOWN"                             
#> [26] "bsdb:553/1/1_spontaneous-abortion:First-or-second-trimester-miscarriage_vs_Viable-control-pregnancy_DOWN"                                            
#> [27] "bsdb:553/2/1_spontaneous-abortion:First-trimester-miscarriage_vs_Viable-control-pregnancy_DOWN"                                                      
#> [28] "bsdb:553/2/2_spontaneous-abortion:First-trimester-miscarriage_vs_Viable-control-pregnancy_UP"                                                        
#> [29] "bsdb:553/3/1_spontaneous-abortion:complete/incomplete-miscarriage_vs_missed-miscarriage_DOWN"                                                        
#> [30] "bsdb:590/1/1_infection:HIV-1-infection_vs_HIV-1-negative_DOWN"                                                                                       
#> [31] "bsdb:590/1/2_infection:HIV-1-infection_vs_HIV-1-negative_UP"                                                                                         
#> [32] "bsdb:590/1/3_infection:HIV-1-infection_vs_HIV-1-negative_UP"                                                                                         
#> [33] "bsdb:594/1/1_periodontitis:HIV+_vs_HIV–_UP"                                                                                                          
#> [34] "bsdb:594/1/2_periodontitis:HIV+_vs_HIV–_DOWN"                                                                                                        
#> [35] "bsdb:596/1/1_HIV-infection:HIV-infected-men_vs_HIV-uninfected-men_DOWN"                                                                              
#> [36] "bsdb:596/1/2_HIV-infection:HIV-infected-men_vs_HIV-uninfected-men_UP"                                                                                
#> [37] "bsdb:598/1/1_HIV-infection:HIV-seroconverted_vs_HIV-seronegative_UP"                                                                                 
#> [38] "bsdb:598/1/2_HIV-infection:HIV-seroconverted_vs_HIV-seronegative_UP"                                                                                 
#> [39] "bsdb:598/2/1_HIV-infection:HIV-seroconverted_vs_HIV-seronegative_UP"
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R Under development (unstable) (2022-12-25 r83502)
#>  os       Pop!_OS 22.04 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/New_York
#>  date     2023-02-01
#>  pandoc   2.19.2 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package       * version date (UTC) lib source
#>  assertthat      0.2.1   2019-03-21 [2] CRAN (R 4.3.0)
#>  BiocFileCache   2.7.1   2022-12-09 [1] Bioconductor
#>  bit             4.0.5   2022-11-15 [2] CRAN (R 4.3.0)
#>  bit64           4.0.5   2020-08-30 [2] CRAN (R 4.3.0)
#>  blob            1.2.3   2022-04-10 [2] CRAN (R 4.3.0)
#>  bugsigdbr     * 1.5.2   2022-11-24 [1] Bioconductor
#>  cachem          1.0.6   2021-08-19 [2] CRAN (R 4.3.0)
#>  cli             3.6.0   2023-01-09 [1] CRAN (R 4.3.0)
#>  crayon          1.5.2   2022-09-29 [2] CRAN (R 4.3.0)
#>  curl            5.0.0   2023-01-12 [2] CRAN (R 4.3.0)
#>  DBI             1.1.3   2022-06-18 [2] CRAN (R 4.3.0)
#>  dbplyr          2.3.0   2023-01-16 [2] CRAN (R 4.3.0)
#>  digest          0.6.31  2022-12-11 [2] CRAN (R 4.3.0)
#>  dplyr           1.1.0   2023-01-29 [2] CRAN (R 4.3.0)
#>  evaluate        0.20    2023-01-17 [2] CRAN (R 4.3.0)
#>  fansi           1.0.4   2023-01-22 [2] CRAN (R 4.3.0)
#>  fastmap         1.1.0   2021-01-25 [2] CRAN (R 4.3.0)
#>  filelock        1.0.2   2018-10-05 [1] CRAN (R 4.3.0)
#>  fs              1.6.0   2023-01-23 [2] CRAN (R 4.3.0)
#>  generics        0.1.3   2022-07-05 [2] CRAN (R 4.3.0)
#>  glue            1.6.2   2022-02-24 [2] CRAN (R 4.3.0)
#>  htmltools       0.5.4   2022-12-07 [2] CRAN (R 4.3.0)
#>  httr            1.4.4   2022-08-17 [2] CRAN (R 4.3.0)
#>  knitr           1.42    2023-01-25 [2] CRAN (R 4.3.0)
#>  lifecycle       1.0.3   2022-10-07 [2] CRAN (R 4.3.0)
#>  magrittr        2.0.3   2022-03-30 [2] CRAN (R 4.3.0)
#>  memoise         2.0.1   2021-11-26 [2] CRAN (R 4.3.0)
#>  pillar          1.8.1   2022-08-19 [2] CRAN (R 4.3.0)
#>  pkgconfig       2.0.3   2019-09-22 [2] CRAN (R 4.3.0)
#>  purrr           1.0.1   2023-01-10 [1] CRAN (R 4.3.0)
#>  R.cache         0.16.0  2022-07-21 [1] CRAN (R 4.3.0)
#>  R.methodsS3     1.8.2   2022-06-13 [1] CRAN (R 4.3.0)
#>  R.oo            1.25.0  2022-06-12 [1] CRAN (R 4.3.0)
#>  R.utils         2.12.2  2022-11-11 [1] CRAN (R 4.3.0)
#>  R6              2.5.1   2021-08-19 [2] CRAN (R 4.3.0)
#>  Rcpp            1.0.10  2023-01-22 [1] CRAN (R 4.3.0)
#>  reprex          2.0.2   2022-08-17 [2] CRAN (R 4.3.0)
#>  rlang           1.0.6   2022-09-24 [2] CRAN (R 4.3.0)
#>  rmarkdown       2.20    2023-01-19 [2] CRAN (R 4.3.0)
#>  RSQLite         2.2.20  2022-12-22 [1] CRAN (R 4.3.0)
#>  rstudioapi      0.14    2022-08-22 [2] CRAN (R 4.3.0)
#>  sessioninfo     1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
#>  styler          1.9.0   2023-01-15 [1] CRAN (R 4.3.0)
#>  tibble          3.1.8   2022-07-22 [2] CRAN (R 4.3.0)
#>  tidyselect      1.2.0   2022-10-10 [2] CRAN (R 4.3.0)
#>  tzdb            0.3.0   2022-03-28 [2] CRAN (R 4.3.0)
#>  utf8            1.2.3   2023-01-31 [2] CRAN (R 4.3.0)
#>  vctrs           0.5.2   2023-01-23 [2] CRAN (R 4.3.0)
#>  vroom           1.6.1   2023-01-22 [2] CRAN (R 4.3.0)
#>  withr           2.5.0   2022-03-03 [2] CRAN (R 4.3.0)
#>  xfun            0.37    2023-01-31 [2] CRAN (R 4.3.0)
#>  yaml            2.3.7   2023-01-23 [2] CRAN (R 4.3.0)
#> 
#>  [1] /home/samuel/R/x86_64-pc-linux-gnu-library/4.3
#>  [2] /home/samuel/Apps/R-devel/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

Created on 2023-02-01 with reprex v2.0.2
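As a minimal sketch of how such case-only duplicates could be flagged programmatically, the hypothetical helper `find_case_duplicates()` below (it is not part of bugsigdbr) lowercases each distinct term and returns every term whose lowercased form occurs more than once. It uses only base R and a toy vector instead of the live database.

```r
# Hypothetical helper (not a bugsigdbr function): return the distinct terms
# in `x` that collide with another term once letter case is ignored,
# e.g. "posterior fornix of vagina" vs "Posterior fornix of vagina".
find_case_duplicates <- function(x) {
  x <- unique(x[!is.na(x)])      # drop NAs and exact repeats
  key <- tolower(x)              # case-insensitive comparison key
  x[key %in% key[duplicated(key)]]  # keep every member of a colliding group
}

sites <- c("Genitals", "posterior fornix of vagina",
           "Posterior fornix of vagina", "Semen", "vagina")
find_case_duplicates(sites)
#> [1] "posterior fornix of vagina" "Posterior fornix of vagina"
```

With `bsdb` imported as in the reprex above, `find_case_duplicates(bsdb$`Body site`)` would list the affected values; it compares whole values exactly as they are stored in the column.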

lgeistlinger commented 1 year ago

Thanks @sdgamboa.

This will be fixed in the next release of BugSigDB and is already fixed in its devel version. Note that, per #111, all body site terms are exported in capitalized form.
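For data imported from an older release, a user could approximate the capitalized convention from #111 locally. The sketch below assumes "capitalized" means upper-casing only the first character of each term; `capitalize_first()` is a hypothetical helper, not part of bugsigdbr. It relies on base R's `sub()` with `perl = TRUE`, whose replacement string supports `\U` for upper-casing.

```r
# Hypothetical helper (not a bugsigdbr function): upper-case only the first
# character of each term, assuming this matches the capitalized form of #111.
# NA values are returned unchanged because sub() propagates NA.
capitalize_first <- function(x) {
  sub("^(.)", "\\U\\1", x, perl = TRUE)
}

capitalize_first(c("posterior fornix of vagina", "Semen", NA))
```

Applied as `bsdb$`Body site` <- capitalize_first(bsdb$`Body site`)`, terms that previously differed only in the case of their first letter would collapse into a single value.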

dat <- bugsigdbr::importBugSigDB(version = "devel", cache = FALSE)
terms <- c("HIV", "posterior fornix of vagina")
terms %in% dat$Condition
#> [1] FALSE FALSE
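Because terms are exported in capitalized form, an exact `%in%` lookup with a lowercase query such as "posterior fornix of vagina" returns `FALSE` even when the capitalized term is present. A minimal sketch of a case-insensitive alternative, using a hypothetical operator `%in_ci%` (not part of bugsigdbr) and toy data:

```r
# Hypothetical operator (not a bugsigdbr function): case-insensitive %in%,
# so lowercase queries still match capitalized exported terms.
`%in_ci%` <- function(x, table) tolower(x) %in% tolower(table)

terms <- c("HIV", "posterior fornix of vagina")
terms %in_ci% c("Posterior fornix of vagina", "Vagina")
#> [1] FALSE  TRUE
```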