waldronlab / bugphyzz

Harmonized annotation of microbial physiology
http://waldronlab.io/bugphyzz/

Values for converting confidence interval values #149

Closed sdgamboa closed 2 years ago

sdgamboa commented 2 years ago

Here are some values for converting confidence interval character values to numeric values (and vice versa).

From numeric to character

Frequency    Adverb
1            always
0.7 - 0.9    usually
0.4 - 0.6    sometimes
0.1 - 0.3    rarely
0            never
NA           unknown
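
The numeric-to-character table above can be sketched as a small base R function. This is only an illustration: `freq_to_adverb` and its `eps` tolerance are hypothetical names, not part of bugphyzz. The table leaves gaps (for example between 0.6 and 0.7), so this sketch assumes that values falling in a gap should come back as `NA_character_` rather than be forced into a neighboring bin.

```r
# Hypothetical helper (not part of bugphyzz): map a numeric frequency
# to the adverb listed in the table above.
freq_to_adverb <- function(x, eps = 1e-8) {
    x <- as.numeric(x)
    out <- rep(NA_character_, length(x))
    known <- !is.na(x)
    # eps absorbs floating-point noise such as 0.1 + 0.2 != 0.3.
    out[known & abs(x - 1) <= eps] <- "always"
    out[known & x >= 0.7 - eps & x <= 0.9 + eps] <- "usually"
    out[known & x >= 0.4 - eps & x <= 0.6 + eps] <- "sometimes"
    out[known & x >= 0.1 - eps & x <= 0.3 + eps] <- "rarely"
    out[known & abs(x) <= eps] <- "never"
    # Missing input maps to "unknown"; values in a gap stay NA (assumption).
    out[!known] <- "unknown"
    out
}

freq_to_adverb(c(1, 0.8, 0.5, 0.2, 0, NA))
```

Returning `NA_character_` for gap values keeps them distinguishable from a genuinely missing input, which becomes `"unknown"`.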

From character to numeric

Adverb       Frequency
always       1
usually      0.8
sometimes    0.5
rarely       0.2
never        0
unknown      NA
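
For the opposite direction, a named-vector lookup is one way to express the character-to-numeric table. Again, this is a sketch: `adverb_to_freq` is a hypothetical name, and lower-casing and trimming the input before the lookup is an assumption, not something this issue specifies.

```r
# Hypothetical helper (not part of bugphyzz): map an adverb to the
# numeric frequency listed in the table above.
adverb_to_freq <- function(x) {
    lookup <- c(
        always    = 1,
        usually   = 0.8,
        sometimes = 0.5,
        rarely    = 0.2,
        never     = 0,
        unknown   = NA_real_
    )
    # Assumption: labels are matched case-insensitively, ignoring
    # surrounding whitespace.
    key <- tolower(trimws(as.character(x)))
    unname(lookup[key])
}

adverb_to_freq(c("always", "Usually", "sometimes", "rarely", "never", "unknown"))
```

Note that a label missing from the lookup also yields `NA`, so in this sketch an unrecognized adverb cannot be told apart from `unknown`.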

sdgamboa commented 2 years ago

Counts of each Evidence and Confidence_interval combination per dataset:

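# Load packages and data without startup or import messages.
suppressMessages({
    library(bugphyzz)
    library(dplyr)
    library(purrr)

    # physiologies() returns a named list of data frames, one per dataset.
    phys <- physiologies()
    phys[['fatty acid compositions']] <- fattyAcidComposition()
})

# Drop the two datasets excluded from this tally.
phys$`genome size` <- NULL
phys$`coding genes` <- NULL
phys <- discard(phys, is.null)

# Count rows per Evidence / Confidence_interval combination in each dataset.
map(phys, ~ {
    count(.x, Evidence, Confidence_interval)
}) %>%
    bind_rows(.id = 'dataset')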
#>                                  dataset Evidence Confidence_interval      n
#> 1                      acetate producing      EXP             Unknown     24
#> 2                          aerophilicity      EXP              always   1229
#> 3                          aerophilicity      EXP           sometimes     24
#> 4                          aerophilicity  Unknown              always   6788
#> 5                        animal pathogen     <NA>             Unknown   1420
#> 6               antimicrobial resistance      ASR              Always  10321
#> 7              antimicrobial sensitivity      EXP             Usually    832
#> 8                            arrangement                      Unknown   1071
#> 9                            arrangement      EXP              always    142
#> 10                           arrangement      EXP           sometimes    610
#> 11                       biofilm forming      EXP             Unknown    426
#> 12                    butyrate producing      EXP             Usually     24
#> 13            COGEM pathogenicity rating     <NA>             Usually   1043
#> 14                   disease association      EXP             Usually    445
#> 15                   extreme environment     <NA>              Always   1875
#> 16                            gram stain      EXP              always   1337
#> 17                            gram stain      EXP           sometimes     16
#> 18                            gram stain  unknown              always      1
#> 19                            gram stain  Unknown              always   2335
#> 20                            gram stain  Unknown             unknown   1278
#> 21                         growth medium      EXP              Always    304
#> 22                    growth temperature      EXP             Unknown   2524
#> 23                               habitat      EXP           Sometimes      9
#> 24                               habitat      EXP             Unknown   5222
#> 25                               habitat      EXP             Usually  12999
#> 26                     health associated      COM             Usually     30
#> 27                hydrogen gas producing      EXP             Usually     24
#> 28                        isolation site      EXP              always   5579
#> 29                     lactate producing      EXP             Usually     24
#> 30                                length      EXP              always     43
#> 31                                length      EXP           sometimes    624
#> 32                                length      EXP             Unknown    196
#> 33 mutation rate per site per generation      EXP           Sometimes     26
#> 34      mutation rates per site per year      EXP           Sometimes     81
#> 35                            optimal ph      EXP             Usually    886
#> 36                   plant pathogenicity     <NA>             Usually   1493
#> 37                                 shape                      unknown   1279
#> 38                                 shape      EXP              always    876
#> 39                                 shape      EXP           sometimes    974
#> 40                           spore shape      EXP              always     50
#> 41                           spore shape      EXP           sometimes     97
#> 42                           spore shape      EXP             Usually   1388
#> 43                                 width                       always      1
#> 44                                 width      EXP              always    116
#> 45                                 width      EXP           sometimes    734
#> 46                                 width      EXP                <NA>     12
#> 47               fatty acid compositions      EXP             Unknown 138496

sessionInfo()
#> R version 4.2.0 (2022-04-22)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Pop!_OS 22.04 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] purrr_0.3.4      dplyr_1.0.9      bugphyzz_0.0.1.3
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.8.3      ape_5.6-2         lattice_0.20-45   tidyr_1.2.0      
#>  [5] zoo_1.8-10        assertthat_0.2.1  digest_0.6.29     foreach_1.5.2    
#>  [9] utf8_1.2.2        R6_2.5.1          plyr_1.8.7        reprex_2.0.1     
#> [13] RSQLite_2.2.14    evaluate_0.15     highr_0.9         pillar_1.7.0     
#> [17] rlang_1.0.2       curl_4.3.2        uuid_1.1-0        rstudioapi_0.13  
#> [21] data.table_1.14.2 taxize_0.9.100    blob_1.2.3        rmarkdown_2.14   
#> [25] stringr_1.4.0     bit_4.0.4         compiler_4.2.0    xfun_0.30        
#> [29] pkgconfig_2.0.3   conditionz_0.1.0  htmltools_0.5.2   tidyselect_1.1.2 
#> [33] tibble_3.1.7      httpcode_0.3.0    mgsub_1.7.3       codetools_0.2-18 
#> [37] reshape_0.8.9     fansi_1.0.3       crayon_1.5.1      hoardr_0.5.2     
#> [41] dbplyr_2.1.1      withr_2.5.0       rappdirs_0.3.3    crul_1.2.0       
#> [45] grid_4.2.0        nlme_3.1-157      jsonlite_1.8.0    lifecycle_1.0.1  
#> [49] DBI_1.1.2         magrittr_2.0.3    taxizedb_0.3.0    cli_3.3.0        
#> [53] stringi_1.7.6     cachem_1.0.6      fs_1.5.2          xml2_1.3.3       
#> [57] ellipsis_0.3.2    generics_0.1.2    vctrs_0.4.1       iterators_1.0.14 
#> [61] tools_4.2.0       bold_1.2.0        bit64_4.0.5       glue_1.6.2       
#> [65] parallel_4.2.0    fastmap_1.1.0     yaml_2.3.5        memoise_2.0.1    
#> [69] knitr_1.39

Created on 2022-05-11 by the reprex package (v2.0.1)
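
The tally above counts labels such as `Always` and `always`, or `Unknown` and `unknown`, as separate values, and one dataset has `<NA>` entries. A minimal base R sketch of harmonizing the labels before counting might look like the following. `normalize_label` is a hypothetical helper, the toy vector is made up for illustration, and treating missing or empty labels as `"unknown"` is an assumption.

```r
# Hypothetical helper (not part of bugphyzz): harmonize
# Confidence_interval labels before tallying them.
normalize_label <- function(x) {
    missing <- is.na(x)
    x <- tolower(trimws(as.character(x)))
    # Assumption: missing or empty labels are treated as "unknown".
    x[missing | x == ""] <- "unknown"
    x
}

# Toy labels, made up for illustration only.
labels <- c("Always", "always", "Unknown", "unknown", "Usually", NA)
table(normalize_label(labels))
```

After this step, the normalized labels match the lower-case adverbs used in the conversion tables at the top of this issue.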

kbeckenrode commented 2 years ago

I think this is complete.