waldronlab / bugphyzz

Harmonized annotation of microbial physiology
http://waldronlab.io/bugphyzz/

Review Frequency and Evidence columns and add missing full references #205

Closed sdgamboa closed 9 months ago

sdgamboa commented 1 year ago

@kbeckenrode, would you help check that the source, evidence, and frequency values are all correct? For the microbe directory, the Evidence column sometimes contains 'unknown' or 'exp', and the Frequency column contains 'usually', 'unknown', or 'always'. Is this correct? Some cells are empty, and other attribute sources show a similar pattern.

Do we need a unit test for this? If so, what should it look like?
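One possible shape for such a check, as a minimal sketch in base R rather than a settled design: flag every row whose Evidence or Frequency value is empty, NA, or outside a controlled vocabulary. The vocabularies below only contain the values quoted in this issue, and the helper `find_invalid()` and the `toy` data frame are hypothetical names made up for illustration; neither is part of bugphyzz.

```r
# Hypothetical controlled vocabularies (assumption): only the values quoted
# in this issue are listed; the real allowed sets may differ.
allowed_evidence <- c("exp", "unknown")
allowed_frequency <- c("always", "usually", "unknown")

# Hypothetical helper: return the rows of `df` whose `column` value is NA,
# empty, or otherwise not in `allowed`.
find_invalid <- function(df, column, allowed) {
  values <- df[[column]]
  bad <- is.na(values) | !(values %in% allowed)
  df[bad, , drop = FALSE]
}

# Toy data standing in for one physiology table; these are not real records.
toy <- data.frame(
  Attribute_source = c("source_a", "source_a", "source_b"),
  Evidence = c("exp", "", "unknown"),
  Frequency = c("always", "usually", NA),
  stringsAsFactors = FALSE
)

# Row 2 has an empty Evidence cell; row 3 has a missing Frequency value.
invalid_evidence <- find_invalid(toy, "Evidence", allowed_evidence)
invalid_frequency <- find_invalid(toy, "Frequency", allowed_frequency)
```

In a testthat file, the same helper could be applied to each table and wrapped in an expectation such as `expect_equal(nrow(find_invalid(tbl, "Evidence", allowed_evidence)), 0L)`, so that a failing test points at the offending rows.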

Also, would you add the full reference of HeaverS_2018 and OlsenI_2001 in the 'full_source' column of this file: https://github.com/waldronlab/bugphyzz/blob/main/inst/extdata/attribute_sources.tsv?

The table is a little hard to read here because I included the full source. You will probably need to run the code I pasted here on your machine and view the table with View().

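suppressMessages({
  library(bugphyzz)
  library(dplyr)
  library(purrr)
  # Import every physiology spreadsheet as a named list of data frames
  phys <- physiologies(keyword = 'all')
})

# Count rows per source/evidence/frequency combination in each spreadsheet,
# then stack the counts and label each row with its spreadsheet name
df <- phys |>
  map(~ count(.x, Attribute_source, Evidence, Frequency)) |>
  bind_rows(.id = 'spreadsheet') |>
  arrange(Attribute_source, Evidence, Frequency)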
df
#>                              spreadsheet
#> 1                      health associated
#> 2                      acetate producing
#> 3                     butyrate producing
#> 4                 hydrogen gas producing
#> 5                      lactate producing
#> 6                                habitat
#> 7                            spore shape
#> 8                                habitat
#> 9  mutation rate per site per generation
#> 10      mutation rates per site per year
#> 11                sphingolipid producing
#> 12                               habitat
#> 13                           genome size
#> 14                          coding genes
#> 15                sphingolipid producing
#> 16                sphingolipid producing
#> 17              antimicrobial resistance
#> 18                         aerophilicity
#> 19                           arrangement
#> 20                            gram stain
#> 21                               habitat
#> 22                                 shape
#> 23                                 shape
#> 24                        isolation site
#> 25                    growth temperature
#> 26                               habitat
#> 27                   disease association
#> 28                         aerophilicity
#> 29                           arrangement
#> 30                           arrangement
#> 31                            gram stain
#> 32                                 shape
#> 33                        isolation site
#> 34                         aerophilicity
#> 35                         growth medium
#> 36                    growth temperature
#> 37                       biofilm forming
#> 38                            gram stain
#> 39                    growth temperature
#> 40                               habitat
#> 41             antimicrobial sensitivity
#> 42                            optimal ph
#> 43                   extreme environment
#> 44                       animal pathogen
#> 45                            gram stain
#> 46            COGEM pathogenicity rating
#> 47                   plant pathogenicity
#> 48                         aerophilicity
#> 49                           arrangement
#> 50                            gram stain
#> 51                                length
#> 52                                 shape
#> 53                           spore shape
#> 54                                 width
#> 55                sphingolipid producing
#> 56                         aerophilicity
#> 57                           arrangement
#> 58                            gram stain
#> 59                                length
#> 60                                 shape
#> 61                           spore shape
#> 62                                 width
#> 63                            gram stain
#>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Attribute_source
#> 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Asnicar F, Berry SE, Valdes AM, et al. Microbiome connections with host metabolism and habitual diet from 1,098 deeply phenotyped individuals. Nat Med. 2021;27(2):321-332. doi:10.1038/s41591-020-01183-8
#> 2                                                                                                                                                                                                                                                                                                                                                                                                           Barcenilla A, Pryde SE, Martin JC, Duncan SH, Stewart CS, Henderson C, Flint HJ. Phylogenetic relationships of butyrate-producing bacteria from the human gut. Appl Environ Microbiol. 2000 Apr;66(4):1654-61. doi: 10.1128/aem.66.4.1654-1661.2000. PMID: 10742256; PMCID: PMC92037.
#> 3                                                                                                                                                                                                                                                                                                                                                                                                           Barcenilla A, Pryde SE, Martin JC, Duncan SH, Stewart CS, Henderson C, Flint HJ. Phylogenetic relationships of butyrate-producing bacteria from the human gut. Appl Environ Microbiol. 2000 Apr;66(4):1654-61. doi: 10.1128/aem.66.4.1654-1661.2000. PMID: 10742256; PMCID: PMC92037.
#> 4                                                                                                                                                                                                                                                                                                                                                                                                           Barcenilla A, Pryde SE, Martin JC, Duncan SH, Stewart CS, Henderson C, Flint HJ. Phylogenetic relationships of butyrate-producing bacteria from the human gut. Appl Environ Microbiol. 2000 Apr;66(4):1654-61. doi: 10.1128/aem.66.4.1654-1661.2000. PMID: 10742256; PMCID: PMC92037.
#> 5                                                                                                                                                                                                                                                                                                                                                                                                           Barcenilla A, Pryde SE, Martin JC, Duncan SH, Stewart CS, Henderson C, Flint HJ. Phylogenetic relationships of butyrate-producing bacteria from the human gut. Appl Environ Microbiol. 2000 Apr;66(4):1654-61. doi: 10.1128/aem.66.4.1654-1661.2000. PMID: 10742256; PMCID: PMC92037.
#> 6                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Browne, H.P., Almeida, A., Kumar, N. et al. Host adaptation in gut Firmicutes is associated with sporulation loss and altered transmission cycle. Genome Biol 22, 204 (2021). https://doi.org/10.1186/s13059-021-02428-6
#> 7                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Browne, H.P., Almeida, A., Kumar, N. et al. Host adaptation in gut Firmicutes is associated with sporulation loss and altered transmission cycle. Genome Biol 22, 204 (2021). https://doi.org/10.1186/s13059-021-02428-6
#> 8  Dueholm, M.S., Nierychlo, M., Andersen, K.S., Rudkjøbing, V., Knudsen, S., the MiDAS Global Consortium, Albertsen, M., Nielsen, P.H. 2021; MiDAS 4: A global catalogue of full-length 16S rRNA gene sequences and taxonomy for studies of bacterial communities in wastewater treatment plants. BioRxiv. Nierychlo, M., Andersen, K.S., Xu, Y., Green, N., Jiang, C., Albertsen, M., Dueholm, M.S., Nielsen, P.H., 2020. MiDAS 3: An ecosystem-specific reference database, taxonomy and knowledge platform for activated sludge and anaerobic digesters reveals species-level microbiome composition of activated sludge. Water Research 115955. https://doi.org/10.1016/j.watres.2020.115955
#> 9                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Gibson B, Wilson DJ, Feil E, Eyre-Walker A. The distribution of bacterial doubling times in the wild. Proc Biol Sci. 2018 Jun 13;285(1880):20180789. doi: 10.1098/rspb.2018.0789. PMID: 29899074; PMCID: PMC6015860.
#> 10                                                                                                                                                                                                                                                                                                                                                                                                                                                                           Gibson B, Wilson DJ, Feil E, Eyre-Walker A. The distribution of bacterial doubling times in the wild. Proc Biol Sci. 2018 Jun 13;285(1880):20180789. doi: 10.1098/rspb.2018.0789. PMID: 29899074; PMCID: PMC6015860.
#> 11                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   HeaverS_2018
#> 12                                                                                                                                                                                                                                                                                                                                                                              Hilt, E. E., McKinley, K., Pearce, M. M., Rosenfeld, A. B., Zilliox, M. J., Mueller, E. R., ... & Schreckenberger, P. C. (2014). Urine is not sterile: use of enhanced urine culture techniques to detect resident bacterial flora in the adult female bladder. Journal of clinical microbiology, 52(3), 871-876.
#> 13                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           Kegg
#> 14                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           Kegg
#> 15                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    OlsenI_2001
#> 16                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    OlsenI_2001
#> 17                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         PATRIC
#> 18                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      ProTraits
#> 19                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      ProTraits
#> 20                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      ProTraits
#> 21                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      ProTraits
#> 22                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      ProTraits
#> 23                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   http://bacmap.wishartlab.com
#> 24                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   http://bacmap.wishartlab.com
#> 25                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   http://bacmap.wishartlab.com
#> 26                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   http://bacmap.wishartlab.com
#> 27                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   http://bacmap.wishartlab.com
#> 28                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   http://bacmap.wishartlab.com
#> 29                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   http://bacmap.wishartlab.com
#> 30                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   http://bacmap.wishartlab.com
#> 31                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   http://bacmap.wishartlab.com
#> 32                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   http://bacmap.wishartlab.com
#> 33                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            https://github.com/bacteria-archaea-traits/bacteria-archaea-traits/tree/master/output/prepared_data
#> 34                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            https://github.com/bacteria-archaea-traits/bacteria-archaea-traits/tree/master/output/prepared_data
#> 35                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            https://github.com/bovee/fattyacids
#> 36                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            https://github.com/bovee/fattyacids
#> 37                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 https://github.com/dcdanko/MD2
#> 38                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 https://github.com/dcdanko/MD2
#> 39                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 https://github.com/dcdanko/MD2
#> 40                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 https://github.com/dcdanko/MD2
#> 41                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 https://github.com/dcdanko/MD2
#> 42                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 https://github.com/dcdanko/MD2
#> 43                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 https://github.com/dcdanko/MD2
#> 44                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 https://github.com/dcdanko/MD2
#> 45                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 https://github.com/dcdanko/MD2
#> 46                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 https://github.com/dcdanko/MD2
#> 47                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 https://github.com/dcdanko/MD2
#> 48                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       https://www.bergeys.org/
#> 49                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       https://www.bergeys.org/
#> 50                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       https://www.bergeys.org/
#> 51                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       https://www.bergeys.org/
#> 52                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       https://www.bergeys.org/
#> 53                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       https://www.bergeys.org/
#> 54                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       https://www.bergeys.org/
#> 55                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       https://www.bergeys.org/
#> 56                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       https://www.bergeys.org/
#> 57                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       https://www.bergeys.org/
#> 58                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       https://www.bergeys.org/
#> 59                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       https://www.bergeys.org/
#> 60                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       https://www.bergeys.org/
#> 61                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       https://www.bergeys.org/
#> 62                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       https://www.bergeys.org/
#> 63                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       https://www.bergeys.org/
#>    Evidence Frequency     n
#> 1       igc   usually    30
#> 2       exp   usually    24
#> 3       exp   usually    24
#> 4       exp   usually    14
#> 5       exp   usually    15
#> 6       exp   unknown  1429
#> 7       exp   usually  1388
#> 8       exp   usually 12999
#> 9       exp sometimes    26
#> 10      exp sometimes    81
#> 11      exp    always     4
#> 12      exp sometimes     9
#> 13      igc   usually  4665
#> 14      igc   usually  4669
#> 15      exp    always     8
#> 16      exp sometimes     1
#> 17      igc    always 10311
#> 18      igc    always  5076
#> 19      igc    always  5280
#> 20      igc    always  3058
#> 21      igc    always 64153
#> 22      igc    always 12615
#> 23            unknown     6
#> 24      exp    always    99
#> 25      exp   unknown   535
#> 26      exp   unknown  1438
#> 27      exp   usually   445
#> 28  unknown    always  1304
#> 29  unknown sometimes    24
#> 30  unknown   unknown  1047
#> 31  unknown   unknown  1278
#> 32  unknown   unknown  1273
#> 33      exp    always  5316
#> 34  unknown    always  3837
#> 35      exp    always   254
#> 36      exp   unknown   641
#> 37      exp   unknown   426
#> 38      exp   unknown    15
#> 39      exp   unknown  1347
#> 40      exp   unknown  2354
#> 41      exp   usually   832
#> 42      exp   usually   886
#> 43  unknown    always  1874
#> 44  unknown   unknown  1416
#> 45  unknown   unknown  2337
#> 46  unknown   usually  1042
#> 47  unknown   usually  1493
#> 48      exp    always  1229
#> 49      exp    always   142
#> 50      exp    always  1315
#> 51      exp    always    43
#> 52      exp    always   876
#> 53      exp    always    50
#> 54      exp    always   117
#> 55      exp    always     7
#> 56      exp sometimes    24
#> 57      exp sometimes   610
#> 58      exp sometimes    16
#> 59      exp sometimes   620
#> 60      exp sometimes   948
#> 61      exp sometimes    97
#> 62      exp sometimes   734
#> 63  unknown    always     1
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R Under development (unstable) (2022-12-25 r83502)
#>  os       Pop!_OS 22.04 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/New_York
#>  date     2023-02-02
#>  pandoc   2.19.2 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  ape           5.6-2   2022-03-02 [1] CRAN (R 4.3.0)
#>  assertthat    0.2.1   2019-03-21 [2] CRAN (R 4.3.0)
#>  bit           4.0.5   2022-11-15 [2] CRAN (R 4.3.0)
#>  bit64         4.0.5   2020-08-30 [2] CRAN (R 4.3.0)
#>  blob          1.2.3   2022-04-10 [2] CRAN (R 4.3.0)
#>  bold          1.2.0   2021-05-11 [1] CRAN (R 4.3.0)
#>  bugphyzz    * 0.0.1.3 2023-02-02 [1] local
#>  cachem        1.0.6   2021-08-19 [2] CRAN (R 4.3.0)
#>  cli           3.6.0   2023-01-09 [1] CRAN (R 4.3.0)
#>  codetools     0.2-18  2020-11-04 [2] CRAN (R 4.3.0)
#>  conditionz    0.1.0   2019-04-24 [1] CRAN (R 4.3.0)
#>  crayon        1.5.2   2022-09-29 [2] CRAN (R 4.3.0)
#>  crul          1.3     2022-09-03 [1] CRAN (R 4.3.0)
#>  curl          5.0.0   2023-01-12 [2] CRAN (R 4.3.0)
#>  data.table    1.14.6  2022-11-16 [2] CRAN (R 4.3.0)
#>  DBI           1.1.3   2022-06-18 [2] CRAN (R 4.3.0)
#>  dbplyr        2.3.0   2023-01-16 [2] CRAN (R 4.3.0)
#>  digest        0.6.31  2022-12-11 [2] CRAN (R 4.3.0)
#>  dplyr       * 1.1.0   2023-01-29 [2] CRAN (R 4.3.0)
#>  ellipsis      0.3.2   2021-04-29 [2] CRAN (R 4.3.0)
#>  evaluate      0.20    2023-01-17 [2] CRAN (R 4.3.0)
#>  fansi         1.0.4   2023-01-22 [2] CRAN (R 4.3.0)
#>  fastmap       1.1.0   2021-01-25 [2] CRAN (R 4.3.0)
#>  foreach       1.5.2   2022-02-02 [1] CRAN (R 4.3.0)
#>  fs            1.6.0   2023-01-23 [2] CRAN (R 4.3.0)
#>  generics      0.1.3   2022-07-05 [2] CRAN (R 4.3.0)
#>  glue          1.6.2   2022-02-24 [2] CRAN (R 4.3.0)
#>  hms           1.1.2   2022-08-19 [2] CRAN (R 4.3.0)
#>  hoardr        0.5.3   2023-01-26 [1] CRAN (R 4.3.0)
#>  htmltools     0.5.4   2022-12-07 [2] CRAN (R 4.3.0)
#>  httpcode      0.3.0   2020-04-10 [1] CRAN (R 4.3.0)
#>  iterators     1.0.14  2022-02-05 [1] CRAN (R 4.3.0)
#>  jsonlite      1.8.4   2022-12-06 [2] CRAN (R 4.3.0)
#>  knitr         1.42    2023-01-25 [2] CRAN (R 4.3.0)
#>  lattice       0.20-45 2021-09-22 [2] CRAN (R 4.3.0)
#>  lifecycle     1.0.3   2022-10-07 [2] CRAN (R 4.3.0)
#>  magrittr      2.0.3   2022-03-30 [2] CRAN (R 4.3.0)
#>  memoise       2.0.1   2021-11-26 [2] CRAN (R 4.3.0)
#>  mgsub         1.7.3   2021-07-28 [1] CRAN (R 4.3.0)
#>  nlme          3.1-162 2023-01-31 [2] CRAN (R 4.3.0)
#>  pillar        1.8.1   2022-08-19 [2] CRAN (R 4.3.0)
#>  pkgconfig     2.0.3   2019-09-22 [2] CRAN (R 4.3.0)
#>  plyr          1.8.8   2022-11-11 [1] CRAN (R 4.3.0)
#>  purrr       * 1.0.1   2023-01-10 [1] CRAN (R 4.3.0)
#>  R.cache       0.16.0  2022-07-21 [1] CRAN (R 4.3.0)
#>  R.methodsS3   1.8.2   2022-06-13 [1] CRAN (R 4.3.0)
#>  R.oo          1.25.0  2022-06-12 [1] CRAN (R 4.3.0)
#>  R.utils       2.12.2  2022-11-11 [1] CRAN (R 4.3.0)
#>  R6            2.5.1   2021-08-19 [2] CRAN (R 4.3.0)
#>  rappdirs      0.3.3   2021-01-31 [2] CRAN (R 4.3.0)
#>  Rcpp          1.0.10  2023-01-22 [1] CRAN (R 4.3.0)
#>  readr         2.1.3   2022-10-01 [2] CRAN (R 4.3.0)
#>  reprex        2.0.2   2022-08-17 [2] CRAN (R 4.3.0)
#>  reshape       0.8.9   2022-04-12 [1] CRAN (R 4.3.0)
#>  rlang         1.0.6   2022-09-24 [2] CRAN (R 4.3.0)
#>  rmarkdown     2.20    2023-01-19 [2] CRAN (R 4.3.0)
#>  RSQLite       2.2.20  2022-12-22 [1] CRAN (R 4.3.0)
#>  rstudioapi    0.14    2022-08-22 [2] CRAN (R 4.3.0)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
#>  stringi       1.7.12  2023-01-11 [2] CRAN (R 4.3.0)
#>  stringr       1.5.0   2022-12-02 [2] CRAN (R 4.3.0)
#>  styler        1.9.0   2023-01-15 [1] CRAN (R 4.3.0)
#>  taxize        0.9.100 2022-04-22 [1] CRAN (R 4.3.0)
#>  taxizedb      0.3.0   2021-01-15 [1] CRAN (R 4.3.0)
#>  tibble        3.1.8   2022-07-22 [2] CRAN (R 4.3.0)
#>  tidyr         1.3.0   2023-01-24 [2] CRAN (R 4.3.0)
#>  tidyselect    1.2.0   2022-10-10 [2] CRAN (R 4.3.0)
#>  tzdb          0.3.0   2022-03-28 [2] CRAN (R 4.3.0)
#>  utf8          1.2.3   2023-01-31 [2] CRAN (R 4.3.0)
#>  uuid          1.1-0   2022-04-19 [2] CRAN (R 4.3.0)
#>  vctrs         0.5.2   2023-01-23 [2] CRAN (R 4.3.0)
#>  vroom         1.6.1   2023-01-22 [2] CRAN (R 4.3.0)
#>  withr         2.5.0   2022-03-03 [2] CRAN (R 4.3.0)
#>  xfun          0.37    2023-01-31 [2] CRAN (R 4.3.0)
#>  xml2          1.3.3   2021-11-30 [2] CRAN (R 4.3.0)
#>  yaml          2.3.7   2023-01-23 [2] CRAN (R 4.3.0)
#>  zoo           1.8-11  2022-09-17 [1] CRAN (R 4.3.0)
#> 
#>  [1] /home/samuel/R/x86_64-pc-linux-gnu-library/4.3
#>  [2] /home/samuel/Apps/R-devel/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

Created on 2023-02-02 with reprex v2.0.2
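
The table above shows rows with an empty or 'unknown' Evidence and an 'unknown' Frequency. As a rough sketch of what an automated check could look like, the base R snippet below flags such rows in a toy data frame. The helper name `flag_unreviewed`, the toy data, and the two vocabularies are assumptions made for illustration; the vocabularies only list values visible in the output above, and treating 'unknown' as needing review is also an assumption, not a bugphyzz rule.

```r
# Toy stand-in for one spreadsheet; the real data would come from
# physiologies(). Column names follow the output above.
toy <- data.frame(
  Attribute_source = c('BacMap', 'BacMap', "Bergey's Manual", 'Kegg'),
  Evidence = c('exp', '', 'unknown', 'igc'),
  Frequency = c('always', 'unknown', 'always', 'usually'),
  stringsAsFactors = FALSE
)

# Assumed vocabularies: only the values that appear in the tables above.
valid_evidence <- c('exp', 'igc')
valid_frequency <- c('always', 'usually', 'sometimes')

# Hypothetical helper: return the rows whose Evidence or Frequency is
# missing, empty, 'unknown', or otherwise outside the assumed vocabulary.
flag_unreviewed <- function(df) {
  bad_evidence <- is.na(df$Evidence) | !(df$Evidence %in% valid_evidence)
  bad_frequency <- is.na(df$Frequency) | !(df$Frequency %in% valid_frequency)
  df[bad_evidence | bad_frequency, , drop = FALSE]
}

flagged <- flag_unreviewed(toy)

# Count flagged rows per source to see where review is needed.
table(flagged$Attribute_source)
```

If a check like this were turned into a unit test, it could assert that `nrow(flag_unreviewed(x))` is zero for each spreadsheet `x`, once the accepted values have been agreed on.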

sdgamboa commented 1 year ago

Here is the same table with shortened attribute source names, which should make it easier to read.

suppressMessages({
  library(bugphyzz)
  library(dplyr)
  library(purrr)
  phys <- physiologies(keyword = 'all', full_source = FALSE)
})

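# The Attribute_source value for the Madin et al. data is a long URL
madin_source <- 'https://github.com/bacteria-archaea-traits/bacteria-archaea-traits/tree/master/output/prepared_data'

# Replace that URL with a short label in every spreadsheet
phys <- phys |> 
  map(~ {
    .x$Attribute_source <- ifelse(
      .x$Attribute_source == madin_source, 'madin_et_al', .x$Attribute_source
    )
    .x
  })
# Count rows per source, evidence, and frequency, then combine all spreadsheets
df <- phys |> 
  map(~ count(.x, Attribute_source, Evidence, Frequency)) |> 
  bind_rows(.id = 'spreadsheet') |> 
  arrange(Attribute_source, Evidence, Frequency)
df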
#>                              spreadsheet                  Attribute_source
#> 1                      health associated                      Asnicar_2021
#> 2                                  shape                            BacMap
#> 3                         isolation site                            BacMap
#> 4                     growth temperature                            BacMap
#> 5                                habitat                            BacMap
#> 6                    disease association                            BacMap
#> 7                          aerophilicity                            BacMap
#> 8                            arrangement                            BacMap
#> 9                            arrangement                            BacMap
#> 10                            gram stain                            BacMap
#> 11                                 shape                            BacMap
#> 12                     acetate producing                   Barcenilla_2000
#> 13                    butyrate producing                   Barcenilla_2000
#> 14                hydrogen gas producing                   Barcenilla_2000
#> 15                     lactate producing                   Barcenilla_2000
#> 16                         aerophilicity                   Bergey's Manual
#> 17                           arrangement                   Bergey's Manual
#> 18                            gram stain                   Bergey's Manual
#> 19                                length                   Bergey's Manual
#> 20                                 shape                   Bergey's Manual
#> 21                           spore shape                   Bergey's Manual
#> 22                                 width                   Bergey's Manual
#> 23                sphingolipid producing                   Bergey's Manual
#> 24                         aerophilicity                   Bergey's Manual
#> 25                           arrangement                   Bergey's Manual
#> 26                            gram stain                   Bergey's Manual
#> 27                                length                   Bergey's Manual
#> 28                                 shape                   Bergey's Manual
#> 29                           spore shape                   Bergey's Manual
#> 30                                 width                   Bergey's Manual
#> 31                            gram stain                   Bergey's Manual
#> 32                               habitat                       Browne_2021
#> 33                           spore shape                       Browne_2021
#> 34 mutation rate per site per generation                       Gibson_2018
#> 35      mutation rates per site per year                       Gibson_2018
#> 36                sphingolipid producing                      HeaverS_2018
#> 37                               habitat                         Hilt_2014
#> 38                           genome size                              Kegg
#> 39                          coding genes                              Kegg
#> 40                               habitat                             MiDAS
#> 41                         growth medium Microbial Fatty Acid Compositions
#> 42                    growth temperature Microbial Fatty Acid Compositions
#> 43                sphingolipid producing                       OlsenI_2001
#> 44                sphingolipid producing                       OlsenI_2001
#> 45              antimicrobial resistance                            PATRIC
#> 46                         aerophilicity                         ProTraits
#> 47                           arrangement                         ProTraits
#> 48                            gram stain                         ProTraits
#> 49                               habitat                         ProTraits
#> 50                                 shape                         ProTraits
#> 51                       biofilm forming             The Microbe Directory
#> 52                            gram stain             The Microbe Directory
#> 53                    growth temperature             The Microbe Directory
#> 54                               habitat             The Microbe Directory
#> 55             antimicrobial sensitivity             The Microbe Directory
#> 56                            optimal ph             The Microbe Directory
#> 57                   extreme environment             The Microbe Directory
#> 58                       animal pathogen             The Microbe Directory
#> 59                            gram stain             The Microbe Directory
#> 60            COGEM pathogenicity rating             The Microbe Directory
#> 61                   plant pathogenicity             The Microbe Directory
#> 62                        isolation site                       madin_et_al
#> 63                         aerophilicity                       madin_et_al
#>    Evidence Frequency     n
#> 1       igc   usually    30
#> 2             unknown     6
#> 3       exp    always    99
#> 4       exp   unknown   535
#> 5       exp   unknown  1438
#> 6       exp   usually   445
#> 7   unknown    always  1304
#> 8   unknown sometimes    24
#> 9   unknown   unknown  1047
#> 10  unknown   unknown  1278
#> 11  unknown   unknown  1273
#> 12      exp   usually    24
#> 13      exp   usually    24
#> 14      exp   usually    14
#> 15      exp   usually    15
#> 16      exp    always  1229
#> 17      exp    always   142
#> 18      exp    always  1315
#> 19      exp    always    43
#> 20      exp    always   876
#> 21      exp    always    50
#> 22      exp    always   117
#> 23      exp    always     7
#> 24      exp sometimes    24
#> 25      exp sometimes   610
#> 26      exp sometimes    16
#> 27      exp sometimes   620
#> 28      exp sometimes   948
#> 29      exp sometimes    97
#> 30      exp sometimes   734
#> 31  unknown    always     1
#> 32      exp   unknown  1429
#> 33      exp   usually  1388
#> 34      exp sometimes    26
#> 35      exp sometimes    81
#> 36      exp    always     4
#> 37      exp sometimes     9
#> 38      igc   usually  4665
#> 39      igc   usually  4669
#> 40      exp   usually 12999
#> 41      exp    always   254
#> 42      exp   unknown   641
#> 43      exp    always     8
#> 44      exp sometimes     1
#> 45      igc    always 10311
#> 46      igc    always  5076
#> 47      igc    always  5280
#> 48      igc    always  3058
#> 49      igc    always 64153
#> 50      igc    always 12615
#> 51      exp   unknown   426
#> 52      exp   unknown    15
#> 53      exp   unknown  1347
#> 54      exp   unknown  2354
#> 55      exp   usually   832
#> 56      exp   usually   886
#> 57  unknown    always  1874
#> 58  unknown   unknown  1416
#> 59  unknown   unknown  2337
#> 60  unknown   usually  1042
#> 61  unknown   usually  1493
#> 62      exp    always  5316
#> 63  unknown    always  3837
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R Under development (unstable) (2022-12-25 r83502)
#>  os       Pop!_OS 22.04 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/New_York
#>  date     2023-02-02
#>  pandoc   2.19.2 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  ape           5.6-2   2022-03-02 [1] CRAN (R 4.3.0)
#>  assertthat    0.2.1   2019-03-21 [2] CRAN (R 4.3.0)
#>  bit           4.0.5   2022-11-15 [2] CRAN (R 4.3.0)
#>  bit64         4.0.5   2020-08-30 [2] CRAN (R 4.3.0)
#>  blob          1.2.3   2022-04-10 [2] CRAN (R 4.3.0)
#>  bold          1.2.0   2021-05-11 [1] CRAN (R 4.3.0)
#>  bugphyzz    * 0.0.1.3 2023-02-02 [1] local
#>  cachem        1.0.6   2021-08-19 [2] CRAN (R 4.3.0)
#>  cli           3.6.0   2023-01-09 [1] CRAN (R 4.3.0)
#>  codetools     0.2-18  2020-11-04 [2] CRAN (R 4.3.0)
#>  conditionz    0.1.0   2019-04-24 [1] CRAN (R 4.3.0)
#>  crayon        1.5.2   2022-09-29 [2] CRAN (R 4.3.0)
#>  crul          1.3     2022-09-03 [1] CRAN (R 4.3.0)
#>  curl          5.0.0   2023-01-12 [2] CRAN (R 4.3.0)
#>  data.table    1.14.6  2022-11-16 [2] CRAN (R 4.3.0)
#>  DBI           1.1.3   2022-06-18 [2] CRAN (R 4.3.0)
#>  dbplyr        2.3.0   2023-01-16 [2] CRAN (R 4.3.0)
#>  digest        0.6.31  2022-12-11 [2] CRAN (R 4.3.0)
#>  dplyr       * 1.1.0   2023-01-29 [2] CRAN (R 4.3.0)
#>  ellipsis      0.3.2   2021-04-29 [2] CRAN (R 4.3.0)
#>  evaluate      0.20    2023-01-17 [2] CRAN (R 4.3.0)
#>  fansi         1.0.4   2023-01-22 [2] CRAN (R 4.3.0)
#>  fastmap       1.1.0   2021-01-25 [2] CRAN (R 4.3.0)
#>  foreach       1.5.2   2022-02-02 [1] CRAN (R 4.3.0)
#>  fs            1.6.0   2023-01-23 [2] CRAN (R 4.3.0)
#>  generics      0.1.3   2022-07-05 [2] CRAN (R 4.3.0)
#>  glue          1.6.2   2022-02-24 [2] CRAN (R 4.3.0)
#>  hms           1.1.2   2022-08-19 [2] CRAN (R 4.3.0)
#>  hoardr        0.5.3   2023-01-26 [1] CRAN (R 4.3.0)
#>  htmltools     0.5.4   2022-12-07 [2] CRAN (R 4.3.0)
#>  httpcode      0.3.0   2020-04-10 [1] CRAN (R 4.3.0)
#>  iterators     1.0.14  2022-02-05 [1] CRAN (R 4.3.0)
#>  jsonlite      1.8.4   2022-12-06 [2] CRAN (R 4.3.0)
#>  knitr         1.42    2023-01-25 [2] CRAN (R 4.3.0)
#>  lattice       0.20-45 2021-09-22 [2] CRAN (R 4.3.0)
#>  lifecycle     1.0.3   2022-10-07 [2] CRAN (R 4.3.0)
#>  magrittr      2.0.3   2022-03-30 [2] CRAN (R 4.3.0)
#>  memoise       2.0.1   2021-11-26 [2] CRAN (R 4.3.0)
#>  mgsub         1.7.3   2021-07-28 [1] CRAN (R 4.3.0)
#>  nlme          3.1-162 2023-01-31 [2] CRAN (R 4.3.0)
#>  pillar        1.8.1   2022-08-19 [2] CRAN (R 4.3.0)
#>  pkgconfig     2.0.3   2019-09-22 [2] CRAN (R 4.3.0)
#>  plyr          1.8.8   2022-11-11 [1] CRAN (R 4.3.0)
#>  purrr       * 1.0.1   2023-01-10 [1] CRAN (R 4.3.0)
#>  R.cache       0.16.0  2022-07-21 [1] CRAN (R 4.3.0)
#>  R.methodsS3   1.8.2   2022-06-13 [1] CRAN (R 4.3.0)
#>  R.oo          1.25.0  2022-06-12 [1] CRAN (R 4.3.0)
#>  R.utils       2.12.2  2022-11-11 [1] CRAN (R 4.3.0)
#>  R6            2.5.1   2021-08-19 [2] CRAN (R 4.3.0)
#>  rappdirs      0.3.3   2021-01-31 [2] CRAN (R 4.3.0)
#>  Rcpp          1.0.10  2023-01-22 [1] CRAN (R 4.3.0)
#>  readr         2.1.3   2022-10-01 [2] CRAN (R 4.3.0)
#>  reprex        2.0.2   2022-08-17 [2] CRAN (R 4.3.0)
#>  reshape       0.8.9   2022-04-12 [1] CRAN (R 4.3.0)
#>  rlang         1.0.6   2022-09-24 [2] CRAN (R 4.3.0)
#>  rmarkdown     2.20    2023-01-19 [2] CRAN (R 4.3.0)
#>  RSQLite       2.2.20  2022-12-22 [1] CRAN (R 4.3.0)
#>  rstudioapi    0.14    2022-08-22 [2] CRAN (R 4.3.0)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
#>  stringi       1.7.12  2023-01-11 [2] CRAN (R 4.3.0)
#>  stringr       1.5.0   2022-12-02 [2] CRAN (R 4.3.0)
#>  styler        1.9.0   2023-01-15 [1] CRAN (R 4.3.0)
#>  taxize        0.9.100 2022-04-22 [1] CRAN (R 4.3.0)
#>  taxizedb      0.3.0   2021-01-15 [1] CRAN (R 4.3.0)
#>  tibble        3.1.8   2022-07-22 [2] CRAN (R 4.3.0)
#>  tidyr         1.3.0   2023-01-24 [2] CRAN (R 4.3.0)
#>  tidyselect    1.2.0   2022-10-10 [2] CRAN (R 4.3.0)
#>  tzdb          0.3.0   2022-03-28 [2] CRAN (R 4.3.0)
#>  utf8          1.2.3   2023-01-31 [2] CRAN (R 4.3.0)
#>  uuid          1.1-0   2022-04-19 [2] CRAN (R 4.3.0)
#>  vctrs         0.5.2   2023-01-23 [2] CRAN (R 4.3.0)
#>  vroom         1.6.1   2023-01-22 [2] CRAN (R 4.3.0)
#>  withr         2.5.0   2022-03-03 [2] CRAN (R 4.3.0)
#>  xfun          0.37    2023-01-31 [2] CRAN (R 4.3.0)
#>  xml2          1.3.3   2021-11-30 [2] CRAN (R 4.3.0)
#>  yaml          2.3.7   2023-01-23 [2] CRAN (R 4.3.0)
#>  zoo           1.8-11  2022-09-17 [1] CRAN (R 4.3.0)
#> 
#>  [1] /home/samuel/R/x86_64-pc-linux-gnu-library/4.3
#>  [2] /home/samuel/Apps/R-devel/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

Created on 2023-02-02 with reprex v2.0.2

kbeckenrode commented 1 year ago

So, it's ok if the evidence codes and the frequency don't match because they are not dependent on each other. They are independent variables. Does that answer the question? Thanks Samuel! @sdgamboa

lwaldron commented 1 year ago

But is it correct that the microbe directory has multiple values for both frequency and evidence? Can you point to where the microbe directory provided different types of evidence and different frequencies for different microbes?

kbeckenrode commented 1 year ago

The Microbe Directory has only one evidence type, NAS (non-traceable author statement). But an attribute can occur at different frequencies. For example, size is 'always', whereas optimal pH could be 'usually', since it fluctuates.

lwaldron commented 1 year ago

Samuel's question was "@kbeckenrode, would you help review that the source, evidence, and frequency are all correct? I see that the microbe directory sometimes has 'unknown' or 'exp' in the Evidence column and 'usually', 'unknown', and 'always' in the Frequency column. Is this correct? Some cells are empty and a similar pattern is found for other attribute sources.

Do we need a unit test for this? If so, how should it look like?"

I'm still unclear what your answer is to this question. It seems to me that your answer is that the evidence codes for the microbe directory were incorrect, and that for other sources you haven't checked the output above for correctness.

For unit tests: frequency and evidence codes should be non-missing and from a list of allowable values. Evidence codes should come from the table of sources that provides "confidence in curation" instead of from the attribute sheets. The old column "confidence interval" should NOT be present.
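A minimal sketch of how such a check could look, in base R. Everything named here is an assumption made for illustration: the helper `check_annotations()` is hypothetical, the column name `Confidence_interval` is a guess at how the old column is spelled, and the allowed values in the example are placeholders rather than the project's actual list.

```r
# Hypothetical helper: returns a character vector of problems (empty if none).
check_annotations <- function(df, allowed_evidence, allowed_frequency,
                              forbidden = "Confidence_interval") {
  check_column <- function(col, allowed) {
    if (!col %in% names(df)) {
      return(paste0(col, ": column is absent"))
    }
    x <- as.character(df[[col]])
    # Treat both NA and empty strings as missing.
    is_missing <- is.na(x) | x == ""
    problems <- character(0)
    if (any(is_missing)) {
      problems <- c(problems,
                    paste0(col, ": ", sum(is_missing), " missing value(s)"))
    }
    # Any non-missing value outside the allowed list is reported.
    bad <- setdiff(unique(x[!is_missing]), allowed)
    if (length(bad) > 0) {
      problems <- c(problems,
                    paste0(col, ": unexpected value(s): ",
                           paste(bad, collapse = ", ")))
    }
    problems
  }
  problems <- c(check_column("Evidence", allowed_evidence),
                check_column("Frequency", allowed_frequency))
  # The obsolete column must not be present at all.
  present <- intersect(forbidden, names(df))
  if (length(present) > 0) {
    problems <- c(problems, paste0("obsolete column present: ", present))
  }
  problems
}

# Placeholder allowed values (assumed, not the project's real list).
allowed_evidence <- c("exp", "nas")
allowed_frequency <- c("always", "usually")

good <- data.frame(Evidence = c("exp", "nas"),
                   Frequency = c("always", "usually"))
bad <- data.frame(Evidence = c("exp", NA),
                  Frequency = c("always", "often"),
                  Confidence_interval = c(1, 2))

problems_good <- check_annotations(good, allowed_evidence, allowed_frequency)
problems_bad <- check_annotations(bad, allowed_evidence, allowed_frequency)
```

In a testthat file, each spreadsheet returned by `physiologies()` could be passed through a helper like this and the result checked with `expect_length(problems, 0)`, so a failing test names the offending column and values.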

kbeckenrode commented 1 year ago

@lwaldron Ah ok. I understand now. I'll go ahead and double check the frequency and evidence codes to make sure they are correct. I'll also make sure to remove the confidence interval column.

sdgamboa commented 9 months ago

Evidence and confidence in curation are reported here: https://github.com/waldronlab/bugphyzz/blob/devel/inst/extdata/attribute_sources.tsv
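As a rough illustration of taking Evidence from the sources table instead of from the attribute sheets, the sketch below looks up each row's `Attribute_source` in a toy stand-in for that table. The data frames, the helper `attach_evidence()`, and the values are hypothetical; only the column names `Attribute_source` and `Evidence` appear in this thread, and the real file may be laid out differently.

```r
# Toy stand-in for the sources table (values are made up).
sources <- data.frame(
  Attribute_source = c("source_a", "source_b"),
  Evidence = c("exp", "nas"),
  stringsAsFactors = FALSE
)

# Toy attribute sheet whose own Evidence column is inconsistent.
sheet <- data.frame(
  Taxon_name = c("taxon 1", "taxon 2", "taxon 3"),
  Attribute_source = c("source_a", "source_b", "source_c"),
  Evidence = c("unknown", "exp", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: discard the sheet-level Evidence and replace it
# with the value recorded for the matching source. Row order is kept.
attach_evidence <- function(sheet, sources) {
  sheet$Evidence <- NULL
  idx <- match(sheet$Attribute_source, sources$Attribute_source)
  sheet$Evidence <- sources$Evidence[idx]
  sheet
}

annotated <- attach_evidence(sheet, sources)

# Sources with no entry in the table end up as NA and can be listed.
unmatched <- unique(annotated$Attribute_source[is.na(annotated$Evidence)])
```

Listing the unmatched sources makes it easy to see which entries still need to be added to the table before the evidence codes can be considered complete.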

Frequency values are reassessed in bugphyzzExports, so it should not be a problem.