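The acetate-, butyrate-, hydrogen gas-, and lactate-producing annotations all come from the same source (Barcenilla_2000) with the same evidence code (EXP), but their frequency values differ. Acetate producing is "unknown", while the other three are "usually". Butyrate producing also stores this value in a `Confidence_interval` column instead of a `Frequency` column. Since the annotations come from the same source, should they share the same frequency/confidence level?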
library(bugphyzz)
library(purrr)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
phys <- physiologies()
#> Finished acetate producing
#> Finished aerophilicity
#> Finished animal pathogen
#> Finished antimicrobial resistance
#> Dropped 3054 rows with missing Attribute_value from antimicrobial sensitivity
#> Finished arrangement
#> Finished biofilm forming
#> Dropped 4 rows with missing Attribute_value from butyrate producing
#> Finished COGEM pathogenicity rating
#> Finished disease association
#> Finished extreme environment
#> Finished gram stain
#> Finished growth medium
#> Finished growth temperature
#> Finished habitat
#> Finished health associated
#> Dropped 10 rows with missing Attribute_value from hydrogen gas producing
#> Finished isolation site
#> Dropped 9 rows with missing Attribute_value from lactate producing
#> Finished length
#> Finished mutation rate per site per generation
#> Finished mutation rates per site per year
#> Finished optimal ph
#> Finished plant pathogenicity
#> Dropped 9 rows with missing Attribute_value from shape
#> Finished spore shape
#> Finished width
#> Dropped 5 rows with missing Attribute_value from genome size
#> Finished coding genes
barcenilla_2000 <- phys %>%
  keep(~ {
    "Barcenilla_2000" %in% .x$Attribute_source
  })
names(barcenilla_2000)
#> [1] "acetate producing" "butyrate producing" "hydrogen gas producing"
#> [4] "lactate producing"
barcenilla_2000 %>%
  map(~ {
    if ("Frequency" %in% colnames(.x)) {
      count(.x, Attribute_source, Evidence, Frequency)
    } else if ("Confidence_interval" %in% colnames(.x)) {
      count(.x, Attribute_source, Evidence, Confidence_interval)
    }
  })
#> $`acetate producing`
#> Attribute_source Evidence Frequency n
#> 1 Barcenilla_2000 EXP unknown 24
#>
#> $`butyrate producing`
#> Attribute_source Evidence Confidence_interval n
#> 1 Barcenilla_2000 EXP usually 24
#>
#> $`hydrogen gas producing`
#> Attribute_source Evidence Frequency n
#> 1 Barcenilla_2000 EXP usually 14
#>
#> $`lactate producing`
#> Attribute_source Evidence Frequency n
#> 1 Barcenilla_2000 EXP usually 15
Created on 2022-06-08 by the reprex package (v2.0.1)
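A minimal base-R sketch of one way to flag this kind of inconsistency automatically. It uses hypothetical toy data frames that mirror the counts above (not the real bugphyzz tables) and assumes the frequency-like value lives in either a `Frequency` or a `Confidence_interval` column. The names `phys_toy`, `freq_table`, and `inconsistent` are illustrative only.

```r
# Hypothetical toy data mirroring the reprex output (not real bugphyzz data)
phys_toy <- list(
  `acetate producing` = data.frame(Attribute_source = "Barcenilla_2000", Frequency = "unknown"),
  `butyrate producing` = data.frame(Attribute_source = "Barcenilla_2000", Confidence_interval = "usually"),
  `hydrogen gas producing` = data.frame(Attribute_source = "Barcenilla_2000", Frequency = "usually"),
  `lactate producing` = data.frame(Attribute_source = "Barcenilla_2000", Frequency = "usually")
)

# Stack every physiology into one table, reading whichever
# frequency-like column each data frame happens to use (assumption)
freq_table <- do.call(rbind, lapply(names(phys_toy), function(nm) {
  df <- phys_toy[[nm]]
  col <- intersect(c("Frequency", "Confidence_interval"), colnames(df))[1]
  data.frame(physiology = nm, Attribute_source = df$Attribute_source, value = df[[col]])
}))

# Count distinct values per source; more than one means the same
# source is annotated with different frequency levels
n_values <- tapply(freq_table$value, freq_table$Attribute_source,
                   function(v) length(unique(v)))
inconsistent <- names(n_values)[n_values > 1]
inconsistent
```

Running the same check against the full `physiologies()` output would list every source whose annotations disagree, rather than only Barcenilla_2000.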