waldronlab / bugphyzz

Harmonized annotation of microbial physiology
http://waldronlab.io/bugphyzz/

Different frequency/confidence_interval values for same source and type of evidence #155

Closed by sdgamboa 2 years ago

sdgamboa commented 2 years ago

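The hydrogen-, butyrate-, acetate-, and lactate-producing data sets are all annotated from the same source (Barcenilla_2000) with the same type of evidence (EXP), but their frequency/confidence_interval values differ: acetate producing has "unknown", while the other three have "usually". In addition, butyrate producing stores the value in a Confidence_interval column instead of Frequency. Maybe they should all share the same level of confidence/frequency?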

library(bugphyzz)
library(purrr)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

phys <- physiologies()
#> Finished acetate producing
#> Finished aerophilicity
#> Finished animal pathogen
#> Finished antimicrobial resistance
#> Dropped 3054 rows with missing Attribute_value from antimicrobial sensitivity
#> Finished arrangement
#> Finished biofilm forming
#> Dropped 4 rows with missing Attribute_value from butyrate producing
#> Finished COGEM pathogenicity rating
#> Finished disease association
#> Finished extreme environment
#> Finished gram stain
#> Finished growth medium
#> Finished growth temperature
#> Finished habitat
#> Finished health associated
#> Dropped 10 rows with missing Attribute_value from hydrogen gas producing
#> Finished isolation site
#> Dropped 9 rows with missing Attribute_value from lactate producing
#> Finished length
#> Finished mutation rate per site per generation
#> Finished mutation rates per site per year
#> Finished optimal ph
#> Finished plant pathogenicity
#> Dropped 9 rows with missing Attribute_value from shape
#> Finished spore shape
#> Finished width
#> Dropped 5 rows with missing Attribute_value from genome size
#> Finished coding genes

# Keep only the data sets that contain annotations from Barcenilla_2000
barcenilla_2000 <- phys %>% 
  keep(~ {
    "Barcenilla_2000" %in% .x$Attribute_source
  })

names(barcenilla_2000)
#> [1] "acetate producing"      "butyrate producing"     "hydrogen gas producing"
#> [4] "lactate producing"

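# Count rows per source, evidence type, and frequency/confidence value.
# Some data sets name the column Frequency, others Confidence_interval.
barcenilla_2000 %>% 
  map(~ {
    if ("Frequency" %in% colnames(.x)) {
      count(.x, Attribute_source, Evidence, Frequency)
    } else if ("Confidence_interval" %in% colnames(.x)) {
      count(.x, Attribute_source, Evidence, Confidence_interval)
    }
  })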
#> $`acetate producing`
#>   Attribute_source Evidence Frequency  n
#> 1  Barcenilla_2000      EXP   unknown 24
#> 
#> $`butyrate producing`
#>   Attribute_source Evidence Confidence_interval  n
#> 1  Barcenilla_2000      EXP             usually 24
#> 
#> $`hydrogen gas producing`
#>   Attribute_source Evidence Frequency  n
#> 1  Barcenilla_2000      EXP   usually 14
#> 
#> $`lactate producing`
#>   Attribute_source Evidence Frequency  n
#> 1  Barcenilla_2000      EXP   usually 15

Created on 2022-06-08 by the reprex package (v2.0.1)
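
The same comparison could be turned into an automated check. The following is only a minimal sketch, not part of bugphyzz: it uses toy data frames that mirror the counts in the output above instead of calling physiologies(), and the names toy, summarise_confidence, combined, and inconsistent are hypothetical. It maps Frequency and Confidence_interval onto a single column, then flags any source/evidence pair that has more than one distinct value or that uses more than one column name.

```r
library(dplyr)
library(purrr)

# Toy stand-ins for the four data sets (assumption: values and row counts
# are copied from the reprex output above, not loaded from bugphyzz)
toy <- list(
  `acetate producing` = data.frame(
    Attribute_source = "Barcenilla_2000", Evidence = "EXP",
    Frequency = rep("unknown", 24)
  ),
  `butyrate producing` = data.frame(
    Attribute_source = "Barcenilla_2000", Evidence = "EXP",
    Confidence_interval = rep("usually", 24)
  ),
  `hydrogen gas producing` = data.frame(
    Attribute_source = "Barcenilla_2000", Evidence = "EXP",
    Frequency = rep("usually", 14)
  ),
  `lactate producing` = data.frame(
    Attribute_source = "Barcenilla_2000", Evidence = "EXP",
    Frequency = rep("usually", 15)
  )
)

# Hypothetical helper: count rows per source/evidence/value, whichever of
# the two column names the data set happens to use
summarise_confidence <- function(x) {
  value_col <- base::intersect(
    c("Frequency", "Confidence_interval"), colnames(x)
  )[1]
  x %>%
    count(Attribute_source, Evidence, value = .data[[value_col]]) %>%
    mutate(column = value_col)
}

# One row per data set / source / evidence / value combination
combined <- bind_rows(map(toy, summarise_confidence), .id = "dataset")

# Source/evidence pairs whose value or column name differs across data sets
inconsistent <- combined %>%
  group_by(Attribute_source, Evidence) %>%
  summarise(
    n_values = n_distinct(value),
    n_columns = n_distinct(column),
    .groups = "drop"
  ) %>%
  filter(n_values > 1 | n_columns > 1)

inconsistent
```

With the toy data, Barcenilla_2000 / EXP is flagged because it has two distinct values ("unknown" and "usually") spread over two column names. Replacing toy with the list returned by keep() above should apply the same check to the real data sets, assuming they all have Attribute_source and Evidence columns.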

kbeckenrode commented 2 years ago

Good catch. Fixing now

kbeckenrode commented 2 years ago

Fixed

sdgamboa commented 2 years ago

Thanks!