waldronlab / bugphyzz

Harmonized annotation of microbial physiology
http://waldronlab.io/bugphyzz/

Make changes to antimicrobial resistance spreadsheet? #170

Closed: sdgamboa closed this issue 1 year ago

sdgamboa commented 2 years ago
library(bugphyzz)
ar <- physiologies('antimicrobial resistance')[[1]]
#> Finished antimicrobial resistance
head(ar[ar$Rank == 'species', ])
#>   NCBI_ID Genome_ID Accession_ID                           Taxon_name
#> 1     573  573.2399           NA      Klebsiella pneumoniae strain 24
#> 2     573  573.2401           NA      Klebsiella pneumoniae strain 36
#> 3     573  573.2403           NA     Klebsiella pneumoniae strain 523
#> 4     573  573.2404           NA       Klebsiella pneumoniae strain 9
#> 5     573  573.1787           NA Klebsiella pneumoniae strain CST_2_1
#> 6     573  573.2404           NA    Klebsiella pneumoniae strain FH-2
#>                Attribute Attribute_value Attribute_source Evidence Frequency
#> 1 resistance to amikacin            TRUE           PATRIC      igc    always
#> 2 resistance to amikacin            TRUE           PATRIC      igc    always
#> 3 resistance to amikacin            TRUE           PATRIC      igc    always
#> 4 resistance to amikacin            TRUE           PATRIC      igc    always
#> 5 resistance to amikacin            TRUE           PATRIC      igc    always
#> 6 resistance to amikacin            TRUE           PATRIC      igc    always
#>      Rank Parent_name Parent_NCBI_ID Parent_rank Confidence_in_curation
#> 1 species  Klebsiella            570       genus                   High
#> 2 species  Klebsiella            570       genus                   High
#> 3 species  Klebsiella            570       genus                   High
#> 4 species  Klebsiella            570       genus                   High
#> 5 species  Klebsiella            570       genus                   High
#> 6 species  Klebsiella            570       genus                   High

Created on 2022-09-09 with reprex v2.0.2

sdgamboa commented 2 years ago

In the example below, I think NCBI taxid 287 should be annotated with the taxon name 'Pseudomonas aeruginosa' and with the attributes 'resistance to levofloxacin' (rarely, 5 of 20 rows) and 'sensitive to levofloxacin' (usually, 15 of 20 rows).

So what I propose is to leave NCBI_ID empty for these strain-level rows and to use code to change the parent rank to species, obtaining the parent taxid from the current Genome_ID.
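A minimal base R sketch of this proposal follows. It assumes that Genome_ID follows the PATRIC `<taxid>.<number>` convention, so the text before the dot is the species taxid, and it takes the species name from the first two words of Taxon_name, much like the regex used further down. The toy rows are copied from the first comment's output; `reparent_strains()` is a hypothetical helper, not a bugphyzz function, and Rank is deliberately left unchanged because the proposal does not say what it should become.

```r
# Toy rows copied from the reprex in the first comment (Genome_ID is <dbl> there)
ar <- data.frame(
  NCBI_ID = c(573L, 573L),
  Genome_ID = c(573.2399, 573.1787),
  Taxon_name = c("Klebsiella pneumoniae strain 24",
                 "Klebsiella pneumoniae strain CST_2_1"),
  Rank = c("species", "species"),
  Parent_name = c("Klebsiella", "Klebsiella"),
  Parent_NCBI_ID = c(570L, 570L),
  Parent_rank = c("genus", "genus"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: re-parent strain-level rows to their species.
# Assumption: Genome_ID is "<species taxid>.<n>", so only the part before
# the dot is used (lost trailing zeros after the dot do not matter).
reparent_strains <- function(df) {
  species_id <- as.integer(sub("\\..*$", "", as.character(df$Genome_ID)))
  # Species name = first two words of the strain-level Taxon_name
  species_name <- sub("^(\\w+ \\w+).*$", "\\1", df$Taxon_name)
  df$NCBI_ID <- NA_integer_        # strain taxid not available: leave empty
  df$Parent_NCBI_ID <- species_id
  df$Parent_name <- species_name
  df$Parent_rank <- "species"
  df                               # Rank is intentionally left as-is
}

fixed <- reparent_strains(ar)
fixed[, c("NCBI_ID", "Parent_name", "Parent_NCBI_ID", "Parent_rank")]
```

Deriving the parent from Genome_ID rather than from Taxon_name keeps the taxid lookup independent of how strain names are spelled.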

library(bugphyzz)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

ar <- as_tibble(physiologies('antimicrobial resistance')[[1]])
#> Finished antimicrobial resistance

## With strains
ar |> 
    filter(NCBI_ID == '287') |> 
    count(NCBI_ID, Attribute, Rank)
#> # A tibble: 2 × 4
#>   NCBI_ID Attribute                  Rank        n
#>     <int> <chr>                      <chr>   <int>
#> 1     287 resistance to levofloxacin species     5
#> 2     287 sensitive to levofloxacin  species    15

## Removing strain names with a regex
ar |> 
    filter(NCBI_ID == '287') |> 
    mutate(Taxon_name = sub('^(\\w+ \\w+).+', '\\1', Taxon_name)) |> 
    count(NCBI_ID, Taxon_name, Attribute)
#> # A tibble: 2 × 4
#>   NCBI_ID Taxon_name             Attribute                      n
#>     <int> <chr>                  <chr>                      <int>
#> 1     287 Pseudomonas aeruginosa resistance to levofloxacin     5
#> 2     287 Pseudomonas aeruginosa sensitive to levofloxacin     15

## The species Pseudomonas aeruginosa is not annotated
ar |> filter(Taxon_name == 'Pseudomonas aeruginosa')
#> # A tibble: 0 × 14
#> # … with 14 variables: NCBI_ID <int>, Genome_ID <dbl>, Accession_ID <lgl>,
#> #   Taxon_name <chr>, Attribute <chr>, Attribute_value <lgl>,
#> #   Attribute_source <chr>, Evidence <chr>, Frequency <chr>, Rank <chr>,
#> #   Parent_name <chr>, Parent_NCBI_ID <int>, Parent_rank <chr>,
#> #   Confidence_in_curation <chr>

Created on 2022-09-09 with reprex v2.0.2
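The missing species-level annotation for taxid 287 could be derived from the strain counts roughly as below. This is a sketch, not the bugphyzz curation logic: the rows only reproduce the 5 resistant and 15 sensitive counts from the reprex, and the cutoffs in the hypothetical `freq_label()` helper (a proportion of at most 0.25 is 'rarely', at least 0.75 is 'usually') are assumptions chosen so that these counts map to the labels suggested at the top of this comment.

```r
# Strain-level rows for taxid 287, reproducing only the counts shown above
strain_rows <- data.frame(
  NCBI_ID = 287L,
  Attribute = rep(c("resistance to levofloxacin", "sensitive to levofloxacin"),
                  times = c(5, 15)),
  stringsAsFactors = FALSE
)

# Hypothetical helper; the cutoffs are illustrative assumptions only
freq_label <- function(p) {
  ifelse(p <= 0.25, "rarely", ifelse(p >= 0.75, "usually", "sometimes"))
}

# Count strains per attribute, then collapse to one species-level row each
tab <- table(strain_rows$Attribute)
species_rows <- data.frame(
  NCBI_ID = 287L,
  Taxon_name = "Pseudomonas aeruginosa",
  Rank = "species",
  Attribute = names(tab),
  n = as.integer(tab),
  Frequency = freq_label(as.integer(tab) / sum(tab)),
  stringsAsFactors = FALSE
)
species_rows
```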

kbeckenrode commented 2 years ago

That makes sense to me. Thank you for catching this.

repo-ranger[bot] commented 1 year ago

⚠️ This has been marked to be closed in 7 days.


sdgamboa commented 1 year ago

This has been addressed in the following commits: