waldronlab / bugphyzz

Harmonized annotation of microbial physiology
http://waldronlab.io/bugphyzz/

Signatures for antimicrobial resistance, length, width #200

Closed jwokaty closed 1 year ago

jwokaty commented 1 year ago

I get warnings when attempting to makeSignatures for antimicrobial resistance, length, and width:

Warning messages:
1: No signatures for antimicrobial resistance (categorical). Returning NULL. 
2: No signatures for length (range). Returning NULL. 
3: No signatures for length (range). Returning NULL. 
4: No signatures for length (range). Returning NULL. 
5: No signatures for length (range). Returning NULL. 
6: No signatures for width (range). Returning NULL. 
7: No signatures for width (range). Returning NULL. 
8: No signatures for width (range). Returning NULL. 
9: No signatures for width (range). Returning NULL. 

When I look at the values in length and width, I see numerical values, ranges (0.2-0.75), or something like (>2). I think getSignatures can't deal with this data right now. I'm not sure why antimicrobial resistance isn't working, but df has zeros at https://github.com/waldronlab/bugphyzz/blob/6d441c01ae7797e12266732db065113df380a2cd/R/makeSignatures.R#L116.
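For illustration, the three value shapes described above (a single number, a range such as 0.2-0.75, and an open-ended value such as >2) could be normalized into numeric lower and upper bounds along these lines. This is only a sketch: `parse_size` is a hypothetical helper, not a bugphyzz function, and treating a "<" value as starting at 0 is an assumption.

```r
# Hypothetical helper (not part of bugphyzz): turn strings such as
# "1.5", "0.2-0.75", ">2" or "<0.5" into numeric lower/upper bounds.
parse_size <- function(x) {
    x <- trimws(x)
    lower <- rep(NA_real_, length(x))
    upper <- rep(NA_real_, length(x))

    is_gt <- grepl("^>", x)                               # e.g. ">2"
    is_lt <- grepl("^<", x)                               # e.g. "<0.5"
    is_range <- grepl("^[0-9.]+\\s*-\\s*[0-9.]+$", x)     # e.g. "0.2-0.75"
    is_single <- grepl("^[0-9.]+$", x)                    # e.g. "1.5"

    # Open-ended above: the stated value is the lower bound.
    lower[is_gt] <- as.numeric(sub("^>=?\\s*", "", x[is_gt]))
    upper[is_gt] <- Inf

    # Open-ended below: assume sizes cannot be negative, so start at 0.
    lower[is_lt] <- 0
    upper[is_lt] <- as.numeric(sub("^<=?\\s*", "", x[is_lt]))

    # Closed range: split on the hyphen.
    parts <- strsplit(x[is_range], "\\s*-\\s*")
    lower[is_range] <- vapply(parts, function(p) as.numeric(p[1]), numeric(1))
    upper[is_range] <- vapply(parts, function(p) as.numeric(p[2]), numeric(1))

    # Single value: lower and upper bounds coincide.
    lower[is_single] <- as.numeric(x[is_single])
    upper[is_single] <- lower[is_single]

    # Anything that matches none of the patterns keeps NA bounds.
    data.frame(value = x, lower = lower, upper = upper)
}

sizes <- parse_size(c("1.5", "0.2-0.75", ">2"))
sizes
```

Values that match none of the patterns are left as NA rather than guessed, so they can be inspected separately.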

sdgamboa commented 1 year ago

@jwokaty, yes. I assume you're trying to create signatures at the species level? It doesn't work (for now) because there are no species in those datasets. I'll add the ASR code to solve this. Width and length are transformed with code in order to remove '>' characters, etc.:

library(bugphyzz)
phys <- physiologies(
    keyword = c('antimicrobial resistance', 'width', 'length')
)
#> Importing antimicrobial resistance (categorical)
#> Finished antimicrobial resistance
#> Importing length (range)
#> Finished length
#> Importing width (range)
#> Finished width

lapply(phys, function(x) {
    table(x$Rank)
})
#> $`antimicrobial resistance`
#> 
#> no rank  strain 
#>       5   10306 
#> 
#> $length
#> 
#> genus 
#>   655 
#> 
#> $width
#> 
#> genus 
#>   839

Created on 2023-01-05 with reprex v2.0.2
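
The rank tables above explain the warnings: if signatures are built only from rows at the requested rank, a dataset with no species rows yields nothing at the species level. A toy sketch of that filtering, assuming a simple equality filter on the Rank column (`filter_rank` and the `toy` data frame are hypothetical and do not reproduce the makeSignatures code):

```r
# Toy stand-in for one element of the list returned by physiologies();
# only the Taxon_name and Rank columns are mimicked here.
toy <- data.frame(
    Taxon_name = c("Geothrix", "Leptospira", "Jonesia"),
    Rank = c("genus", "genus", "genus")
)

# Hypothetical helper: keep rows at one rank, or every row for "mixed".
filter_rank <- function(df, tax.level = "mixed") {
    if (tax.level == "mixed") {
        return(df)
    }
    df[df$Rank == tax.level, , drop = FALSE]
}

n_species <- nrow(filter_rank(toy, "species"))  # 0: no species rows exist
n_genus <- nrow(filter_rank(toy, "genus"))      # 3
n_mixed <- nrow(filter_rank(toy, "mixed"))      # 3
```

Under this assumption, a request for species on a genus-only table leaves zero rows, which is consistent with the "No signatures ... Returning NULL" warnings.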

sdgamboa commented 1 year ago

It works with 'mixed' as the taxonomic level:

library(bugphyzz)
## Antimicrobial resistance 
sig1 <- 
    makeSignatures(
        keyword = 'antimicrobial resistance', tax.id.type = 'Taxon_name',
        tax.level = 'mixed'
    )
#> >>> Importing dataset(s) <<<
#> 
#> Importing antimicrobial resistance (categorical)
#> Finished antimicrobial resistance
#> 
#> >>> Creating signatures <<<
#> 
sig2 <- 
    makeSignatures(
        keyword = 'antimicrobial resistance', tax.id.type = 'Taxon_name',
        tax.level = 'mixed'
    )
#> >>> Importing dataset(s) <<<
#> Importing antimicrobial resistance (categorical)
#> Finished antimicrobial resistance
#> 
#> >>> Creating signatures <<<
#> 
head(lapply(sig1, head))
#> $`bugphyzz:antimicrobial resistance|carbapenem-resistant Acinetobacter`
#> [1] "Acinetobacter baumannii strain BL06" "Acinetobacter baumannii strain BL08"
#> [3] "Acinetobacter baumannii strain BL11" "Acinetobacter baumannii strain BL15"
#> [5] "Acinetobacter baumannii strain BL19" "Acinetobacter baumannii strain BL20"
#> 
#> $`bugphyzz:antimicrobial resistance|not resistant to beta-lactam`
#> [1] "Streptococcus pneumoniae strain 19F"                                
#> [2] "Streptococcus pneumoniae strain 2842STDY5644294"                    
#> [3] "Streptococcus pneumoniae strain 2842STDY5753625"                    
#> [4] "Streptococcus pneumoniae strain 4041STDY6583227"                    
#> [5] "Streptococcus pneumoniae strain 4041STDY6836169 strain ZA_GPS_SP213"
#> [6] "Streptococcus pneumoniae strain 699-14"                             
#> 
#> $`bugphyzz:antimicrobial resistance|resistance to amikacin`
#> [1] "Klebsiella pneumoniae strain 24"     
#> [2] "Klebsiella pneumoniae strain 36"     
#> [3] "Klebsiella pneumoniae strain 523"    
#> [4] "Klebsiella pneumoniae strain 9"      
#> [5] "Klebsiella pneumoniae strain CST_2_1"
#> [6] "Klebsiella pneumoniae strain FH-2"   
#> 
#> $`bugphyzz:antimicrobial resistance|resistance to aztreonam`
#> [1] "Klebsiella pneumoniae strain 15"  "Klebsiella pneumoniae strain 16" 
#> [3] "Klebsiella pneumoniae strain 18"  "Klebsiella pneumoniae strain 193"
#> [5] "Klebsiella pneumoniae strain 24"  "Klebsiella pneumoniae strain 246"
#> 
#> $`bugphyzz:antimicrobial resistance|resistance to capreomycin`
#> [1] "Mycobacterium tuberculosis 12223"    
#> [2] "Mycobacterium tuberculosis 15-000611"
#> [3] "Mycobacterium tuberculosis 15-000767"
#> [4] "Mycobacterium tuberculosis 15-001267"
#> [5] "Mycobacterium tuberculosis 15-006439"
#> [6] "Mycobacterium tuberculosis 15-007199"
#> 
#> $`bugphyzz:antimicrobial resistance|resistance to cefepime`
#> [1] "Klebsiella pneumoniae strain 132" "Klebsiella pneumoniae strain 16" 
#> [3] "Klebsiella pneumoniae strain 19"  "Klebsiella pneumoniae strain 20" 
#> [5] "Klebsiella pneumoniae strain 24"  "Klebsiella pneumoniae strain 246"
head(lapply(sig2, head))
#> $`bugphyzz:antimicrobial resistance|carbapenem-resistant Acinetobacter`
#> [1] "Acinetobacter baumannii strain BL06" "Acinetobacter baumannii strain BL08"
#> [3] "Acinetobacter baumannii strain BL11" "Acinetobacter baumannii strain BL15"
#> [5] "Acinetobacter baumannii strain BL19" "Acinetobacter baumannii strain BL20"
#> 
#> $`bugphyzz:antimicrobial resistance|not resistant to beta-lactam`
#> [1] "Streptococcus pneumoniae strain 19F"                                
#> [2] "Streptococcus pneumoniae strain 2842STDY5644294"                    
#> [3] "Streptococcus pneumoniae strain 2842STDY5753625"                    
#> [4] "Streptococcus pneumoniae strain 4041STDY6583227"                    
#> [5] "Streptococcus pneumoniae strain 4041STDY6836169 strain ZA_GPS_SP213"
#> [6] "Streptococcus pneumoniae strain 699-14"                             
#> 
#> $`bugphyzz:antimicrobial resistance|resistance to amikacin`
#> [1] "Klebsiella pneumoniae strain 24"     
#> [2] "Klebsiella pneumoniae strain 36"     
#> [3] "Klebsiella pneumoniae strain 523"    
#> [4] "Klebsiella pneumoniae strain 9"      
#> [5] "Klebsiella pneumoniae strain CST_2_1"
#> [6] "Klebsiella pneumoniae strain FH-2"   
#> 
#> $`bugphyzz:antimicrobial resistance|resistance to aztreonam`
#> [1] "Klebsiella pneumoniae strain 15"  "Klebsiella pneumoniae strain 16" 
#> [3] "Klebsiella pneumoniae strain 18"  "Klebsiella pneumoniae strain 193"
#> [5] "Klebsiella pneumoniae strain 24"  "Klebsiella pneumoniae strain 246"
#> 
#> $`bugphyzz:antimicrobial resistance|resistance to capreomycin`
#> [1] "Mycobacterium tuberculosis 12223"    
#> [2] "Mycobacterium tuberculosis 15-000611"
#> [3] "Mycobacterium tuberculosis 15-000767"
#> [4] "Mycobacterium tuberculosis 15-001267"
#> [5] "Mycobacterium tuberculosis 15-006439"
#> [6] "Mycobacterium tuberculosis 15-007199"
#> 
#> $`bugphyzz:antimicrobial resistance|resistance to cefepime`
#> [1] "Klebsiella pneumoniae strain 132" "Klebsiella pneumoniae strain 16" 
#> [3] "Klebsiella pneumoniae strain 19"  "Klebsiella pneumoniae strain 20" 
#> [5] "Klebsiella pneumoniae strain 24"  "Klebsiella pneumoniae strain 246"

## Width
sig3 <- 
    makeSignatures(
        keyword = 'width', tax.id.type = 'Taxon_name',
        tax.level = 'mixed'
    )
#> >>> Importing dataset(s) <<<
#> Importing width (range)
#> Finished width
#> 
#> >>> Creating signatures <<<
#> 
sig4 <- 
    makeSignatures(
        keyword = 'width', tax.id.type = 'Taxon_name',
        tax.level = 'mixed'
    )
#> >>> Importing dataset(s) <<<
#> Importing width (range)
#> Finished width
#> 
#> >>> Creating signatures <<<
#> 
head(lapply(sig3, head))
#> $`bugphyzz:width|0-Inf`
#> [1] "Geothrix"      "Leptospira"    "Methylophaga"  "Solobacterium"
#> [5] "Dechlorosoma"  "Jonesia"
head(lapply(sig4, head))
#> $`bugphyzz:width|0-Inf`
#> [1] "Geothrix"      "Leptospira"    "Methylophaga"  "Solobacterium"
#> [5] "Dechlorosoma"  "Jonesia"

## Length
sig5 <- 
    makeSignatures(
        keyword = 'length', tax.id.type = 'Taxon_name',
        tax.level = 'mixed'
    )
#> >>> Importing dataset(s) <<<
#> Importing length (range)
#> Finished length
#> 
#> >>> Creating signatures <<<
#> 
sig6 <- 
    makeSignatures(
        keyword = 'length', tax.id.type = 'Taxon_name',
        tax.level = 'mixed'
    )
#> >>> Importing dataset(s) <<<
#> Importing length (range)
#> Finished length
#> 
#> >>> Creating signatures <<<
#> 
head(lapply(sig5, head))
#> $`bugphyzz:length|0.2-Inf`
#> [1] "Jonesia"     "Mahella"     "Ammonifex"   "Bauldia"     "Akkermansia"
#> [6] "Bartonella"
head(lapply(sig6, head))
#> $`bugphyzz:length|0.2-Inf`
#> [1] "Jonesia"     "Mahella"     "Ammonifex"   "Bauldia"     "Akkermansia"
#> [6] "Bartonella"

Created on 2023-01-05 with reprex v2.0.2

jwokaty commented 1 year ago

Thanks for the explanation regarding length and width. I should have tried using genus.

Regarding antimicrobial resistance, I don't understand why mixed works but not species or genus. For the signatures, I should use species or genus but not mixed.

jwokaty commented 1 year ago

I'm following up on this issue as I am not getting any results for makeSignatures for the following physiologies.

animal pathogen, antimicrobial resistance, antimicrobial sensitivity, biofilm, COGEM, extreme environments, optimal ph, coding genes, disease association, isolation site, plant pathogenicity

I haven't checked them all but I know for example that animal pathogen should have results for NCBI_ID/Taxon_name + species, but I get no results:

ps <- bugphyzz::physiologies()
getSignatures(ps$`animal pathogen`, 'NCBI_ID', 'genus')
NULL
Warning message:
No signatures for animal pathogen (logical). Returning NULL.

In the examples above, mixed levels were used; however, for the .gmt files we produce files that are genus only or species only.
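
For context on the .gmt files mentioned above: GMT is a tab-separated format with one gene set (here, one signature) per line, holding the set name, a description, and then the members. A minimal sketch of writing a named list of signatures in that layout follows. `write_gmt` is a hypothetical helper, not the code used to produce the bugphyzz exports, and the two example signatures reuse genus names from the output earlier in this thread.

```r
# Hypothetical helper: write a named list of character vectors as GMT lines.
# Each line is: signature name <TAB> description <TAB> member1 <TAB> member2 ...
write_gmt <- function(sigs, file, description = "NA") {
    lines <- vapply(names(sigs), function(nm) {
        paste(c(nm, description, sigs[[nm]]), collapse = "\t")
    }, character(1))
    writeLines(lines, file)
    invisible(lines)
}

# Genus-only example signatures (names copied from the output above).
sigs <- list(
    `bugphyzz:width|0-Inf` = c("Geothrix", "Leptospira"),
    `bugphyzz:length|0.2-Inf` = c("Jonesia", "Mahella")
)

gmt_file <- tempfile(fileext = ".gmt")
write_gmt(sigs, gmt_file)
gmt_lines <- readLines(gmt_file)
```

A file written from 'mixed' signatures would combine strain, species, and genus names in the same sets, which is why a single rank per file is needed here.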

sdgamboa commented 1 year ago

I don't think we'll be using this workflow anymore. Now it's importBugphyzz --> getBugphyzzSignatures. Thresholds and propagation will now be handled in the bugphyzzExports repo.