Closed sdgamboa closed 1 year ago
Note that this leads to duplicated annotations:
suppressMessages({
library(bugphyzz)
library(dplyr)
})
ar <- as_tibble(physiologies('antimicrobial resistance')[[1]])
#> Finished antimicrobial resistance
dup_rows <- which(duplicated(ar[,c('NCBI_ID', 'Taxon_name', 'Attribute')]))
length(dup_rows)
#> [1] 87
Created on 2022-12-06 with reprex v2.0.2
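One caveat when reading the count above: `duplicated()` flags only the second and later occurrences of each key, so 87 is the number of extra rows, not the number of rows involved in a duplicate group. A minimal sketch of how the full groups could be pulled out for inspection follows. It uses a small made-up data frame rather than the real bugphyzz table; the taxon names and attribute values are placeholders, not real annotations.

```r
# Toy stand-in for the antimicrobial resistance table (hypothetical values).
toy <- data.frame(
  NCBI_ID    = c("1733", "1733", "562"),
  Taxon_name = c("Taxon A", "Taxon A", "Taxon B"),
  Attribute  = c("drug X", "drug X", "drug X"),
  PATRIC_ID  = c("1733.103", "1733.223", "562.1"),
  stringsAsFactors = FALSE
)

key <- toy[, c("NCBI_ID", "Taxon_name", "Attribute")]

# duplicated() marks only the repeats, not the first row of each group.
n_extra <- sum(duplicated(key))

# Scanning from both ends marks every row that shares its key with another row.
in_dup_group <- duplicated(key) | duplicated(key, fromLast = TRUE)
dup_groups <- toy[in_dup_group, ]
print(dup_groups)
```

Viewing the complete groups side by side makes it easier to see whether the rows differ only in PATRIC_ID or in other columns as well.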
All IDs are merged to the NCBI IDs, which are the IDs used in our data structure (the NCBI tree).
@lwaldron, @kbeckenrode, some PATRIC_IDs belong to the same taxon. Should the annotations of these PATRIC_IDs be merged into a single entry (row)? The PATRIC_ID field/cell could be "1733.103, 1733.223". Note that we already use this format in some datasets with the Accession_ID column.
Example from PATRIC's website and in bugphyzz (for some reason 1733.105 is missing in bugphyzz, maybe due to updates in PATRIC):
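On the merging question, here is a rough sketch of what collapsing could look like, assuming rows are grouped by NCBI_ID, Taxon_name, and Attribute and the PATRIC_IDs are joined with ", ". The data frame is a made-up example (placeholder taxon names and attribute values), and base R `aggregate()` is used only to keep the snippet dependency-free; this is not how bugphyzz currently does it.

```r
# Toy stand-in for the antimicrobial resistance table (hypothetical values).
toy <- data.frame(
  NCBI_ID    = c("1733", "1733", "562"),
  Taxon_name = c("Taxon A", "Taxon A", "Taxon B"),
  Attribute  = c("drug X", "drug X", "drug X"),
  PATRIC_ID  = c("1733.103", "1733.223", "562.1"),
  stringsAsFactors = FALSE
)

# Collapse rows that share the same key; join their PATRIC_IDs into one cell.
merged <- aggregate(
  toy["PATRIC_ID"],
  by  = toy[c("NCBI_ID", "Taxon_name", "Attribute")],
  FUN = function(ids) paste(unique(ids), collapse = ", ")
)
print(merged)
```

This only works cleanly if the other columns agree within each group. If duplicated rows carry different values in other columns, those would need their own rule, which is part of what needs deciding here.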