Open Chanwhistle opened 4 months ago
Hi,
thank you for your interest and sorry for the late reply.
If you have already built the KBs (python -m belb.scripts.build_kbs --dir path/to/dir --cores 20), then you should be able to load the KB with a subset like this:
from belb import AutoBelbKb
from belb.resources import Kbs
kb = AutoBelbKb.from_name(
name=Kbs.NCBI_GENE.name,
directory="path/to/dir",
db_config="./db.yaml",
subset="gnormplus",
)
To get the actual NCBI Taxonomy identifiers of the species in the subset:
from belb.resources import Kbs
from belb.utils import load_kb_subsets
subsets = load_kb_subsets(Kbs.NCBI_GENE.name)
print(subsets['gnormplus'])
# [41856, 9986, 10116, 11908, 9606, 3847, 7955, 11676, 4896, 8355, 511145, 8364, 9913, 10298, 7227, 559292, 333760, 9031, 6239, 9823, 10090, 3702]
The subset data is stored in this JSON file in the repository
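Once you have the taxonomy identifiers printed above, restricting gene records to a subset is essentially a membership check on the species ID. A minimal sketch in plain Python, assuming hypothetical (gene_id, tax_id) tuples rather than BELB's actual storage schema; `filter_by_species` and the sample records are illustrative names, not part of the library:

```python
# Taxonomy IDs of the gnormplus subset, copied from the printed output above.
GNORMPLUS_TAXIDS = {
    41856, 9986, 10116, 11908, 9606, 3847, 7955, 11676, 4896, 8355, 511145,
    8364, 9913, 10298, 7227, 559292, 333760, 9031, 6239, 9823, 10090, 3702,
}


def filter_by_species(records, taxids):
    """Keep only (gene_id, tax_id) records whose species is in the subset.

    Hypothetical helper for illustration; not a BELB function.
    """
    return [(gene_id, tax_id) for gene_id, tax_id in records if tax_id in taxids]


# Illustrative records only: (gene identifier, NCBI Taxonomy identifier).
records = [("672", 9606), ("12189", 10090), ("example", 562)]

filtered = filter_by_species(records, GNORMPLUS_TAXIDS)
print(filtered)
```

The last record is dropped because its taxonomy ID (562) is not in the subset list.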
Hi, I need to find the NCBI Gene subsets for nlm-gene and gnormplus. How did you make the subsets of this KB?