Closed serenalotreck closed 4 months ago
The `KeyError` is caused by the linker trying to access the entity's definition field, which does not exist in the precompiled taxonomies. You have to add `resolve_abbreviations=False` to the linker's config. The example below also sets `filter_for_definitions=False`.
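To make the failure mode concrete, here is a toy sketch in plain Python. It is not TaxoNERD's code: `ToyEntity`, `toy_kb`, `keep_candidate`, and `check_definitions` are hypothetical names invented for illustration, and the assumption is that the definition lookup is guarded by a boolean option. Because Python's `and` short-circuits, turning the option off means the dictionary lookup is never evaluated, so a missing key cannot raise.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyEntity:
    """Hypothetical stand-in for a knowledge-base entry."""
    name: str
    definition: Optional[str] = None


# Toy knowledge base: one known ID, stored without a definition.
toy_kb = {"NCBI:9606": ToyEntity("Homo sapiens")}


def keep_candidate(concept_id: str, check_definitions: bool) -> bool:
    # With check_definitions=False, `and` short-circuits and
    # toy_kb[concept_id] is never evaluated.
    if check_definitions and toy_kb[concept_id].definition is None:
        return False
    return True


def lookup_fails(concept_id: str, check_definitions: bool) -> bool:
    """Return True if the guarded lookup raises KeyError."""
    try:
        keep_candidate(concept_id, check_definitions)
    except KeyError:
        return True
    return False


# "NCBI:3702" is deliberately absent from toy_kb.
print(lookup_fails("NCBI:3702", check_definitions=True))
print(lookup_fails("NCBI:3702", check_definitions=False))
```

The first call hits the missing key; the second skips the lookup entirely, which is the effect a config switch of this kind would have.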
Here is a minimal working example:
```python
from taxonerd import TaxoNERD
from taxonerd.linking.linking import EntityLinker
from spacy.tokens import Span

ents = [
    "M. inflexa",
    "bryophytes",
    "homo sapiens",
    "A. thaliana",
    "Arabidopsis thaliana",
]

taxonerd = TaxoNERD()
nlp = taxonerd.load("en_core_eco_biobert")
doc = nlp(" ".join(ents))

# Compute (start, end) token indices for each entity.
# This assumes one token per whitespace-separated word.
span_idxs = []
for i, ent in enumerate(ents):
    if i == 0:
        start = 0
    else:
        start = len(" ".join(ents[:i]).split(" "))
    end = start + len(ent.split(" "))
    span_idxs.append((start, end))

# Overwrite the document's entities with the hand-built spans.
spans = [Span(doc, e[0], e[1], "ENTITY") for e in span_idxs]
doc.set_ents(spans)

config = {
    "linker_name": "ncbi_taxonomy",
    "resolve_abbreviations": False,
    "filter_for_definitions": False,
}
linker = EntityLinker(**config)
updated_doc = linker(doc)
```
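The token indices in the example above come from splitting on whitespace, which only lines up with the document if the tokenizer produces exactly one token per whitespace-separated word. As a hedged alternative, the sketch below computes character offsets in plain Python instead; the helper name `entity_char_offsets` is hypothetical and not part of TaxoNERD or spaCy.

```python
from typing import List, Tuple


def entity_char_offsets(
    ents: List[str], sep: str = " "
) -> Tuple[str, List[Tuple[int, int]]]:
    """Join entities with `sep` and return the text plus (start, end) character offsets."""
    offsets = []
    cursor = 0
    for ent in ents:
        start = cursor
        end = start + len(ent)
        offsets.append((start, end))
        # Skip over the separator before the next entity.
        cursor = end + len(sep)
    return sep.join(ents), offsets


ents = [
    "M. inflexa",
    "bryophytes",
    "homo sapiens",
    "A. thaliana",
    "Arabidopsis thaliana",
]
text, offsets = entity_char_offsets(ents)

# Each offset pair slices the original entity string back out of the text.
for ent, (start, end) in zip(ents, offsets):
    assert text[start:end] == ent
```

With spaCy, offsets like these can be turned into spans with `doc.char_span(start, end, label="ENTITY", alignment_mode="expand")`, which snaps to token boundaries rather than relying on a token count.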
References to `umls` or `mesh` should be removed from the docstring. These are artefacts from scispacy, from which the linker code was copied.
Context:
I'm using `EntityLinker` programmatically with the following code:

Observed Behavior:
This code fails with a `KeyError` for the NCBI IDs (format `NCBI:XXXX`) on line 134 of `linking.py`.

When I try to use the indicated defaults by passing `name='umls'` or `name='mesh'` on instantiation instead of using `linker_name='ncbi_taxonomy'`, as indicated in the docs for the `EntityLinker` class, there is no `CandidateGenerator` instantiated (meaning that no candidate matches are generated), and passing them as `linker_name` causes an error from the `KnowledgeBase` class.

Expected Behavior:
- `cui_to_entity` from the `EntityLinker` for NCBI Taxon is a valid key and doesn't throw an error
- `CandidateGenerator`