toltoxgh closed this 3 years ago
Interesting question. How are the processed docs, with their entities and indices, stored?
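Check out this:

```python
import en_core_sci_md
from negspacy.negation import Negex

nlp = en_core_sci_md.load()
# spaCy 2.x-style negspacy API: the component object is passed to add_pipe
negex = Negex(nlp, language="en_clinical", chunk_prefix=["no"])
nlp.add_pipe(negex)

doc = nlp("your text")
for ent in doc.ents:
    # ent._.negex is True when negspacy judges the entity to be negated
    print(ent, ent.label_, ent._.negex)
```

Now filter your entities on whether `ent._.negex` is `True` or `False`. It solved my problem. But it would be difficult for already extracted entities, since you have lost the sentence context.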
For example, what is stored could be the texts themselves plus the start and end indices of the entities, in a CSV-like file.
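To make that format concrete, here is a minimal sketch of reading such a cache back with Python's standard `csv` module. The layout (one row per entity, columns `doc_id`, `text`, `start`, `end`, `label`, with end-exclusive character offsets) and the `load_entity_cache` helper are hypothetical; the thread does not specify a schema.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO, Tuple

Entity = Tuple[int, int, str]  # (start_char, end_char, label), end exclusive


def load_entity_cache(fh: TextIO) -> Dict[str, Tuple[str, List[Entity]]]:
    """Group entity rows by document: doc_id -> (text, sorted list of entities)."""
    texts: Dict[str, str] = {}
    ents: Dict[str, List[Entity]] = defaultdict(list)
    for row in csv.DictReader(fh):
        texts[row["doc_id"]] = row["text"]  # text is repeated on every row of a doc
        ents[row["doc_id"]].append((int(row["start"]), int(row["end"]), row["label"]))
    return {doc_id: (texts[doc_id], sorted(ents[doc_id])) for doc_id in texts}


# Usage with an in-memory file standing in for the CSV on disk
sample = io.StringIO(
    "doc_id,text,start,end,label\n"
    'd1,"Patient denies chest pain but reports nausea.",38,44,ENTITY\n'
    'd1,"Patient denies chest pain but reports nausea.",15,25,ENTITY\n'
)
cache = load_entity_cache(sample)
```

Repeating the text on each row keeps the sketch to a single file; a real cache would more likely store texts once and reference them by `doc_id`.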
Regardless of how it is stored, once this information has been read and parsed, is there a way to run negspacy with only this information plus some basic spaCy components (tokenizer, sentence splitter, etc.), without having to run the whole scispacy pipeline again?
I'll leave this open in case anyone has ideas on how to build a spaCy doc manually from this kind of cached data and then run only parts of a pipeline on it. I did some poking around and couldn't see an obvious way forward.
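One possible approach, sketched under assumptions rather than tested against this exact setup: rebuild a `Doc` with a blank spaCy English pipeline plus the rule-based `sentencizer` (no scispacy NER or linker), attach the cached entities with `Doc.char_span`, and then call only the negspacy component on that `Doc`. This assumes spaCy 3.x and negspacy 1.x, where importing `negspacy.negation` registers a `negex` pipeline factory; the helper names `build_light_pipeline`, `doc_from_cache`, and `negate_cached` are hypothetical. Because the blank tokenizer may split text differently from scispacy's, `alignment_mode="expand"` snaps cached offsets to the enclosing tokens.

```python
from typing import List, Tuple

import spacy
from spacy.language import Language
from spacy.tokens import Doc

Entity = Tuple[int, int, str]  # (start_char, end_char, label), end exclusive


def build_light_pipeline() -> Language:
    """Blank English tokenizer plus rule-based sentence splitting; no NER or linker."""
    nlp = spacy.blank("en")
    # negspacy scopes negation within sentences, so the Doc needs sentence boundaries
    nlp.add_pipe("sentencizer")
    return nlp


def doc_from_cache(nlp: Language, text: str, cached_ents: List[Entity]) -> Doc:
    """Re-tokenize `text` and attach entities that were extracted in an earlier run."""
    doc = nlp(text)
    spans = []
    for start, end, label in cached_ents:
        # "expand" snaps offsets to the enclosing tokens if the blank tokenizer
        # splits the text differently from the tokenizer that produced the offsets
        span = doc.char_span(start, end, label=label, alignment_mode="expand")
        if span is not None:  # skip offsets that cannot be mapped onto tokens
            spans.append(span)
    doc.ents = spans  # overlapping spans would raise a ValueError here
    return doc


def negate_cached(nlp: Language, doc: Doc) -> List[Tuple[str, str, bool]]:
    """Run only the negex component on a Doc whose entities were set by hand."""
    # Assumes negspacy 1.x: importing this module registers the "negex" factory.
    from negspacy.negation import Negex  # noqa: F401

    if "negex" not in nlp.pipe_names:
        # default config (assumed: en_clinical termset, all entity types)
        nlp.add_pipe("negex")
    doc = nlp.get_pipe("negex")(doc)
    return [(ent.text, ent.label_, ent._.negex) for ent in doc.ents]
```

A typical call would then be `negate_cached(nlp, doc_from_cache(nlp, text, cached_ents))`, giving `(text, label, negated)` triples. Sentence boundaries here come from punctuation rules rather than scispacy's parser, so negation scopes may not match a full pipeline run exactly.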
With the caveat that I don't know your use case entirely, I'd venture to guess that getting this working the way you want might end up being more work than it's worth, compared with just rerunning the pipeline and taking the computational hit.
Closing due to lack of activity
Is your feature request related to a problem? Please describe.
Can negspacy be used with entities (and their spans) that scispacy has already identified, by providing them somehow?
Describe the solution you'd like
For instance, scispacy has already been run with its EntityLinker, and the UMLS entities with their indices have been obtained and stored somewhere.
Running the whole scispacy pipeline again with negspacy would be computationally expensive. Is there a way to run (sci)spaCy with only base spaCy functionality, such as the tokenizer, and provide the full text string, the entities, and their indices somehow, so that negspacy can determine the negation status?