tcambon closed this 5 months ago.
Thanks for this! I'll take a look and merge shortly. I think examples like this will help others!
This is great, thanks! There are ways to retokenize the doc so that entities are treated as a single token rather than a multi-word span.
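import spacy
from spacy.pipeline import merge_entities  # optional: the "merge_entities" factory can also be added by name
nlp = spacy.load("en_core_web_sm")
# Appended after "ner", so each predicted entity span is retokenized into a single token
nlp.add_pipe("merge_entities")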
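To try the same idea without downloading a trained model, here is a minimal sketch assuming spaCy v3. A blank English pipeline uses a rule-based `entity_ruler` as a stand-in for the trained NER (or for gliner-spacy), followed by `merge_entities`. The sentence and the two patterns are illustrative only.

```python
import spacy

# Blank English pipeline: tokenizer only, no trained components to download.
nlp = spacy.blank("en")

# Illustrative stand-in for an NER component (e.g. en_core_web_sm's "ner" or gliner-spacy):
# a rule-based entity_ruler with two example phrase patterns.
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "PERSON", "pattern": "Bill Gates"},
    {"label": "ORG", "pattern": "Microsoft"},
])

# merge_entities must come after the component that sets doc.ents.
nlp.add_pipe("merge_entities")

doc = nlp("This is a text about Bill Gates and Microsoft.")
tokens = [token.text for token in doc]
print(tokens)
```

Multi-word entities such as "Bill Gates" become one token, while words outside any entity keep their original tokenization.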
Is this what you are trying to achieve? If so, do you want to modify the example before I merge?
Hi William, thanks for the quick feedback. Indeed, that's the function I was looking for. I've improved the example.
Thanks so much! Happy to help. I may make a YouTube video on the merge pipe; I just realized I never did a video on that.
Provides an example of how to integrate gliner-spacy into a spaCy pipeline. In this pipeline I wanted to tokenize my sentences by entities, and by words for any text that isn't part of an entity. If you see potential code optimizations, feel free to share.
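The tokenization goal described above (one token per entity, one token per word elsewhere) can be illustrated in plain Python, independent of spaCy. This is a conceptual sketch, not how spaCy's `merge_entities` is implemented: `merge_entity_tokens` is a hypothetical helper, spans are assumed to be non-overlapping `(start, end)` word indices with `end` exclusive, and merged words are joined with a single space rather than the original whitespace.

```python
from typing import List, Tuple


# Hypothetical helper (not part of spaCy or gliner-spacy).
# Spans are non-overlapping (start, end) word indices, end exclusive, like spaCy's Span.
def merge_entity_tokens(words: List[str], spans: List[Tuple[int, int]]) -> List[str]:
    """Return one token per entity span and one token per word outside any span."""
    # Map each span start to its end; ignore empty or inverted spans.
    starts = {start: end for start, end in spans if end > start}
    tokens: List[str] = []
    i = 0
    while i < len(words):
        if i in starts:
            end = starts[i]
            tokens.append(" ".join(words[i:end]))  # the whole entity becomes one token
            i = end
        else:
            tokens.append(words[i])  # a word outside any entity stays as-is
            i += 1
    return tokens


words = ["This", "is", "a", "text", "about", "Bill", "Gates", "and", "Microsoft", "."]
print(merge_entity_tokens(words, [(5, 7), (8, 9)]))
```

In a real pipeline the spans would come from `doc.ents`, and spaCy's retokenizer also preserves whitespace and token attributes, which this sketch does not attempt.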