Open wjbmattingly opened 1 month ago
Hi @wjbmattingly, it looks very interesting. Do you have a demo so that I can try it?
@urchade you got it! https://github.com/theirstory/gliner-spacy/blob/main/examples/gliner_cat/gliner_cat_demo.ipynb I just pushed it to GitHub. You need to clone the repo and then run
python -m pip install .
Thanks, I will try it.
I was also thinking about training a GLiNER model for zero-shot classification by framing the task as span extraction.
I tried that, but had a hard time getting it to work consistently on long spans. Maybe you will have better luck.
With some fine-tuning it should work
btw, Ihor (@Ingvarstep) has made a multi-task version of GLiNER: https://huggingface.co/knowledgator/gliner-multitask-large-v0.5
Hi all,
I have been working on a few separate packages attached to GLiNER, and one of them may be ready to share. It builds on GLiNER spaCy, and I could use advice on whether you think it would be worth building into GLiNER spaCy or packaging as a separate component. It is a new spaCy component that needs to be added after a GLiNER pipe. I am testing it on Holocaust-related material.
It works like this. A user defines a dictionary whose keys are categories and whose values are lists of GLiNER labels. The user data would look like this.
The goal here is to set GLiNER to a rather low threshold and use nested spans to capture greater nuance. The new gliner_cat pipe adds up the scores of the entities found in each sentence and assigns values to the categories based on those totals. One can then process an entire document and identify where salient themes appear by chunking the document into groups of n sentences.
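The aggregation described above can be sketched in plain Python. This is only an illustration of the idea, not the actual gliner_cat code: the category mapping, the label names, the sample entities and scores, and the helper names `score_sentence` and `score_chunks` are all hypothetical. Real input would come from a GLiNER pipe run at a low threshold.

```python
from collections import defaultdict

# Hypothetical mapping: category -> GLiNER labels that count toward it.
CAT_DATA = {
    "family": ["mother", "father", "sibling"],
    "violence": ["weapon", "injury"],
}

# Invert the mapping to label -> categories for quick lookup.
LABEL_TO_CATS = defaultdict(list)
for cat, labels in CAT_DATA.items():
    for label in labels:
        LABEL_TO_CATS[label].append(cat)


def score_sentence(entities):
    """Sum entity scores per category for one sentence.

    `entities` is a list of (text, label, score) tuples.
    """
    totals = {cat: 0.0 for cat in CAT_DATA}
    for _text, label, score in entities:
        for cat in LABEL_TO_CATS.get(label, []):
            totals[cat] += score
    return totals


def score_chunks(sentences, n):
    """Group per-sentence entity lists into chunks of n sentences
    and sum the category scores within each chunk."""
    chunks = []
    for start in range(0, len(sentences), n):
        totals = {cat: 0.0 for cat in CAT_DATA}
        for entities in sentences[start:start + n]:
            for cat, value in score_sentence(entities).items():
                totals[cat] += value
        chunks.append(totals)
    return chunks


# Made-up entities for three sentences (the last one has none).
doc_sentences = [
    [("my mother", "mother", 0.5), ("my brother", "sibling", 0.25)],
    [("a rifle", "weapon", 0.5)],
    [],
]
chunk_scores = score_chunks(doc_sentences, n=2)
```

With `n=2`, the first two sentences form one chunk and the empty third sentence forms a second chunk, so `chunk_scores` holds one dictionary of category totals per chunk. Those per-chunk totals are the kind of data a heatmap over the document could be drawn from.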
The component will generate the data and visualization for this.
This works rather like zero-shot text classification with a slight difference. It lets a user define a controlled NER vocabulary that aligns to a topic. This means that when a user wants to understand why certain categories appeared in the text, not only do they know which sentences have those topics, they can point to the specific entities in the sentence that generated that output.
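The traceability described here could look roughly like the following sketch. It is plain Python and not the component's real API: the function name `explain`, the mapping, and the sample entities are hypothetical and only show how a category score can be traced back to the entities behind it.

```python
def explain(entities, cat_data):
    """Return category -> list of (text, label, score) tuples that
    contributed to that category in one sentence."""
    evidence = {cat: [] for cat in cat_data}
    for text, label, score in entities:
        for cat, labels in cat_data.items():
            if label in labels:
                evidence[cat].append((text, label, score))
    return evidence


# Hypothetical mapping and made-up entities for a single sentence.
cat_data = {"family": ["mother", "sibling"], "violence": ["weapon"]}
sentence_entities = [
    ("my mother", "mother", 0.5),
    ("a rifle", "weapon", 0.25),
]
evidence = explain(sentence_entities, cat_data)
```

Here `evidence["family"]` lists the "my mother" span and `evidence["violence"]` lists the "a rifle" span, which is the extra information a plain zero-shot classifier would not provide.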
@urchade and @tomaarsen, if you like this, would you like to see it as part of GLiNER spaCy or as a separate installable spaCy component? It does not have any other requirements except for seaborn for the viz.