mattkallo closed this 1 year ago
Hello, sorry for the late response; I was on vacation.
I'm not sure I understood the example you shared. Are these the NER outputs? If so, can you share the NER output, the coref output, and the desired output?
Hi @shon-otmazgin . Thanks for the response.
Those were coreference clusters. Below is a more detailed example.
from fastcoref import spacy_component
import spacy
nlp = spacy.load("./models/spaCy-en-large-model")
nlp.add_pipe("fastcoref")
text="Zappos, a subsidiary of Amazon, started its online presence in 2011. It expanded outside of Americas in the year 2020"
doc = nlp(text)
doc._.coref_clusters
[[(0, 31), (40, 43), (69, 71)]]
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)
Zappos 0 6 ORG
Amazon 24 30 ORG
2011 63 67 DATE
Americas 92 100 GPE
2020 113 117 DATE
There is only one cluster in the example. The cluster head "Zappos, a subsidiary of Amazon," (0, 31) contains two entities, Zappos and Amazon. My question is: is there a way to identify "Zappos" as the entity the cluster refers to, even though the cluster head contains two entities?
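One way to make the ambiguity concrete is to compare character offsets instead of strings. The sketch below is not part of fastcoref or spaCy; `entities_in_mention` and `link_clusters` are hypothetical helper names, and the data is copied from the output above. It lists the entities that fall fully inside each mention of a cluster.

```python
# Character-offset data copied from the example output above.
clusters = [[(0, 31), (40, 43), (69, 71)]]
entities = [
    ("Zappos", 0, 6, "ORG"),
    ("Amazon", 24, 30, "ORG"),
    ("2011", 63, 67, "DATE"),
    ("Americas", 92, 100, "GPE"),
    ("2020", 113, 117, "DATE"),
]


def entities_in_mention(mention, entities):
    """Return the entities whose character span lies fully inside the mention."""
    start, end = mention
    return [e for e in entities if start <= e[1] and e[2] <= end]


def link_clusters(clusters, entities):
    """Hypothetical helper: map each mention of each cluster to the entity texts inside it."""
    linked = []
    for cluster in clusters:
        linked.append(
            {m: [e[0] for e in entities_in_mention(m, entities)] for m in cluster}
        )
    return linked


links = link_clusters(clusters, entities)
print(links)
# [{(0, 31): ['Zappos', 'Amazon'], (40, 43): [], (69, 71): []}]
```

Offset containment avoids the pitfalls of string matching, but it still returns both "Zappos" and "Amazon" for the first mention, so a further rule is needed to pick one.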
So this is like a nested entity? The model should also find nested entities. Can you try running the LINGMESS model to see whether it predicts the nested entity as well?
Regarding the cluster head: it is not well defined. Some will consider one entity to be the head, and others something different. I usually take the shortest entity that is also a proper noun, but that's my interpretation.
Yes, this is a nested entity. It can appear in many similar cases, such as "Jane Doe, supervisor of John Doe". Let me try LINGMESS. I have found a workaround: when a mention contains more than one entity, I find the main entity using the dependency parse tree.
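The exact workaround was not posted, so the following is only a guess at the idea: choose the entity that contains the syntactic root of the mention, meaning the token whose head lies outside the mention. To keep the sketch runnable without a model, the head indices below are hand-written for illustration and are not real parser output; `mention_root` and `main_entity` are hypothetical names.

```python
# Hand-written, illustrative parse of "Zappos, a subsidiary of Amazon, started ..."
# Each token is (text, start_char, end_char, head_index). The heads are an
# assumption made for this sketch, not output from a real parser.
tokens = [
    ("Zappos", 0, 6, 7),        # subject of "started"
    (",", 6, 7, 0),
    ("a", 8, 9, 3),
    ("subsidiary", 10, 20, 0),  # apposition attached to "Zappos"
    ("of", 21, 23, 3),
    ("Amazon", 24, 30, 4),
    (",", 30, 31, 0),
    ("started", 32, 39, 7),     # sentence root points to itself
]
entities = [("Zappos", 0, 6), ("Amazon", 24, 30)]


def mention_root(tokens, start, end):
    """Index of the first token inside [start, end) whose head is outside it (or itself)."""
    inside = [i for i, t in enumerate(tokens) if start <= t[1] and t[2] <= end]
    for i in inside:
        head = tokens[i][3]
        if head == i or head not in inside:
            return i
    return None


def main_entity(tokens, entities, mention):
    """Return the text of the entity that contains the mention's root token, if any."""
    root = mention_root(tokens, *mention)
    if root is None:
        return None
    _, root_start, root_end, _ = tokens[root]
    for text, ent_start, ent_end in entities:
        if ent_start <= root_start and root_end <= ent_end:
            return text
    return None


print(main_entity(tokens, entities, (0, 31)))
# Zappos
```

With a real spaCy pipeline that includes a parser, the same root token should be available as `doc.char_span(start, end).root`, and the matching entity can be found by checking which span in `doc.ents` contains that token. Whether the parser attaches the apposition the way this sketch assumes depends on the model.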
This is not an issue report. There was no discussion option, so I am asking the question here.
Is there a way to link the coref clusters to the respective entities extracted by the NER component? A simple string match or search does not work in every case. See the example below.
Cluster
[{'text': 'Zappos, subsidiary of Amazon,', 'span': (913, 973)}, {'text': 'It', 'span': (2843, 2844)}]
In this case, the cluster entry contains two entities ("Zappos" and "Amazon"), though it refers to "Zappos". I would like to link the entity "Zappos" to this cluster programmatically. Thanks.