Closed. lahsuk closed this issue 5 years ago.
Hello,
we have a very similar issue right now, if not the same one. Is there a solution for this?
import spacy

nlp = spacy.load('en_core_web_sm')
nlp_coref = spacy.load('en_coref_sm')

doc = nlp_coref(s.strip())  # s is the input string
if doc._.has_coref:
    doc = nlp(doc._.coref_resolved)
This is one way of doing it, but if the coreference is resolved incorrectly, it is hard to recover the text that has been replaced.
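One way to keep the substitutions recoverable (a minimal sketch in plain Python, not the neuralcoref API) is to record the original text of each replaced span instead of relying on `doc._.coref_resolved`. The character offsets below are hard-coded for illustration; in practice they would come from the coref mentions (e.g. `mention.start_char` / `mention.end_char`):

```python
# Sketch: apply coreference substitutions while logging what was replaced,
# so an incorrect resolution can be undone later.

def apply_resolutions(text, resolutions):
    """Replace (start, end, replacement) spans right-to-left so earlier
    offsets stay valid, and log the original text of each span."""
    log = []
    for start, end, replacement in sorted(resolutions, reverse=True):
        log.append((start, end, text[start:end], replacement))
        text = text[:start] + replacement + text[end:]
    return text, log

text = "Anna met Bob. She thanked him."
# Pretend coref resolved "She" -> "Anna" and "him" -> "Bob".
resolutions = [(14, 17, "Anna"), (26, 29, "Bob")]

resolved, log = apply_resolutions(text, resolutions)
print(resolved)  # Anna met Bob. Anna thanked Bob.
print(log)       # originals preserved, so a bad resolution is reversible
```

The log holds enough information to restore any span whose resolution turns out to be wrong.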
I see. This error is still present in version 4.0. I'll open an issue on spaCy's GitHub for this.
For now, the simplest solution is to re-run the NeuralCoref pipeline component on the retokenized document after the merges (note also that the recommended way to do merges has changed). Here is a fixed example:
import spacy
import neuralcoref

# Load a standard spaCy model, then add NeuralCoref to its pipeline
nlp = spacy.load('en_core_web_sm')
neuralcoref.add_to_pipe(nlp)

doc = nlp("Michelle Obama is the wife of former U.S. President Barack Obama. Prior to her role as first lady, she was a lawyer.")

# Merge each noun chunk into a single token
spans = list(doc.noun_chunks)
with doc.retokenize() as retokenizer:
    for span in spans:
        retokenizer.merge(span)

# Re-run NeuralCoref after the merges
doc = nlp.get_pipe('neuralcoref')(doc)

for word in doc:
    print(word)
    if word._.in_coref:
        print(doc._.coref_clusters)
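The re-run is needed because retokenization replaces the tokens that the coref annotations were computed on, so stored token positions no longer line up. A toy illustration of the same ordering problem in plain Python (not the spaCy API):

```python
# Toy illustration: annotations that store token indices go stale
# once tokens are merged.

tokens = ["Michelle", "Obama", "is", "the", "wife"]
# Annotation computed before the merge: token index 1 is "Obama".
annotation = {"head_index": 1}

# Merge the noun chunk ["Michelle", "Obama"] into one token.
tokens = ["Michelle Obama"] + tokens[2:]

# The stored index now points at a different word entirely.
print(tokens[annotation["head_index"]])  # "is", not "Obama"

# Recomputing the annotation on the merged tokens fixes it, which is
# what re-running the NeuralCoref component does for the coref clusters.
annotation = {"head_index": tokens.index("Michelle Obama")}
print(tokens[annotation["head_index"]])  # "Michelle Obama"
```

Any component that caches positions must run after, not before, the retokenization.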
When the above code is run, it gives the following error: