shon-otmazgin / fastcoref

MIT License

Ungrammatical output can be produced when resolving coreference due to replacing verbs with nouns #55

Open ksteimel opened 2 months ago

Description of problem

It seems that the current coreference replacement code checks whether the coreference chain contains a noun, pronoun, or proper noun and, if so, performs the replacement on every mention in the chain. However, the model sometimes gets a chain wrong, and the replacement then swaps a noun phrase in for a verb.

In the example below, "acquired" is replaced with "This deal".

Minimal working example:

import spacy
from fastcoref import spacy_component
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("fastcoref")
text = "The social media company BlueBird Inc. just acquired an AI company called DataMind Solutions for $4.7 billion. BlueBird wants to use DataMind's innovative to make their social media sites even better and more personalized for users. This deal is making waves in the tech world and could change the way we use social media in the future."
doc = nlp(text, component_cfg={"fastcoref": {"resolve_text": True}})
print(doc._.resolved_text)
"""
The social media company BlueBird Inc. just This deal an AI company called DataMind Solutions for $4.7 billion. The social media company BlueBird Inc. wants to use an AI company called DataMind Solutions's innovative to make The social media company BlueBird Inc.'s social media sites even better and more personalized for users. This deal is making waves in the tech world and could change the way we use social media in the future.
"""