Closed shmouelsamares closed 1 year ago
The problem here is that the coreference annotation occurs within the pipeline, and thus before the retokenization, so the stored token indexes still refer to the tokens as they were before the merge. You can work around it by re-annotating after the retokenization:
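from coreferee.manager import CorefereeManager
# nlp is the loaded pipeline; doc is the Doc after the merge has been applied
ann = CorefereeManager().get_annotator(nlp)
# re-run the coreference annotation against the retokenized doc
ann.annotate(doc)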
While trying to use Coreferee to replace proper nouns with their corresponding references, Coreferee returns the wrong token indexes. This issue only occurs if a merge was done beforehand.
I expect "he" to refer to "big bad wolf", but I get "small" instead.
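To see why a stale index lands on "small", here is a minimal plain-Python sketch of the effect. It does not use spaCy or Coreferee and does not reflect their internals. The sentence, `merge_span` and `remap_index` are all hypothetical, made up for illustration, and the sentence is not the one from the original report.

```python
def merge_span(tokens, start, end):
    """Merge tokens[start:end] into a single token, mimicking a retokenizer merge."""
    return tokens[:start] + [" ".join(tokens[start:end])] + tokens[end:]


def remap_index(index, start, end):
    """Map a pre-merge token index to its position after merging [start, end)."""
    if index < start:
        return index          # before the merged span: unchanged
    if index < end:
        return start          # inside the merged span: collapses to the merged token
    return index - (end - start - 1)  # after the span: shifted left


# Hypothetical example sentence, tokenized on whitespace.
tokens = "The big bad wolf scared small pigs because he growled".split()

# Suppose coreference resolved "he" to the token "wolf" BEFORE the merge.
referent_index = tokens.index("wolf")

# Now merge "big bad wolf" (tokens 1..3) into one token.
merged = merge_span(tokens, 1, 4)

stale = merged[referent_index]                     # old index, new tokenization
fixed = merged[remap_index(referent_index, 1, 4)]  # index adjusted for the merge

print(stale)
print(fixed)
```

With this sentence the stale lookup yields "small" and the remapped lookup yields "big bad wolf". Re-annotating the doc after the merge, as suggested above, avoids having to remap indexes by hand.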