Closed by andytwoods 1 year ago
It's no problem to ask this question here, although Discussions would probably have been a better place than Issues, as this isn't actually about a bug.
Although Coreferee is a general-purpose library, it was written specifically to support the Holmes information extraction library, and the types of coreference it covers were directly determined by the requirements of Holmes. Unfortunately, it isn't the case that such unambiguous coreferences are filtered out; rather, Coreferee's rules and neural network structures were developed from the start without them in mind. There may be only a handful of places in the code that would need to change to include them, but even if the code change did turn out to be simple, testing that everything still worked as expected would be a major undertaking.
If you're looking for a coreference solution that captures a wider range of phenomena than Coreferee, it might be worth looking at the Coreference Resolver component within spaCy itself. This uses neither an a priori definition of coreference nor rules to find potential instances of it: it simply learns whatever sorts of correspondences are present in the training data.
Apologies if this is an inappropriate place to ask this question. Is there any way to get around Coreferee not capturing coreferences that are unambiguously evident from the structure of a sentence? I suspect that these are skipped for reasons of efficiency, and that I may be able to add a flag in a suitable location to include them.
With many thanks, Andy