richardpaulhudson / coreferee

Coreference resolution for English, French, German and Polish, optimised for limited training data and easily extensible for further languages
MIT License
102 stars, 16 forks

Coreferee does not capture coreferences that are unambiguously evident from the structure of a sentence #11

Closed andytwoods closed 1 year ago

andytwoods commented 1 year ago

Apologies if this is an inappropriate place to ask this question. Is there any way to get around Coreferee not capturing coreferences that are unambiguously evident from the structure of a sentence? I assume they were left out for reasons of efficiency, and that I may be able to add a flag in a suitable location to include them.

With many thanks, Andy

richardpaulhudson commented 1 year ago

Not a problem about asking this question here, although Discussions would probably have been a better place than Issues as this isn't actually about a bug.

Although Coreferee is a general-purpose library, it was written specifically to support the Holmes information extraction library, and the types of coreference it covers were directly determined by the requirements of Holmes. Unfortunately, it isn't the case that such unambiguous coreferences are filtered out; rather, Coreferee's rules and neural network structures were developed from the start without them in mind. There may or may not be a handful of places in the code that could be changed to include them, but even if the code change did turn out to be simple, testing that everything still worked as expected would be a major undertaking.

If you're looking for a coreference solution that captures a wider range of phenomena than Coreferee, it might be worth looking at the Coreference Resolver component within spaCy itself. This uses neither an a priori definition of coreference nor rules to find potential instances of it: it simply learns whatever sorts of correspondences are present in the training data.
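For a narrow subset of structurally evident coreferences, another option that leaves Coreferee untouched might be a separate post-processing rule over the dependency parse. The following is only a minimal sketch under stated assumptions, not anything Coreferee or spaCy provides: it assumes that a reflexive pronoun and the subject of its governing verb are one example of the structural coreference discussed in this thread, and `Token`, `REFLEXIVES` and `link_reflexives` are hypothetical names. The `Token` class is a toy stand-in for a parsed token (position, text, dependency label, head index), so the example runs without any model installed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Toy stand-in for a parsed token (hypothetical, not a spaCy class)."""
    i: int      # position of the token in the sentence
    text: str   # surface form
    dep: str    # dependency label relative to the head
    head: int   # position of the syntactic head (the root points to itself)


# Assumption: only English reflexive pronouns are handled by this rule.
REFLEXIVES = {
    "myself", "yourself", "himself", "herself", "itself",
    "ourselves", "yourselves", "themselves",
}

SUBJECT_LABELS = {"nsubj", "nsubjpass"}


def link_reflexives(tokens):
    """Return (reflexive_position, subject_position) pairs.

    A reflexive is linked to the first subject that shares its head,
    which only covers simple clauses such as "Peter hurt himself".
    """
    pairs = []
    for tok in tokens:
        if tok.text.lower() not in REFLEXIVES:
            continue
        subjects = [
            other.i
            for other in tokens
            if other.head == tok.head
            and other.dep in SUBJECT_LABELS
            and other.i != tok.i
        ]
        if subjects:
            pairs.append((tok.i, subjects[0]))
    return pairs


sentence = [
    Token(0, "Peter", "nsubj", 1),
    Token(1, "hurt", "ROOT", 1),
    Token(2, "himself", "dobj", 1),
]
print(link_reflexives(sentence))  # [(2, 0)]
```

In a real pipeline the same rule would read the dependency label and head from the parser's tokens, and the resulting pairs would have to be merged with Coreferee's chains by the caller. Reflexives whose head is not the verb itself (for example inside a prepositional phrase) are deliberately not handled here.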