richardpaulhudson / coreferee

Coreference resolution for English, French, German and Polish, optimised for limited training data and easily extensible for further languages
MIT License

About model performance #35

Open M1A2SEPTUSKII opened 1 week ago

M1A2SEPTUSKII commented 1 week ago

Thank you for your contributions to the NLP field. I would like to know more about the performance of the 1.4.2 models, such as the meaning of "Anaphors in 20%" and "Accuracy (%)", as well as how to align Coreferee's output format with a corpus. Since "A mention within Coreferee does not consist of a span", the output of Coreferee seems incompatible with the answer keys of most corpora (I tried OntoNotes and LitBank). For example, the answer key is "Gaza strip", but the output of Coreferee is "Gaza". Thank you!

richardpaulhudson commented 1 week ago

Please read the comments in the last section of https://github.com/richardpaulhudson/coreferee?tab=readme-ov-file#13-background-information. You are right that unfortunately these decisions limit the ability to compare Coreferee results with results from other coreference solutions/libraries, as Coreferee is aiming to solve a slightly different problem.

In the example you give, "Gaza" as opposed to "Gaza strip" may, depending on the other elements in the chain, be an instance of this "different problem" phenomenon, or it may simply be an instance of imperfect inference.
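
For anyone who still wants a rough comparison against a span-annotated corpus, one possible workaround is a lenient match: count a token-level mention as correct when its token index falls inside a gold span. The following is only a minimal sketch under that assumption. It is not how Coreferee's own figures are calculated, it scores mentions rather than whole chains, and the names `align_heads_to_spans` and `lenient_scores` are hypothetical helpers, not part of Coreferee or spaCy. It works on plain token indexes, so the predicted indexes and gold spans have to be extracted from the library output and the corpus beforehand.

```python
from typing import Dict, List, Tuple

# A gold mention as inclusive (start, end) token indexes, e.g. "the Gaza Strip".
Span = Tuple[int, int]


def align_heads_to_spans(head_indexes: List[int], gold_spans: List[Span]) -> Dict[int, Span]:
    """Map each predicted token index to the shortest gold span that contains it.

    Predicted tokens that fall inside no gold span are left out of the result.
    """
    alignment: Dict[int, Span] = {}
    for head in head_indexes:
        candidates = [s for s in gold_spans if s[0] <= head <= s[1]]
        if candidates:
            # Prefer the tightest span so nested mentions resolve to the inner one.
            alignment[head] = min(candidates, key=lambda s: s[1] - s[0])
    return alignment


def lenient_scores(head_indexes: List[int], gold_spans: List[Span]) -> Tuple[float, float]:
    """Return (precision, recall) for mention detection under the lenient match."""
    alignment = align_heads_to_spans(head_indexes, gold_spans)
    matched_spans = set(alignment.values())
    precision = len(alignment) / len(head_indexes) if head_indexes else 0.0
    recall = len(matched_spans) / len(gold_spans) if gold_spans else 0.0
    return precision, recall


# Illustrative data only (not taken from any corpus):
# tokens: 0 Israel  1 left  2 the  3 Gaza  4 Strip  5 .  6 It  7 was ...
gold = [(2, 4), (6, 6)]   # "the Gaza Strip", "It"
predicted = [3, 6]        # token-level mentions "Gaza", "It"
alignment = align_heads_to_spans(predicted, gold)
scores = lenient_scores(predicted, gold)
```

Under this scheme the "Gaza" versus "Gaza strip" case counts as a match, which may or may not be appropriate depending on what the comparison is meant to show.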

M1A2SEPTUSKII commented 1 week ago

Thank you for your reply, and I'm sorry for not reading the documentation carefully. Could you recommend any coreference taggers that perform better on noun pairs and are built on a customizable framework such as spaCy? (I need to customize the POS tags.) Thanks!