ufal / treex

Treex NLP framework

Coreference aware of appositions #62

Closed michnov closed 7 years ago

michnov commented 7 years ago

A coreference relation connects mentions that refer to the same discourse entity, so they can be considered equivalent. The same usually holds for the members of an apposition, e.g. Barack Obama, the U.S. president. The quality of coreference resolution is not affected in such cases, regardless of whether the resolver labels Barack Obama or the U.S. president as the antecedent. This pull request changes how coreference is retrieved and stored if any of the arguments belongs to an apposition. It requires the apposition to be represented in the Prague dependency style.

Retrieval

If get_coref_nodes (or its textual or grammatical version) is called on any node that belongs to an apposition (the apposition root or a member), the call is distributed over all nodes of the apposition. All the antecedents are collected, but only apposition members are returned as the result. For instance, if it is called on an anaphor referring to the example apposition above, both the Barack Obama and the U.S. president subtrees are returned. To enable or disable this behavior, pass the appos_aware option with the value 1 or 0, e.g. $anaph->get_coref_nodes({appos_aware => 0}) to disable it. By default, apposition-aware retrieval is enabled.
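The retrieval behavior described above can be illustrated with a small self-contained sketch. It does not use Treex: ToyNode, its fields (members, root, coref) and the helper appos_nodes are hypothetical stand-ins, and the method only mimics the described semantics under the assumption that an antecedent which is an apposition root is replaced by its members.

```perl
use strict;
use warnings;

# Toy stand-in for a tree node; this is NOT the Treex node class.
package ToyNode;

sub new {
    my ($class, %args) = @_;
    my $self = bless {
        name    => $args{name},
        members => $args{members} || [],  # non-empty only for an apposition root
        root    => undef,                 # set on members: their apposition root
        coref   => [],                    # physically stored antecedents
    }, $class;
    $_->{root} = $self for @{ $self->{members} };
    return $self;
}

# All nodes of the apposition this node belongs to (root first),
# or just the node itself if it is not part of an apposition.
sub appos_nodes {
    my ($self) = @_;
    my $root = @{ $self->{members} } ? $self : $self->{root};
    return ($self) unless $root;
    return ($root, @{ $root->{members} });
}

# Toy version of apposition-aware retrieval (enabled by default).
sub get_coref_nodes {
    my ($self, $opts) = @_;
    my $aware = ($opts && defined $opts->{appos_aware}) ? $opts->{appos_aware} : 1;
    return @{ $self->{coref} } unless $aware;

    my (@result, %seen);
    for my $source ($self->appos_nodes) {
        for my $ante (@{ $source->{coref} }) {
            # Assumption: an apposition root is replaced by its members.
            my @targets = @{ $ante->{members} } ? @{ $ante->{members} } : ($ante);
            push @result, grep { !$seen{$_}++ } @targets;
        }
    }
    return @result;
}

package main;

my $obama     = ToyNode->new(name => 'Barack Obama');
my $president = ToyNode->new(name => 'the U.S. president');
my $appos     = ToyNode->new(name => 'APPOS', members => [$obama, $president]);
my $he        = ToyNode->new(name => 'he');
push @{ $he->{coref} }, $appos;   # stored link leads to the apposition root

my @aware = map { $_->{name} } $he->get_coref_nodes();
my @plain = map { $_->{name} } $he->get_coref_nodes({ appos_aware => 0 });
print join(', ', @aware), "\n";
print join(', ', @plain), "\n";
```

In this toy setup the first line printed lists both apposition members, while the second, with appos_aware => 0, shows only the node the link is physically stored on.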

Storing

If add_coref_nodes (or its textual or grammatical version) is called on any node that belongs to an apposition, the coreference link is always made to lead from the apposition root. The same holds for the target node of the coreference link. As a result, a link created with this method never physically starts or ends at an apposition member. For the time being, apposition-aware storing cannot be disabled.
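A minimal sketch of the storing rule, again on a toy model rather than on Treex itself: ToyNode, its fields and the helper appos_root are hypothetical, and the method only mirrors the redirection of both ends of a link to the apposition root as described above.

```perl
use strict;
use warnings;

# Toy stand-in for a tree node; this is NOT the Treex node class.
package ToyNode;

sub new {
    my ($class, %args) = @_;
    my $self = bless {
        name    => $args{name},
        members => $args{members} || [],  # non-empty only for an apposition root
        root    => undef,                 # set on members: their apposition root
        coref   => [],                    # physically stored antecedents
    }, $class;
    $_->{root} = $self for @{ $self->{members} };
    return $self;
}

# The apposition root for roots and members, the node itself otherwise.
sub appos_root {
    my ($self) = @_;
    return $self if @{ $self->{members} };
    return $self->{root} || $self;
}

# Toy version of apposition-aware storing: both the source and the
# targets of the link are redirected to their apposition roots.
sub add_coref_nodes {
    my ($self, @antes) = @_;
    my $source = $self->appos_root;
    push @{ $source->{coref} }, map { $_->appos_root } @antes;
    return;
}

package main;

my $obama     = ToyNode->new(name => 'Barack Obama');
my $president = ToyNode->new(name => 'the U.S. president');
my $appos     = ToyNode->new(name => 'APPOS', members => [$obama, $president]);
my $earlier   = ToyNode->new(name => 'Obama');
my $he        = ToyNode->new(name => 'he');

$president->add_coref_nodes($earlier);  # source is an apposition member
$he->add_coref_nodes($obama);           # target is an apposition member

my @from_appos     = map { $_->{name} } @{ $appos->{coref} };
my @from_president = map { $_->{name} } @{ $president->{coref} };
my @from_he        = map { $_->{name} } @{ $he->{coref} };
print "APPOS -> @from_appos\n";
print "he -> @from_he\n";
print "links stored on the member: ", scalar(@from_president), "\n";
```

Here the link added on the member ends up stored on APPOS, the link from he targets APPOS instead of Barack Obama, and no link is stored on the member itself.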

Questions

There are cases where linking to one of the members is more reasonable than linking to the apposition root. In the Czech example Česko, země, která je..., a good reason to link the relative pronoun která to země rather than to the apposition root is that the two arguments agree in gender and number. What is the most elegant solution?

With this change, Tool::Coreference::Utils and the translation scenario had to be adjusted to obtain more or less similar MT scores.