udapi / udapi-python

Python framework for processing Universal Dependencies data
GNU General Public License v3.0

Bugfix: coref must be loaded also from empty nodes #77

Closed michnov closed 3 years ago

michnov commented 3 years ago

doc.nodes iterates only over overt nodes, so no coreference information is loaded for empty nodes. This change fixes it by replacing the iteration over doc.nodes with an iteration over doc.trees and a nested iteration over tree.descendants_and_empty. Another way to fix it would be to add a method to the Document class that does the same.
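For readers unfamiliar with the nested iteration described above, here is a minimal sketch of the pattern. The helper name `iter_nodes_and_empty` is hypothetical and not part of this PR. The `SimpleNamespace` objects are stand-ins for a udapi document and tree, so that the snippet runs without udapi installed. The only assumption about the real objects is that they expose `trees` and `descendants_and_empty` as named in the description.

```python
from types import SimpleNamespace


def iter_nodes_and_empty(doc):
    """Yield every node of every tree in the document, empty nodes included.

    Hypothetical helper: it only assumes that `doc.trees` is iterable and that
    each tree exposes `descendants_and_empty`.
    """
    for tree in doc.trees:
        yield from tree.descendants_and_empty


# Stand-in data: node IDs as strings, where "1.1" plays the role of an empty node.
tree = SimpleNamespace(
    descendants=["1", "2"],                    # overt nodes only
    descendants_and_empty=["1", "1.1", "2"],   # overt and empty nodes
)
doc = SimpleNamespace(trees=[tree])

visited = list(iter_nodes_and_empty(doc))
print(visited)
```

With a loop over overt nodes only, the stand-in empty node "1.1" would never be visited, which is the situation this PR addresses.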

I have replaced the iteration in two places so that the following actions are also performed for empty nodes:

  1. load_coref_from_misc: ensures that coreference is loaded from MISC
  2. store_coref_to_misc: ensures that previous coreference-related MISC features are deleted before writing the new ones
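To illustrate why the second point matters, here is a rough sketch of a "delete stale features, then write new ones" step. It is not the actual udapi code: the function `store_coref`, the key names in `COREF_KEYS`, and the stand-in node objects are all assumptions made for this example.

```python
from types import SimpleNamespace

# Hypothetical MISC keys, chosen only for illustration.
COREF_KEYS = ("ClusterId", "MentionSpan")


def store_coref(nodes, new_values):
    """Delete stale coreference keys on every node, then write the new ones.

    nodes: iterable of objects with a dict-like `misc` attribute and an `ord`.
    new_values: maps a node's `ord` to the dict of features to write.
    """
    nodes = list(nodes)
    # First pass: remove old coreference features from all nodes.
    for node in nodes:
        for key in COREF_KEYS:
            node.misc.pop(key, None)
    # Second pass: write the current coreference features.
    for node in nodes:
        node.misc.update(new_values.get(node.ord, {}))


overt = SimpleNamespace(ord=1, misc={"ClusterId": "c1", "SpaceAfter": "No"})
empty = SimpleNamespace(ord=1.1, misc={"ClusterId": "c9"})

# Only the overt node gets a new value; the empty node's old value is stale.
store_coref([overt, empty], {1: {"ClusterId": "c2"}})
```

If the empty node were left out of the `nodes` argument, its outdated `ClusterId` would survive the rewrite, which is why the deletion has to cover empty nodes as well.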

martinpopel commented 3 years ago

Thanks a lot for this, @michnov. In the end, I've decided to introduce doc.nodes_and_empty. I am not sure about the name, but it seems such a method/property may be useful.