udapi / udapi-python

Python framework for processing Universal Dependencies data
GNU General Public License v3.0

Recursively searching for non-empty ancestors of enhanced dependencies with empty parents #123

Closed michnov closed 7 months ago

michnov commented 7 months ago

The current solution, which deletes enhanced dependencies (the deps field) that point to empty parents, fails validation with a few errors of the following type:

[(in cs_pcedt-ud-test-blind.conllu) Line 62051 Sent wsj2387-001-p1s47]: [L2 Enhanced unconnected-egraph] Enhanced graph is not connected. Nodes ['34', '35', '36', '37', '38', '39', '40', '41'] are not reachable from any root
Enhanced errors: 1
*** FAILED *** with 1 errors

The reason is that it simply deletes the dependencies to empty parents and replaces them with a dependency to "0:root" only if no other dependency is left. In rare cases, however, a dependency to an empty parent is the only connection between a set of nodes and the rest of the tree, so deleting the empty parent makes those nodes unreachable from the root.

The suggested solution first checks whether all nodes remain reachable from the root after all empty nodes are deleted. If so, it just removes the empty-parent dependencies. Otherwise, it replaces each dependency to an empty parent with the empty parent's own dependencies. If these deps also contain an empty-parent dependency, the replacement is repeated recursively up the ancestors until all dependencies point to non-empty parents. If no non-empty parent can be found during this procedure (a cycle is encountered), the problematic dependency is replaced with 0:root.
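
For illustration, the procedure above can be sketched roughly as follows. This is a standalone sketch, not the code in this PR and not the udapi API: it assumes the enhanced graph is a plain dict mapping node ids (strings such as "3" or the empty node "2.1") to lists of (parent_id, deprel) pairs, and all function names are hypothetical. It also assumes that a replaced dependency takes over the parent and deprel of the empty parent's own dependency, which the description above does not specify.

```python
ROOT = "0"


def is_empty(node_id):
    # Assumption: empty nodes carry decimal ids such as "2.1", as in CoNLL-U.
    return "." in node_id


def reachable_without_empty(deps):
    """Return True if every non-empty node can be reached from ROOT
    using only edges whose parent is a non-empty node."""
    children = {}
    for child, parents in deps.items():
        if is_empty(child):
            continue
        for parent, _deprel in parents:
            if not is_empty(parent):
                children.setdefault(parent, []).append(child)
    seen, stack = {ROOT}, [ROOT]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return all(node in seen for node in deps if not is_empty(node))


def nonempty_ancestors(parent, deprel, deps, visiting=frozenset()):
    """Follow empty parents upwards and collect the first non-empty
    (parent, deprel) pairs. An empty list means only a cycle was found."""
    if not is_empty(parent):
        return [(parent, deprel)]
    if parent in visiting:  # cycle among empty nodes
        return []
    visiting = visiting | {parent}
    found = []
    for grand, grand_rel in deps.get(parent, []):
        found.extend(nonempty_ancestors(grand, grand_rel, deps, visiting))
    return found


def remove_empty_parents(deps):
    """Return a new deps dict without empty nodes and empty-parent edges."""
    simple = reachable_without_empty(deps)
    result = {}
    for child, parents in deps.items():
        if is_empty(child):
            continue
        new_deps = []
        for parent, deprel in parents:
            if not is_empty(parent):
                candidates = [(parent, deprel)]
            elif simple:
                candidates = []  # graph stays connected: just drop the edge
            else:
                # Fall back to 0:root if no non-empty ancestor exists.
                candidates = (nonempty_ancestors(parent, deprel, deps)
                              or [(ROOT, "root")])
            for cand in candidates:
                if cand not in new_deps:
                    new_deps.append(cand)
        # As in the current solution: a node left with no deps goes to 0:root.
        result[child] = new_deps or [(ROOT, "root")]
    return result
```

For example, if node "3" is attached only to the empty node "2.1", which is in turn attached to "1", the sketch re-attaches "3" to "1" instead of leaving it unreachable or hanging it on 0:root.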

martinpopel commented 7 months ago

Thanks. I have some suggestions for code improvements, but I need to merge in other commits first. So merging this PR now as it is.