mr-martian / rebabel-format

Python library for interacting with reBabel data files
MIT License

Edge Features #12

Open · mr-martian opened this issue 4 days ago

mr-martian commented 4 days ago

Various formats have a notion of edge features (features that belong to the relation between two units rather than to either unit), which the current schema doesn't support very well.

In the UD importer, a simple dependency relation is stored as two features on the dependent word:

word 1
  UD:lemma(str): the
  UD:head(ref): 2
  UD:deprel(str): det
word 2
  UD:lemma(str): penguin

but an enhanced dependency becomes a whole separate unit:

word 1
  UD:lemma(str): the
word 2
  UD:lemma(str): penguin
UD-edep 3
  UD:parent(ref): 2
  UD:child(ref): 1
  UD:deprel(str): det

This strikes me as an ugly hack, and it is probably also highly unintuitive for anyone trying to use the data.
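To make the difference concrete, here is a small sketch that mirrors the two listings above as plain Python dicts. This is a hypothetical in-memory model, not reBabel's actual API: the `type` key and the helper functions are made up for illustration. The same edge is recovered in both cases, but in the enhanced case the consumer has to know about the special `UD-edep` unit type, and the edge also shows up as a third "unit" next to the two real words.

```python
# Hypothetical in-memory mirror of the listings above (not the reBabel API).
basic_units = {
    1: {"type": "word", "UD:lemma": "the", "UD:head": 2, "UD:deprel": "det"},
    2: {"type": "word", "UD:lemma": "penguin"},
}

enhanced_units = {
    1: {"type": "word", "UD:lemma": "the"},
    2: {"type": "word", "UD:lemma": "penguin"},
    # The enhanced dependency is stored as its own pseudo-unit.
    3: {"type": "UD-edep", "UD:parent": 2, "UD:child": 1, "UD:deprel": "det"},
}


def basic_edges(units):
    """Read (head, dependent, deprel) directly off each word's own features."""
    return [
        (feats["UD:head"], uid, feats["UD:deprel"])
        for uid, feats in units.items()
        if "UD:head" in feats
    ]


def enhanced_edges(units):
    """Rebuild (parent, child, deprel) from pseudo-units of a special type."""
    return [
        (feats["UD:parent"], feats["UD:child"], feats["UD:deprel"])
        for feats in units.values()
        if feats["type"] == "UD-edep"
    ]


print(basic_edges(basic_units))        # [(2, 1, 'det')]
print(enhanced_edges(enhanced_units))  # [(2, 1, 'det')]
print(len(enhanced_units))             # 3, although there are only two words
```

Any code that simply iterates over all units (to count words, say) has to filter out the `UD-edep` entries explicitly, which is part of what makes the current representation easy to misuse.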

One potential solution is to have a second feature table that references the relations table rather than the units table. This raises the question of whether edge features should also get their own tiers table, but I think that might be unnecessary (unless we want edge feature names to be keyed on the parent and child unit types).
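A minimal sketch of that proposal using Python's built-in `sqlite3`, assuming a simplified schema. All table and column names here (`units`, `relations`, `tiers`, `features`, `edge_features`, `valuetype`, and so on) are hypothetical and are not reBabel's actual schema. Edge features get their own table keyed on a relation, while the single tiers table is shared between unit features and edge features, following the suggestion that a separate edge tiers table may not be needed.

```python
import sqlite3

# Hypothetical, simplified schema; not reBabel's real table layout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE units (id INTEGER PRIMARY KEY, type TEXT NOT NULL);
CREATE TABLE relations (
    id INTEGER PRIMARY KEY,
    parent INTEGER NOT NULL REFERENCES units(id),
    child INTEGER NOT NULL REFERENCES units(id)
);
-- One tiers table, shared by unit features and edge features.
CREATE TABLE tiers (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    valuetype TEXT NOT NULL
);
-- Existing-style features: keyed on a unit.
CREATE TABLE features (
    unit INTEGER NOT NULL REFERENCES units(id),
    tier INTEGER NOT NULL REFERENCES tiers(id),
    value,
    PRIMARY KEY (unit, tier)
);
-- Proposed edge features: same shape, but keyed on a relation.
CREATE TABLE edge_features (
    relation INTEGER NOT NULL REFERENCES relations(id),
    tier INTEGER NOT NULL REFERENCES tiers(id),
    value,
    PRIMARY KEY (relation, tier)
);
""")

# The enhanced "det" dependency from the example: two words, one relation.
conn.executemany("INSERT INTO units (id, type) VALUES (?, ?)",
                 [(1, "word"), (2, "word")])
conn.executemany("INSERT INTO tiers (id, name, valuetype) VALUES (?, ?, ?)",
                 [(1, "UD:lemma", "str"), (2, "UD:deprel", "str")])
conn.executemany("INSERT INTO features (unit, tier, value) VALUES (?, ?, ?)",
                 [(1, 1, "the"), (2, 1, "penguin")])
conn.execute("INSERT INTO relations (id, parent, child) VALUES (1, 2, 1)")
conn.execute("INSERT INTO edge_features (relation, tier, value) VALUES (1, 2, 'det')")

# Read each edge together with its deprel and the lemmas at both ends.
rows = conn.execute("""
SELECT pl.value, cl.value, ef.value
FROM relations r
JOIN edge_features ef ON ef.relation = r.id
JOIN tiers dt ON dt.id = ef.tier AND dt.name = 'UD:deprel'
JOIN tiers lt ON lt.name = 'UD:lemma'
JOIN features pl ON pl.unit = r.parent AND pl.tier = lt.id
JOIN features cl ON cl.unit = r.child AND cl.tier = lt.id
""").fetchall()
unit_count = conn.execute("SELECT COUNT(*) FROM units").fetchone()[0]

print(rows)        # [('penguin', 'the', 'det')]
print(unit_count)  # 2: the edge no longer masquerades as a unit
```

Because the tiers table is shared, nothing here stops the same tier name from being used on both units and edges. If edge feature names should instead be keyed on the parent and child unit types, `edge_features` would need to point at a separate tiers table that records those types.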