Evaluation treats multiple categories too leniently

huji-nlp / ucca

Universal Conceptual Cognitive Annotation (UCCA)

https://universalconceptualcognitiveannotation.github.io/

GNU General Public License v3.0


Issue #91 (open), opened by danielhers 4 years ago

danielhers commented 4 years ago

Evaluation is span-based: a predicted span counts as correct if its categories have a non-empty intersection with the gold span's categories. This is a problem because a parser can predict many unary edges or multi-category edges without being penalized for the extra categories: https://github.com/danielhers/ucca/blob/master/ucca/evaluation.py#L102 @omriabnd @nschneid
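A minimal sketch of the problem, assuming each unit is reduced to its token span `(start, end)` mapped to the set of categories on that span (so a unary chain such as `[P [C service]]` becomes one span with `{"P", "C"}`). The functions `lenient_f1` and `strict_f1` are hypothetical and are not the code in `ucca/evaluation.py`. The lenient scorer accepts a span whenever the category sets intersect, so labeling every span with many categories still scores perfectly; the strict scorer matches (span, category) pairs, so each extra category is a false positive.

```python
from typing import Dict, Set, Tuple

Span = Tuple[int, int]
Labels = Dict[Span, Set[str]]  # span -> categories on that span (hypothetical format)


def f1(matched: int, n_pred: int, n_gold: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when nothing matches."""
    if matched == 0:
        return 0.0
    precision, recall = matched / n_pred, matched / n_gold
    return 2 * precision * recall / (precision + recall)


def lenient_f1(gold: Labels, pred: Labels) -> float:
    """A predicted span is correct if its categories overlap the gold ones."""
    matched = sum(1 for span, cats in pred.items()
                  if span in gold and cats & gold[span])
    return f1(matched, len(pred), len(gold))


def strict_f1(gold: Labels, pred: Labels) -> float:
    """Score (span, category) pairs, so every extra category costs precision."""
    gold_pairs = {(span, c) for span, cats in gold.items() for c in cats}
    pred_pairs = {(span, c) for span, cats in pred.items() for c in cats}
    return f1(len(gold_pairs & pred_pairs), len(pred_pairs), len(gold_pairs))


# Hypothetical gold analysis and a degenerate prediction that puts a
# handful of category labels on every gold span.
CATEGORIES = {"P", "S", "A", "D", "C", "E", "F", "H", "U"}
gold = {(0, 1): {"F"}, (1, 2): {"P"}, (3, 4): {"D"}}
pred = {span: set(CATEGORIES) for span in gold}

print(lenient_f1(gold, pred))           # 1.0
print(round(strict_f1(gold, pred), 3))  # 0.2
```

Pair-based matching is only one possible fix; it also scores P and C on the same span separately, which is exactly the case the normalization discussion below is concerned with.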

nschneid commented 4 years ago

One subtlety is that, because F nodes are moved under the root, we are left with superfluous C nodes:

[F The] [H [P [C service] ] ... [D poor] [U ...] ] [F is]

Should they be removed? I.e.:

[F The] [H [P service ] ... [D poor] [U ...] ] [F is]

Scoring P and C separately here (in an edge-based evaluation) would seem inconsistent with the notion of ignoring where F attaches.

danielhers commented 4 years ago

Yes, I think normalization (including C-flattening) should occur again after moving Fs.
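A sketch of that ordering (move Fs first, then flatten unary C units again), using a hypothetical `Unit` tree rather than the `ucca` library's own classes or normalization code. The sentence "The service is poor" and the `ROOT` label are made up for illustration, and, as a simplification, hoisted F units are appended at the end of the root instead of being reinserted by token position.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Unit:
    """Hypothetical tree node: `cat` is the category of the incoming edge."""
    cat: str
    children: List["Child"] = field(default_factory=list)


Child = Union[Unit, str]  # a plain str is a terminal token


def remove_fs(unit: Unit) -> List[Unit]:
    """Detach every F unit below `unit`; return them in left-to-right order."""
    removed: List[Unit] = []
    kept: List[Child] = []
    for child in unit.children:
        if isinstance(child, Unit) and child.cat == "F":
            removed.append(child)
            continue
        if isinstance(child, Unit):
            removed.extend(remove_fs(child))
        kept.append(child)
    unit.children = kept
    return removed


def move_fs_to_root(root: Unit) -> None:
    """Re-attach all non-top-level F units directly under the root.
    Simplification: they are appended at the end, not reinserted by position."""
    hoisted: List[Unit] = []
    for child in root.children:
        if isinstance(child, Unit) and child.cat != "F":
            hoisted.extend(remove_fs(child))
    root.children.extend(hoisted)


def flatten_unary_c(unit: Unit) -> None:
    """Bottom-up: a unit whose only child is a C unit absorbs that C's children."""
    for child in unit.children:
        if isinstance(child, Unit):
            flatten_unary_c(child)
    if len(unit.children) == 1:
        only = unit.children[0]
        if isinstance(only, Unit) and only.cat == "C":
            unit.children = only.children


def render(node: Child) -> str:
    """Bracketed notation as used in this thread; the root itself is unlabeled."""
    if isinstance(node, str):
        return node
    inner = " ".join(render(c) for c in node.children)
    return inner if node.cat == "ROOT" else f"[{node.cat} {inner}]"


def example() -> Unit:
    # Hypothetical parse of "The service is poor" with F units attached low.
    return Unit("ROOT", [
        Unit("H", [
            Unit("P", [Unit("F", ["The"]), Unit("C", ["service"])]),
            Unit("F", ["is"]),
            Unit("D", ["poor"]),
        ]),
    ])


root = example()
move_fs_to_root(root)
print(render(root))  # [H [P [C service]] [D poor]] [F The] [F is]
flatten_unary_c(root)  # the extra normalization pass after moving Fs
print(render(root))  # [H [P service] [D poor]] [F The] [F is]
```

Running the flattening pass only before moving Fs would leave the superfluous `[C service]` in place, because `P` still has two children at that point.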

nschneid commented 4 years ago

Should moving all Fs be part of normalization? For structures like [S [F the] [C xyz]], it would make it more transparent that xyz evokes a scene.

nschneid commented 4 years ago

Also, the confusion-matrix code should be consistent with the F-score computation.
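One way to keep the two in sync is to derive both from a single table of counts. The sketch below is hypothetical (not the existing ucca confusion-matrix code): it assumes each unit is reduced to a token span mapped to its set of categories, scores (span, category) pairs, and uses an extra `<none>` label for missed and spurious categories. Because every gold pair and every predicted pair is counted exactly once, precision and recall read off the matrix agree with the pair-based F-score by construction.

```python
from collections import Counter
from typing import Dict, Set, Tuple

Span = Tuple[int, int]
Labels = Dict[Span, Set[str]]  # span -> categories (hypothetical format)
NONE = "<none>"  # hypothetical null label for missed/spurious categories


def confusion(gold: Labels, pred: Labels) -> Counter:
    """Count (gold_category, predicted_category) pairs over all spans."""
    cm: Counter = Counter()
    for span in gold.keys() | pred.keys():
        g = gold.get(span, set())
        p = pred.get(span, set())
        for c in g & p:
            cm[(c, c)] += 1  # correct (span, category) pair
        missed = sorted(g - p)
        spurious = sorted(p - g)
        # Pair leftover labels on the same span as confusions (arbitrary but
        # deterministic order); anything left over is missed or spurious.
        for i in range(max(len(missed), len(spurious))):
            gc = missed[i] if i < len(missed) else NONE
            pc = spurious[i] if i < len(spurious) else NONE
            cm[(gc, pc)] += 1
    return cm


def f1_from_confusion(cm: Counter) -> float:
    """F-score read off the same counts: diagonal / row and column totals."""
    matched = sum(n for (g, p), n in cm.items() if g == p != NONE)
    n_gold = sum(n for (g, _), n in cm.items() if g != NONE)
    n_pred = sum(n for (_, p), n in cm.items() if p != NONE)
    if matched == 0:
        return 0.0
    precision, recall = matched / n_pred, matched / n_gold
    return 2 * precision * recall / (precision + recall)


gold = {(0, 1): {"F"}, (1, 2): {"P"}, (3, 4): {"D"}}
pred = {(0, 1): {"F"}, (1, 2): {"S"}, (3, 4): {"D", "E"}, (2, 3): {"F"}}
cm = confusion(gold, pred)
print(cm[("P", "S")], cm[(NONE, "E")], cm[(NONE, "F")])  # 1 1 1
print(round(f1_from_confusion(cm), 3))  # 0.5
```

When a span has several mismatched categories, how leftover labels are paired only decides which off-diagonal cell is incremented; it does not change the F-score.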