Dependencies pointing to O-tagged tokens

interactive-cookbook / tagger-parser

Tagger and parser models used on our recipes corpus (data), handled with pre- and postprocessing scripts for data conversion (data-conversions)


Dependencies pointing to O-tagged tokens #11

Open TheresaSchmidt opened 2 years ago

TheresaSchmidt commented 2 years ago

The most recent parsed data has edges pointing to tokens (mostly determiners) that are tagged with O, i.e. tokens that shouldn't be in the graph at all because they are not nodes. a) This really shouldn't happen. b) Immediate to-dos:

[ ] Double-check parser evaluation script: does it look at tokens that are not tagged?
[x] Did this also happen with the old parser?
[x] Are there edges pointing to O-tagged tokens in the gold data?
[ ] How are determiners annotated in the gold data?

TheresaSchmidt commented 2 years ago

Double-check parser evaluation script: does it look at tokens that are not tagged?

As far as I can tell, the parser evaluation only looks at the HEAD and REL columns - unless @siyutao you changed anything?
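To illustrate why this matters, here is a minimal sketch (not the actual evaluation script) of attachment scores computed from HEAD and REL only, with an optional switch to skip O-tagged tokens. The token representation (dicts with `tag`, `head`, `rel` keys), the function name `attachment_scores`, and the tag and relation labels in the example are all assumptions made for illustration.

```python
def attachment_scores(gold, pred, skip_tag=None):
    """Return (unlabeled, labeled) attachment scores for aligned token lists.

    If skip_tag is given, tokens whose gold tag equals skip_tag are ignored.
    """
    total = correct_head = correct_both = 0
    for g, p in zip(gold, pred):
        if skip_tag is not None and g["tag"] == skip_tag:
            continue
        total += 1
        if g["head"] == p["head"]:
            correct_head += 1
            if g["rel"] == p["rel"]:
                correct_both += 1
    if total == 0:
        return 0.0, 0.0
    return correct_head / total, correct_both / total


# Hypothetical three-token sentence; tags and relation labels are placeholders.
gold = [
    {"tag": "B-A", "head": 0, "rel": "root"},
    {"tag": "O", "head": 0, "rel": "root"},
    {"tag": "B-F", "head": 1, "rel": "t"},
]
pred = [
    {"tag": "B-A", "head": 0, "rel": "root"},
    {"tag": "O", "head": 1, "rel": "t"},  # spurious edge on an O-tagged token
    {"tag": "B-F", "head": 1, "rel": "t"},
]

print(attachment_scores(gold, pred))                # O-tagged tokens counted
print(attachment_scores(gold, pred, skip_tag="O"))  # O-tagged tokens skipped
```

If the evaluation counts every row, a spurious edge on an O-tagged token lowers the score; if it skips untagged rows, the same edge goes unnoticed.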

TheresaSchmidt commented 2 years ago

Did this also happen with the old parser?

Yes, it did; see, for example, sausage_gravy_3 (token 78 'add') in round2_allennlp08_tagged_parsed

TheresaSchmidt commented 2 years ago

Are there edges pointing to O-tagged tokens in the gold data?

Doesn't seem like it from this script.
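For reference, a check of this kind could look roughly like the sketch below. It is not the script mentioned above. It assumes each token is a dict with `id`, `tag`, `head` and `rel` keys and that a head of 0 means "no edge"; the function name and the example tags and labels are hypothetical. Because the edge direction convention may differ between files, it flags an edge if either endpoint is O-tagged.

```python
def edges_touching_o_tokens(tokens, outside_tag="O"):
    """Return (token_id, head_id, rel) for every edge with an O-tagged endpoint."""
    tag_by_id = {tok["id"]: tok["tag"] for tok in tokens}
    flagged = []
    for tok in tokens:
        head = tok["head"]
        if head == 0:  # assumed convention: 0 means the token has no edge
            continue
        if tok["tag"] == outside_tag or tag_by_id.get(head) == outside_tag:
            flagged.append((tok["id"], head, tok["rel"]))
    return flagged


# Hypothetical sentence; tags and relation labels are placeholders.
sentence = [
    {"id": 1, "form": "add", "tag": "B-A", "head": 0, "rel": "root"},
    {"id": 2, "form": "the", "tag": "O", "head": 1, "rel": "t"},
    {"id": 3, "form": "flour", "tag": "B-F", "head": 1, "rel": "t"},
]
print(edges_touching_o_tokens(sentence))  # [(2, 1, 't')]
```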

TheresaSchmidt commented 2 years ago

Edges to and from O-tagged tokens can easily be removed in post-processing. We should probably discuss / find out how meaningful they are, and it would be nice if the parser didn't generate them in the first place.
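A minimal sketch of such a post-processing step, assuming tokens are dicts with `id`, `tag`, `head` and `rel` keys, that a head of 0 means "no edge", and that `_` is the empty relation label. These conventions and the function name `drop_edges_on_o_tokens` are assumptions, not taken from the existing scripts.

```python
def drop_edges_on_o_tokens(tokens, outside_tag="O", no_head=0, no_rel="_"):
    """Return a copy of tokens in which edges touching O-tagged tokens are reset."""
    tag_by_id = {tok["id"]: tok["tag"] for tok in tokens}
    cleaned = []
    for tok in tokens:
        new_tok = dict(tok)  # copy so the input list is left unchanged
        head = tok["head"]
        touches_o = tok["tag"] == outside_tag or tag_by_id.get(head) == outside_tag
        if head != no_head and touches_o:
            new_tok["head"] = no_head
            new_tok["rel"] = no_rel
        cleaned.append(new_tok)
    return cleaned


# Hypothetical sentence; tags and relation labels are placeholders.
parsed = [
    {"id": 1, "form": "add", "tag": "B-A", "head": 0, "rel": "root"},
    {"id": 2, "form": "the", "tag": "O", "head": 1, "rel": "t"},
    {"id": 3, "form": "flour", "tag": "B-F", "head": 1, "rel": "t"},
]
cleaned = drop_edges_on_o_tokens(parsed)
print([(tok["id"], tok["head"], tok["rel"]) for tok in cleaned])
```

Returning a cleaned copy keeps the original parser output available for comparison, which helps while it is still open how meaningful these edges are.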

TheresaSchmidt commented 2 years ago

meaningful

E.g. whether we should delete the loose edges connected to non-nodes or turn the non-nodes into nodes. Currently the loose edges get ignored because scripts like reduce_graph.py look only at phrases with non-O tags to find edges.