tdozat / Parser-v2

An updated version of the Parser-v1 repo, used for Stanford's submission in the CoNLL17 shared task.
Multiword Tokens #16

Closed Domattee closed 5 years ago

Domattee commented 6 years ago

Hello,

We're attempting to reproduce some of the results from CoNLL2017 and are having trouble with the parser's output: it doesn't handle multiword tokens the way the Shared Task seems to require.

An example in German:

Raw text: "Zum"

Gold:
1-2 Zum
1 Zu
2 dem

Parser output:
1 zu
2 dem

The parser output is missing the multiword token line 1-2 Zum, and the evaluation script aborts because the number of tokens in the parser output and in the gold file no longer match. Since the parser was scored on the task, I assume there is a configuration option somewhere that enables multiword token output and is disabled by default, but I can't find it.

Thank you for your help.

iwkhei commented 6 years ago

I'm not sure that's included in the parser. You could write a script that fixes this issue.
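A minimal sketch of such a post-processing script, in case it helps. It is not part of Parser-v2, and the function names (`read_sentences`, `is_range_line`, `restore_multiword_tokens`) are made up for illustration. It assumes you still have the tokenized CoNLL-U file that was fed to the parser, that this file contains the multiword token lines (IDs like `1-2`), and that the parser output has the same sentences and word IDs in the same order. It copies each range line from the source file in front of the matching word in the parser output.

```python
def read_sentences(lines):
    """Group CoNLL-U lines into sentences, splitting on blank lines."""
    sentences, current = [], []
    for line in lines:
        line = line.rstrip("\n")
        if line.strip():
            current.append(line)
        elif current:
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences


def is_range_line(line):
    """True for multiword token lines, whose ID column looks like '1-2'."""
    return not line.startswith("#") and "-" in line.split("\t", 1)[0]


def restore_multiword_tokens(source_lines, parsed_lines):
    """Re-insert multiword token lines from the source into the parser output.

    Assumption: both inputs hold the same sentences in the same order and
    use the same word IDs; the parser output has no range lines of its own.
    """
    source = read_sentences(source_lines)
    parsed = read_sentences(parsed_lines)
    if len(source) != len(parsed):
        raise ValueError("sentence counts differ between source and output")

    merged = []
    for src_sent, out_sent in zip(source, parsed):
        # Map the first word ID of each range (e.g. '1' for '1-2') to its line.
        ranges = {}
        for line in src_sent:
            if is_range_line(line):
                first_id = line.split("\t", 1)[0].split("-")[0]
                ranges[first_id] = line
        for line in out_sent:
            if not line.startswith("#"):
                word_id = line.split("\t", 1)[0]
                if word_id in ranges:
                    merged.append(ranges[word_id])
            merged.append(line)
        merged.append("")  # blank line terminates each sentence
    return merged
```

To use it, read both files with `open(path, encoding="utf-8")`, pass the two file objects to `restore_multiword_tokens`, and write the result with `"\n".join(merged) + "\n"`. If the sentence counts differ, the sketch raises an error rather than guessing at an alignment.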