nert-nlp / streusle

STREUSLE: a corpus with comprehensive lexical semantic annotation (multiword expressions, supersenses)
Creative Commons Attribution Share Alike 4.0 International

Evaluation script that unpacks lextag into remaining STREUSLE columns #41

Closed nschneid closed 5 years ago

nschneid commented 5 years ago

Re: #40, we need a script that takes lextags (full tags, one per token) output by a system and parses them to extract MWE groupings.

Lextags are the 19th and final column in the .conllulex format. Columns 1-10 are the standard UD (CoNLL-U) columns. Columns 11-18 can be filled in based on UD+lextags.

nschneid commented 5 years ago

Input: .conllulex format except columns 11-18 are blank (not underscores; completely blank)
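To make the input format concrete, here is a minimal sketch of reading such a file. It assumes tab-separated columns, CoNLL-U-style comment lines starting with "#", and a blank line between sentences; the function name read_lextags is hypothetical and is not part of the STREUSLE scripts.

```python
def read_lextags(lines):
    """Collect (token id, word form, lextag) triples, grouped by sentence.

    Assumes each token line has exactly 19 tab-separated columns,
    where columns 11-18 may be empty strings and column 19 is the lextag.
    """
    sentences, current = [], []
    for raw in lines:
        line = raw.rstrip("\n")  # keep tabs intact; only drop the newline
        if not line:
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment/metadata line
        cols = line.split("\t")
        if len(cols) != 19:
            raise ValueError(f"expected 19 columns, found {len(cols)}: {line!r}")
        current.append((cols[0], cols[1], cols[18]))
    if current:
        sentences.append(current)
    return sentences
```

Because the split is on tabs only, completely blank columns 11-18 come back as empty strings rather than shifting the lextag out of the last position.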

I think the easiest way to implement this will be to adapt streuseval.py so that instead of VERIFYING that lextags are consistent with columns 11-18, it parses lextags and then populates columns 11-18 in JSON.

Specifically, it needs to:

If we want the output as .conllulex, converting JSON to .conllulex could be a separate script.
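As a rough illustration of the parsing step, the sketch below recovers strong MWE groupings from a sentence's lextags. It assumes each lextag starts with a positional flag from O, B, I_, I~ (or the lowercase o, b, i_, i~ for tokens inside a gap), optionally followed by "-" and the lexcat/supersense; it ignores weak (~) groupings, and the function name strong_mwe_groups is hypothetical, not the code on any branch.

```python
VALID_FLAGS = {"O", "B", "I_", "I~", "o", "b", "i_", "i~"}

def strong_mwe_groups(lextags):
    """Return strong MWEs as lists of 1-based token positions.

    Assumption: uppercase flags belong to the outer level and lowercase
    flags to tokens inside a gap. Only the '_' (strong) links are followed.
    """
    units = []                               # every lexical unit, in order
    current = {"outer": None, "inner": None}  # most recent unit per level
    for position, tag in enumerate(lextags, start=1):
        flag = tag.split("-", 1)[0]
        if flag not in VALID_FLAGS:
            raise ValueError(f"unrecognized positional flag in {tag!r}")
        level = "inner" if flag[0].islower() else "outer"
        if flag.upper() == "I_":
            # Strong continuation: attach to the open unit on this level.
            if current[level] is None:
                raise ValueError(f"token {position}: {flag} has nothing to continue")
            current[level].append(position)
        else:
            # O, B, and I~ each begin a new strong unit.
            unit = [position]
            units.append(unit)
            current[level] = unit
        if level == "outer":
            current["inner"] = None           # any gap has ended
    return [unit for unit in units if len(unit) > 1]
```

For example, under these assumptions the tags B-V.VPC.full-v.change, o-PRON, I_, O-N-n.PERSON describe a gappy two-token expression over tokens 1 and 3. Populating the remaining columns would also need the weak groupings plus the lexcat and supersense fields, which this sketch leaves out.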

nschneid commented 5 years ago

@danielhers I believe I have this working on the lextag-unpack branch. When reconstructing from the gold lextags I can't 100% match the original data file due to an arbitrary numbering issue (#42), but the streuseval score of the original vs. reconstructed is 100%, so there should not be any errors in the reconstruction. Hopefully this means the script is bug-free.