Closed nschneid closed 5 years ago
Input: .conllulex format except columns 11-18 are blank (not underscores; completely blank)
I think the easiest way to implement this will be to adapt streuseval.py so that instead of VERIFYING that lextags are consistent with columns 11-18, it parses lextags and then populates columns 11-18 in JSON.
Specifically, it needs to:
If we want the output as .conllulex, converting JSON to .conllulex could be a separate script.
@danielhers I believe I have this working on the lextag-unpack branch. When reconstructing from the gold lextags I can't 100% match the original data file due to an arbitrary numbering issue (#42), but the streuseval score of the original vs. reconstructed is 100%, so there should not be any errors in the reconstruction. Hopefully this means the script is bug-free.
Re: #40, we need a script that takes lextags (full tags, one per token) output by a system and parses them to extract MWE groupings.
Lextags are the 19th and final column in the .conllulex format. Columns 1-10 are UD. Columns 11-18 can be filled in based on UD+lextags.