interactive-cookbook / tagger-parser

Tagger and parser models used on our recipes corpus (data), handled with pre- and postprocessing scripts for data conversion (data-conversions)

Format of converted output files #9

Open kastein opened 2 years ago

kastein commented 2 years ago

When the JSON output files of the tagger or parser are converted to CoNLL-U using read_prediction.py, different recipes are not separated by an empty line, and the IDs in the first column do not restart at 1 when a new recipe starts. For the tagger output, the format looked correct once I added the argument --single-sentences to the arguments listed in the main Readme, but this option did not work for converting the parser output.
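To make the expected layout concrete, here is a minimal sketch of the target format, not the actual code of read_prediction.py. It assumes the recipe boundaries are already known and that each token is a (form, label) pair; the function name to_conllu_blocks and the choice to put the label in the fifth column are hypothetical, chosen only for illustration.

```python
from typing import Iterable, List, Tuple

# Hypothetical minimal token representation: (form, label).
Token = Tuple[str, str]


def to_conllu_blocks(recipes: Iterable[List[Token]]) -> str:
    """Render each recipe as its own block.

    IDs restart at 1 for every recipe, and an empty line follows each block.
    """
    lines: List[str] = []
    for tokens in recipes:
        for idx, (form, label) in enumerate(tokens, start=1):
            # Only ID, FORM and one label column are filled in;
            # the remaining CoNLL-U columns are left as "_" placeholders.
            row = [str(idx), form, "_", "_", label, "_", "_", "_", "_", "_"]
            lines.append("\t".join(row))
        lines.append("")  # the empty line that closes the block
    return "\n".join(lines) + "\n" if lines else ""
```

Called with two recipes, this yields two blocks whose first columns read 1, 2, ... and 1, ..., separated by an empty line, which is the layout missing from the current parser conversion.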

irisferrazzo commented 2 years ago

@TheresaSchmidt, I've just had a meeting with @kastein about this, and we figured out that the problem could be solved by reusing the function you wrote for splitting the tagger's input data (for cases where a single input file contains several recipes), so that blank lines are added for the parser as well. Could you also document the arguments that can be used (e.g., --single-sentences)?

This isn't as urgent as other matters (since we usually parse single recipes rather than several at a time), but we may forget about it in the future. Thank you in advance!