interactive-cookbook / tagger-parser

Tagger and parser models used on our recipes corpus (data), handled with pre- and postprocessing scripts for data conversion (data-conversions)

Errors when processing conllu files with an empty line at the end #23

Open kastein opened 2 years ago

kastein commented 2 years ago

I tried to create the recipe and action graphs for the newly parsed recipes from the second round, but ran into two problems that seem to be caused by an empty line at the end of the new conllu files.

  1. When running the recipe_graph.py script to create the (full) recipe graph, the empty line causes an IndexError:

    File "C:/.../recipe_graph.py", line 33, in _read_graph_conllu
    id = columns[0]
    IndexError: list index out of range
  2. Converting the newly parsed conllu files with the full recipe graphs into action graph conllu files using reduce_dir_to_action_graphs.py / reduce_graph.py adds an unexpected new line to the files:

    93  over    _   _   O   _   0   root    _   _
    94  top _   _   O   _   0   root    _   _
    95  and _   _   O   _   0   root    _   _
    96  serve   _   _   B-A _   0   root    _   _
    97  immediately _   _   O   _   0   root    _   _
    98  .   _   _   O   _   0   root    _   _
    
    O   _   0   root    _   _

With the conllu files of ARA 1.0 and ARA 1.1 these problems do not occur and the files don't include the empty line at the end. Was there maybe a change in the way the conllu files get created from the output of the parser?
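To see which files are affected, a check along these lines could be run over the old and the new directories. It is only a sketch: the names `ends_with_blank_line` and `files_with_trailing_blank_line` are made up for illustration and are not part of the repository, and it assumes the files use the `.conllu` extension and UTF-8 encoding.

```python
from pathlib import Path
from typing import List


def ends_with_blank_line(text: str) -> bool:
    """Return True if the last line of `text` is empty or whitespace-only."""
    lines = text.splitlines()
    return bool(lines) and not lines[-1].strip()


def files_with_trailing_blank_line(directory: str) -> List[Path]:
    """List the .conllu files in `directory` whose last line is blank.

    Hypothetical helper; extension and encoding are assumptions.
    """
    return [
        path
        for path in sorted(Path(directory).glob("*.conllu"))
        if ends_with_blank_line(path.read_text(encoding="utf-8"))
    ]
```

Comparing the result for an ARA 1.0 / ARA 1.1 directory with the result for the newly parsed files would show whether the trailing empty line is new in every file or only in some of them.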

TheresaSchmidt commented 2 years ago

Do: make script(s) robust against empty lines. Note: also find out where the empty line came from.
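A minimal sketch of what "robust against empty lines" could look like when reading a file. The function name `iter_token_rows` is hypothetical and this is not the actual code of `_read_graph_conllu`; it assumes the columns are tab-separated and that a line containing only whitespace should be ignored.

```python
from typing import Iterable, Iterator, List


def iter_token_rows(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield the columns of every non-blank line, skipping empty lines.

    Assumption: columns are separated by tabs, as in the conllu format.
    """
    for line in lines:
        # Skip empty and whitespace-only lines instead of indexing into them.
        if not line.strip():
            continue
        # Remove only the line ending so that empty trailing columns survive.
        yield line.rstrip("\r\n").split("\t")


# Usage with an in-memory example; a file object can be passed the same way.
example = ["98\t.\t_\t_\tO\t_\t0\troot\t_\t_\n", "\n"]
for columns in iter_token_rows(example):
    token_id = columns[0]
```

Inside the existing scripts, a guard of the same kind (checking `line.strip()` before `columns[0]` is accessed) would probably be enough to avoid both the IndexError and the extra row in the action graph files, though that has not been verified against the actual code.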