rst-workbench / rst-converter-service

Convert between different Rhetorical Structure Theory file formats (Python library / command-line tool / web service).
BSD 3-Clause "New" or "Revised" License

rs3->dis/svg conversion can't handle Szeryng example text #6

Status: open. arne-cl opened this issue 3 years ago

arne-cl commented 3 years ago

feng-hirst-2014-result.rs3.txt

curl -XPOST localhost:9150/convert/rs3/dis -F input=@feng-hirst-2014-result.rs3.txt
The service answers with a JSON object whose "error" field reads:

    <class 'discoursegraphs.readwrite.rst.rs3.rs3tree.RSTTree'> can't handle input file 'feng-hirst-2014-result.rs3.txt'. Got:

and whose "traceback" field, unescaped, is:

    Traceback (most recent call last):
      File "app.py", line 113, in post
        tree = read_function(temp_inputfile.name)
      File "/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py", line 57, in __init__
        self.tree = self.dt()
      File "/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py", line 117, in dt
        return self.root2tree(start_node=start_node)
      File "/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py", line 140, in root2tree
        return self.dt(start_node=root_nodes[0])
      File "/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py", line 134, in dt
        elem_id, elem, elem_type, start_node=start_node)
      File "/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py", line 231, in group2tree
        return self.dt(start_node=child_id)
      File "/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py", line 134, in dt
        elem_id, elem, elem_type, start_node=start_node)
      File "/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py", line 245, in group2tree
        sat_subtree = self.dt(start_node=sat_id)
      File "/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py", line 134, in dt
        elem_id, elem, elem_type, start_node=start_node)
      File "/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py", line 174, in group2tree
        subtree = self.dt(start_node=subtree_id)
      File "/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py", line 134, in dt
        elem_id, elem, elem_type, start_node=start_node)
      File "/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py", line 247, in group2tree
        nuc_subtree = self.dt(start_node=children['nucleus'])
      File "/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py", line 134, in dt
        elem_id, elem, elem_type, start_node=start_node)
      File "/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py", line 245, in group2tree
        sat_subtree = self.dt(start_node=sat_id)
      File "/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py", line 134, in dt
        elem_id, elem, elem_type, start_node=start_node)
      File "/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py", line 174, in group2tree
        subtree = self.dt(start_node=subtree_id)
      File "/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py", line 134, in dt
        elem_id, elem, elem_type, start_node=start_node)
      File "/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py", line 217, in group2tree
        for child_id in other_child_ids]
      File "/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py", line 134, in dt
        elem_id, elem, elem_type, start_node=start_node)
      File "/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py", line 174, in group2tree
        subtree = self.dt(start_node=subtree_id)
      File "/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py", line 134, in dt
        elem_id, elem, elem_type, start_node=start_node)
      File "/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py", line 245, in group2tree
        sat_subtree = self.dt(start_node=sat_id)
      File "/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py", line 134, in dt
        elem_id, elem, elem_type, start_node=start_node)
      File "/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py", line 178, in group2tree
        for c in self.child_dict[elem_id]]
      File "/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py", line 134, in dt
        elem_id, elem, elem_type, start_node=start_node)
      File "/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py", line 174, in group2tree
        subtree = self.dt(start_node=subtree_id)
      File "/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py", line 134, in dt
        elem_id, elem, elem_type, start_node=start_node)
      File "/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/rs3/rs3tree.py", line 266, in group2tree
        assert len(children['nucleus']) == 1
    AssertionError
arne-cl commented 3 years ago

Problem: group2tree asserts that len(children['nucleus']) == 1, but here we have:

>>> children
defaultdict(<type 'list'>, {'satellite': ['99'], 'nucleus': ['95', '97']})
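The bare assert at line 266 of rs3tree.py ends in an empty `AssertionError`, which is why the service reports "Got:" with nothing after it. A minimal sketch of a more descriptive check, assuming the same role-to-IDs mapping shown above; `check_single_nucleus` is a hypothetical helper, not part of discoursegraphs, and the group's own ID is not given in the issue, so a placeholder is used:

```python
from collections import defaultdict


def check_single_nucleus(group_id, children):
    """Return the only nucleus ID of a group or raise a descriptive error.

    Hypothetical helper: `children` maps a role ('nucleus' or 'satellite')
    to a list of child element IDs, like the defaultdict in the issue.
    """
    nuclei = children.get('nucleus', [])
    if len(nuclei) != 1:
        raise ValueError(
            "group {!r}: expected exactly one nucleus, found {} {}; "
            "satellites: {}".format(
                group_id, len(nuclei), nuclei, children.get('satellite', [])))
    return nuclei[0]


# The children collected for the failing group; the group ID is a placeholder.
children = defaultdict(list, {'satellite': ['99'], 'nucleus': ['95', '97']})
try:
    check_single_nucleus('<unknown group>', children)
except ValueError as err:
    print(err)
```

Raising a ValueError with the offending IDs (instead of asserting) would let the web service return a message that points straight at elements 95, 97 and 99.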
arne-cl commented 3 years ago

minimal input:

Szeryng subsequently focused on teaching before resuming his concert career in 1954.

The "Le Duc" was the instrument on which he performed and recorded mostly, while the latter ("King David" Strad) was donated to the State of Israel.

feng-hirst-2014 output:

ParseTree('Elaboration[N][S]', [ParseTree('Temporal[N][S]', ['Szeryng subsequently focused on teaching', 'before resuming his concert career in 1954 .']), ParseTree('Elaboration[N][S]', ['The " Le Duc " was the instrument', ParseTree('Temporal[N][N]', ['on which he performed and recorded mostly ,', ParseTree('same-unit[N][N]', [ParseTree('same-unit[N][N]', [ParseTree('Elaboration[N][S]', ['while the latter', '( " King David "']), 'Strad )']), 'was donated to the State of Israel .'])])])])

Attachments: feng-hirst-2014-result-minimal.rs3.txt, feng-hirst-converter-fail
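Each label in the feng-hirst-2014 output above encodes a relation name plus the nuclearity of its two children, e.g. `Temporal[N][S]`. A minimal sketch of splitting such a label and deriving an rs3 relation type, assuming the usual RST convention that N/N is multinuclear (`multinuc`) and N/S or S/N is mononuclear (`rst`); `parse_label` is a hypothetical helper, not the converter's actual code:

```python
import re

# Labels such as 'Temporal[N][S]': relation name, then left/right nuclearity.
LABEL_RE = re.compile(r'^(?P<rel>[^\[\]]+)\[(?P<left>[NS])\]\[(?P<right>[NS])\]$')


def parse_label(label):
    """Split a feng-hirst-style label into (relation, left, right, rs3_type)."""
    match = LABEL_RE.match(label)
    if match is None:
        raise ValueError("not a relation label: {!r}".format(label))
    left, right = match.group('left'), match.group('right')
    if (left, right) == ('S', 'S'):
        raise ValueError("a relation needs at least one nucleus: {!r}".format(label))
    # N/N -> multinuclear ('multinuc'); N/S or S/N -> mononuclear ('rst').
    rs3_type = 'multinuc' if (left, right) == ('N', 'N') else 'rst'
    return match.group('rel'), left, right, rs3_type


for label in ['Temporal[N][S]', 'Temporal[N][N]', 'same-unit[N][N]']:
    print(label, '->', parse_label(label))
```

Under this mapping the two Temporal labels become one rst and one multinuc relation, which is consistent with the rs3 file declaring both a multinuc "Temporal" and an rst "temporal".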

arne-cl commented 3 years ago

If we convert the original feng-hirst-2014 parser output to a tree,

$ curl -X POST -F "input=@feng-hirst-2014-result-minimal.fh2014" http://localhost:5000/convert/hilda/tree.prettyprint
                                                               Elaboration
                     _______________________________________________|________________
                    |                                                                S
                    |                                                                |
                    |                                                           Elaboration
                    |                              __________________________________|____________________________
                    |                             |                                                               S
                    |                             |                                                               |
                    |                             |                                                            Temporal
                    |                             |                  _____________________________________________|_____________
                    |                             |                 |                                                           N
                    |                             |                 |                                                           |
                    |                             |                 |                                                       same-unit
                    |                             |                 |                                              _____________|____________________
                    |                             |                 |                                             N                                  |
                    |                             |                 |                                             |                                  |
                    |                             |                 |                                         same-unit                              |
                    |                             |                 |                                _____________|______________________            |
                    N                             |                 |                               N                                    |           |
                    |                             |                 |                               |                                    |           |
                 Temporal                         |                 |                          Elaboration                               |           |
        ____________|____________                 |                 |                 ______________|_____________                       |           |
       N                         S                N                 N                N                            S                      N           N
       |                         |                |                 |                |                            |                      |           |
Szeryng subseque          before resuming  The " Le Duc "      on which he    while the latter             ( " King David "           Strad ) was donated to
ntly focused on             his concert    was the instrume   performed and                                                                    the State of
    teaching              career in 1954 .        nt        recorded mostly ,                                                                     Israel .

we see that it contains two Temporal relations with different nuclearity. The first one is (N: ... focused on teaching, S: before resuming ...); the second one is (N: on which he performed ... mostly, N: while the latter ... was donated). The "while" should not be interpreted in a temporal sense, but that's probably not the issue here.
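The mixed nuclearity can also be spotted mechanically in the printed parser output. A rough sketch, assuming labels always appear as quoted `Name[X][Y]` strings in the `ParseTree(...)` repr quoted earlier; the regex and both helper functions are illustrative, not part of the converter:

```python
import re
from collections import defaultdict

# Quoted relation labels such as 'Temporal[N][S]' inside a ParseTree repr.
LABEL_RE = re.compile(r"'([^'\[\]]+)\[([NS])\]\[([NS])\]'")

# feng-hirst-2014 output for the minimal Szeryng input, copied from the issue.
FENG_HIRST_OUTPUT = (
    "ParseTree('Elaboration[N][S]', [ParseTree('Temporal[N][S]', "
    "['Szeryng subsequently focused on teaching', "
    "'before resuming his concert career in 1954 .']), "
    "ParseTree('Elaboration[N][S]', ['The \" Le Duc \" was the instrument', "
    "ParseTree('Temporal[N][N]', ['on which he performed and recorded mostly ,', "
    "ParseTree('same-unit[N][N]', [ParseTree('same-unit[N][N]', "
    "[ParseTree('Elaboration[N][S]', ['while the latter', '( \" King David \"']), "
    "'Strad )']), 'was donated to the State of Israel .'])])])])"
)


def nuclearity_by_relation(parsetree_repr):
    """Map each relation name to the set of nuclearity patterns it occurs with."""
    patterns = defaultdict(set)
    for rel, left, right in LABEL_RE.findall(parsetree_repr):
        patterns[rel].add(left + right)
    return dict(patterns)


def mixed_nuclearity(parsetree_repr):
    """Keep only relations that occur with more than one nuclearity pattern."""
    return {rel: pats
            for rel, pats in nuclearity_by_relation(parsetree_repr).items()
            if len(pats) > 1}


for rel, pats in sorted(mixed_nuclearity(FENG_HIRST_OUTPUT).items()):
    print(rel, sorted(pats))
```

For this output it reports only Temporal, with the patterns NN and NS; Elaboration occurs only as NS and same-unit only as NN.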

arne-cl commented 3 years ago

The rs3 file has some odd things, like relation names with different casing/spelling in the <relations> section (e.g. a multinuc "Temporal" and an rst "temporal"), while in the <body> section we only find Temporal (and not temporal):

    <relations>
      <rel name="Manner-Means" type="rst"/>
      <rel name="Mannermeans" type="rst"/>
      ...
      <rel name="Same-Unit" type="multinuc"/>
      <rel name="Same-unit" type="multinuc"/>
      <rel name="Temporal" type="multinuc"/>
      <rel name="same-unit" type="multinuc"/>
      <rel name="same_unit" type="multinuc"/>
      <rel name="temporal" type="rst"/>
    </relations>
...
  <body>
    <segment id="5" parent="3" relname="Temporal">Szeryng subsequently focused on teaching</segment>
    <segment id="7" parent="3" relname="Temporal">before resuming his concert career in 1954 .</segment>
    <segment id="15" parent="13" relname="Temporal">on which he performed and recorded mostly ,</segment>
    <group id="17" type="multinuc" parent="13" relname="Temporal"/>
  </body>
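These duplicate declarations can be listed with Python's standard xml.etree.ElementTree. A minimal sketch, run on the <relations> excerpt quoted above with the elided "..." entries left out; the normalization rule (ignore case, "-" and "_") and the helper names are assumptions for illustration, not something discoursegraphs does:

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# The <relations> entries quoted in the issue (elided entries omitted).
RELATIONS_XML = """
<relations>
  <rel name="Manner-Means" type="rst"/>
  <rel name="Mannermeans" type="rst"/>
  <rel name="Same-Unit" type="multinuc"/>
  <rel name="Same-unit" type="multinuc"/>
  <rel name="Temporal" type="multinuc"/>
  <rel name="same-unit" type="multinuc"/>
  <rel name="same_unit" type="multinuc"/>
  <rel name="temporal" type="rst"/>
</relations>
"""


def normalize(name):
    """Assumed normalization: ignore case, hyphens and underscores."""
    return re.sub(r'[-_]', '', name.lower())


def relation_variants(root):
    """Group <rel> declarations: normalized name -> {spelling: type}."""
    groups = defaultdict(dict)
    for rel in root.iter('rel'):
        name = rel.get('name')
        groups[normalize(name)][name] = rel.get('type')
    return dict(groups)


def conflicting_types(root):
    """Keep only names whose spellings are declared with different types."""
    return {key: variants
            for key, variants in relation_variants(root).items()
            if len(set(variants.values())) > 1}


root = ET.fromstring(RELATIONS_XML.strip())
for key, variants in sorted(relation_variants(root).items()):
    if len(variants) > 1:
        print(key, sorted(variants.items()))
print('conflicting types:', conflicting_types(root))
```

On this excerpt only temporal has conflicting types (Temporal is declared multinuc, temporal is declared rst); the Manner-Means and same-unit variants differ only in spelling. For a complete rs3 file, ET.parse(filename).getroot() can be passed instead, since root.iter('rel') searches the whole tree.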