rst-workbench / rst-converter-service

Convert between different Rhetorical Structure Theory file formats (Python library / command-line tool / web service).
BSD 3-Clause "New" or "Revised" License

dplp->rs3 can't handle Desai and Moldovan (2021) example #7

Closed arne-cl closed 3 years ago

arne-cl commented 3 years ago

Input text:

On the surface, the overall unemployment rate is expected to be little changed from September's 5.3%.
But the actual head count of non-farm employment payroll jobs is likely to be muddied by the impact of Hurricane Hugo, strikes, and less-than-perfect seasonal adjustments, economists said.

DPLP output:

0 1 On on IN case 3 O (ROOT (S (PP (IN On) 1
0 2 the the DT det 3 O (NP (DT the) 1
0 3 surface, surface, NN nmod 44 O (NN surface,))) 1
0 4 the the DT det 7 O (NP (NP (DT the) 1
0 5 overall overall JJ amod 7 O (JJ overall) 1
0 6 unemployment unemployment NN compound 7 O (NN unemployment) 1
0 7 rate rate NN nsubj 44 O (NN rate)) 1
0 8 is be VBZ auxpass 9 O (VP (VBZ is) 1
0 9 expected expect VBN dep 7 O (VP (VBN expected) 1
0 10 to to TO mark 13 O (S (VP (TO to) 1
0 11 be be VB aux 13 O (VP (VB be) 1
0 12 little little RB advmod 13 O (VP (VP (ADVP (RB little)) 1
0 13 changed change VBN xcomp 9 O (VBN changed) 1
0 14 from from IN case 15 O (PP (IN from) 1
0 15 September's september' NNS nmod 13 O (NP (NP (NNS September's)) 1
0 16 5.3%. 5.3%. CD nummod 15 NUMBER (CD 5.3%.)))) 1
0 17 But but CC cc 28 O (SBAR (S (CC But) 1
0 18 the the DT det 21 O (NP (NP (DT the) 1
0 19 actual actual JJ amod 21 O (JJ actual) 1
0 20 head head NN compound 21 O (NN head) 1
0 21 count count NN nsubj 28 TITLE (NN count)) 1
0 22 of of IN case 26 O (PP (IN of) 1
0 23 non-farm non-farm JJ amod 26 O (NP (JJ non-farm) 1
0 24 employment employment NN compound 26 O (NN employment) 1
0 25 payroll payroll NN compound 26 O (NN payroll) 1
0 26 jobs job NNS nmod 21 O (NNS jobs)))) 1
0 27 is be VBZ cop 28 O (VP (VBZ is) 1
0 28 likely likely JJ ccomp 13 O (ADJP (JJ likely) 1
0 29 to to TO mark 31 O (S (VP (TO to) 1
0 30 be be VB auxpass 31 O (VP (VB be) 1
0 31 muddied muddy VBN xcomp 28 O (VP (VBN muddied) 1
0 32 by by IN case 34 O (PP (IN by) 1
0 33 the the DT det 34 O (NP (NP (DT the) 1
0 34 impact impact NN nmod 31 O (NN impact)) 1
0 35 of of IN case 38 O (PP (IN of) 1
0 36 Hurricane Hurricane NNP compound 38 CAUSE_OF_DEATH (NP (NP (NNP Hurricane) 1
0 37 Hugo, Hugo, NNP compound 38 O (NNP Hugo,) 1
0 38 strikes, strikes, NN nmod 34 O (NN strikes,)) 1
0 39 and and CC cc 38 O (CC and) 1
0 40 less-than-perfect less-than-perfect JJ amod 43 O (NP (JJ less-than-perfect) 1
0 41 seasonal seasonal JJ amod 43 O (JJ seasonal) 1
0 42 adjustments, adjustments, NN compound 43 O (NN adjustments,) 1
0 43 economists economist NNS conj 38 O (NNS economists))))))))))))))))))))) 1
0 44 said. said. VBP root 0 O (VP (VBP said.)))) 1

ParentedTree('EDU', ['1'])

rst-converter-service error:

Error: 500: INTERNAL SERVER ERROR
{"error":"<class 'discoursegraphs.readwrite.rst.dplp.DPLPRSTTree'> can't handle input file 'input.ext'. Got: The tree position () may not be assigned to.","traceback":"Traceback (most recent call last):\n File \"app.py\", line 113, in post\n tree = read_function(temp_inputfile.name)\n File \"/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/dplp.py\", line 35, in __init__\n self.add_edus()\n File \"/opt/discoursegraphs/src/discoursegraphs/readwrite/rst/dplp.py\", line 91, in add_edus\n self.parsetree[parent_pos] = u\" \".join(edu_tokens)\n File \"/usr/lib/python2.7/site-packages/nltk/tree.py\", line 172, in __setitem__\n raise IndexError('The tree position () may not be '\nIndexError: The tree position () may not be assigned to.\n"}

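It seems that DPLP found only one EDU for this input: every token line above ends in EDU index 1, and the resulting RST tree is the single node ParentedTree('EDU', ['1']). The dplp->rs3 converter can't handle that case. According to the traceback, add_edus() tries to replace the EDU node with its token string via self.parsetree[parent_pos], but here the EDU node is the root of the tree, so parent_pos is () and NLTK refuses to assign to the root position.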
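To check whether an input hits this single-EDU case before it reaches the converter, one option is to count the distinct EDU indices in the DPLP output. The sketch below is not part of discoursegraphs or rst-converter-service; the helper names edu_ids and is_single_edu are hypothetical. It assumes that the last whitespace-separated field of each token line is the EDU index, which matches the output pasted above but is not confirmed here against the DPLP documentation.

```python
def edu_ids(dplp_lines):
    """Return the sorted list of distinct EDU indices in DPLP token lines.

    Assumption: the last whitespace-separated field of each non-empty
    line is the EDU index (as in the output shown in this issue).
    """
    ids = set()
    for line in dplp_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        ids.add(int(fields[-1]))
    return sorted(ids)


def is_single_edu(dplp_lines):
    """True if all tokens belong to one EDU, the case that fails here."""
    return len(edu_ids(dplp_lines)) == 1


# Three token lines copied from the DPLP output in this issue.
sample = [
    "0 1 On on IN case 3 O (ROOT (S (PP (IN On) 1",
    "0 2 the the DT det 3 O (NP (DT the) 1",
    "0 44 said. said. VBP root 0 O (VP (VBP said.)))) 1",
]

print(edu_ids(sample))
print(is_single_edu(sample))
```

A caller could use such a check to special-case single-EDU documents (for example, by emitting a tree with one segment) instead of running into the IndexError from NLTK.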