bnicenboim closed this issue 9 years ago.
Thanks for your report. More graceful ways of failing are on my to-do list! The error you saw means that there appear to be crossing branches somewhere in the sentence annotation, and these cannot be rendered as brackets. Unfortunately, I cannot reproduce this with the sentence you pasted. Here, it converts just fine to
(VROOT(S(VM Puede)(VM determinar)(S(CjS si)(NP(DD este)(NC equipo))(VM causa)(NP(NC interferencias)(AP(AQ perjudiciales)))(PP(SP para)(NP(DA la)(NC recepción)(PP(SP de)(CNP(NP(NC radio))(CC o)(NP(NC televisión)))))))(CS(VM encendiendo)(CC y)(VM apagando)(NP(DA el)(NC equipo)))(F ;))).
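To make the bracket limitation concrete, here is a minimal sketch in Python (not the converter's actual code) of writing a tree in bracket notation. Each bracket lists a node's children left to right, so the output can only match the word order if every node covers a gap-free run of words; the sketch raises a ValueError otherwise. The `Leaf`/`Node` classes are a hypothetical representation chosen for illustration, and the demo uses the first five words of sentence s1437 from this thread.

```python
from dataclasses import dataclass, field
from typing import List, Union


# Hypothetical tree classes for illustration; not the converter's own data model.
@dataclass
class Leaf:
    pos: str    # part-of-speech tag, e.g. "SP"
    word: str
    index: int  # 0-based position of the word in the sentence


@dataclass
class Node:
    cat: str  # phrase category, e.g. "NP"
    children: List[Union["Node", Leaf]] = field(default_factory=list)


def positions(tree: Union[Node, Leaf]) -> List[int]:
    """Sorted sentence positions of all words a (sub)tree dominates."""
    if isinstance(tree, Leaf):
        return [tree.index]
    return sorted(i for child in tree.children for i in positions(child))


def to_brackets(tree: Union[Node, Leaf]) -> str:
    """Write a tree in bracket notation, refusing trees with crossing branches."""
    if isinstance(tree, Leaf):
        return f"({tree.pos} {tree.word})"
    span = positions(tree)
    # Brackets can only express a node whose words form one gap-free run.
    if span != list(range(span[0], span[-1] + 1)):
        raise ValueError(f"{tree.cat} node is discontinuous: covers positions {span}")
    # Emit children in word order (ordered by their first word).
    ordered = sorted(tree.children, key=lambda c: positions(c)[0])
    return f"({tree.cat}" + "".join(to_brackets(c) for c in ordered) + ")"


if __name__ == "__main__":
    # "Sobre todo en tiempos difíciles": terminals s1437_1 to s1437_5.
    sobre, todo, en = Leaf("SP", "Sobre", 0), Leaf("PI", "todo", 1), Leaf("SP", "en", 2)
    tiempos, dificiles = Leaf("NC", "tiempos", 3), Leaf("AQ", "difíciles", 4)
    pp_508 = Node("PP", [sobre, Node("NP", [todo])])
    np_501 = Node("NP", [tiempos, Node("AP", [dificiles])])

    # As annotated: NP s1437_519 holds "Sobre todo" and "tiempos difíciles", skipping "en".
    annotated = Node("PP", [en, Node("NP", [np_501, pp_508])])
    try:
        to_brackets(annotated)
    except ValueError as err:
        print("cannot bracket:", err)

    # With "Sobre todo" attached to the PP instead, the tree brackets fine.
    repaired = Node("PP", [pp_508, en, Node("NP", [np_501])])
    print(to_brackets(repaired))
```

Sorting children by their first word is what lets a continuous tree come out in sentence order; for a discontinuous node no child order can do that, which is why the function fails instead of guessing.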
Oh, maybe I pasted the wrong tree. I'll check it again tonight.
Sorry, this is the tree that made the program crash. Is there anything I can do, or should I just ignore it?
<s id="s1437">
<graph root="s1437_515">
<terminals>
<t id="s1437_1" word="Sobre" pos="SP" morph="--"/>
<t id="s1437_2" word="todo" pos="PI" morph="--"/>
<t id="s1437_3" word="en" pos="SP" morph="--"/>
<t id="s1437_4" word="tiempos" pos="NC" morph="--"/>
<t id="s1437_5" word="difíciles" pos="AQ" morph="--"/>
<t id="s1437_6" word="la" pos="DA" morph="--"/>
<t id="s1437_7" word="cooperación" pos="NC" morph="--"/>
<t id="s1437_8" word="internacional" pos="AQ" morph="--"/>
<t id="s1437_9" word="asume" pos="VM" morph="--"/>
<t id="s1437_10" word="un" pos="DI" morph="--"/>
<t id="s1437_11" word="papel" pos="NC" morph="--"/>
<t id="s1437_12" word="crucial" pos="AQ" morph="--"/>
<t id="s1437_13" word="," pos="F" morph="--"/>
<t id="s1437_14" word="ya" pos="RG" morph="--"/>
<t id="s1437_15" word="que" pos="CjS" morph="--"/>
<t id="s1437_16" word="ningún" pos="DI" morph="--"/>
<t id="s1437_17" word="país" pos="NC" morph="--"/>
<t id="s1437_18" word="del" pos="SP" morph="--"/>
<t id="s1437_19" word="mundo" pos="NC" morph="--"/>
<t id="s1437_20" word="puede" pos="VM" morph="--"/>
<t id="s1437_21" word="afrontar" pos="VM" morph="--"/>
<t id="s1437_22" word="la" pos="DA" morph="--"/>
<t id="s1437_23" word="crisis" pos="NC" morph="--"/>
<t id="s1437_24" word="por" pos="SP" morph="--"/>
<t id="s1437_25" word="sí" pos="PrN" morph="--"/>
<t id="s1437_26" word="solo" pos="AQ" morph="--"/>
<t id="s1437_27" word="." pos="F$" morph="--"/>
</terminals>
<nonterminals>
<nt id="s1437_500" cat="NP">
<edge label="--" idref="s1437_2"/>
</nt>
<nt id="s1437_501" cat="NP">
<edge label="--" idref="s1437_4"/>
<edge label="--" idref="s1437_516"/>
</nt>
<nt id="s1437_502" cat="NP">
<edge label="--" idref="s1437_6"/>
<edge label="--" idref="s1437_7"/>
<edge label="--" idref="s1437_517"/>
</nt>
<nt id="s1437_503" cat="NP">
<edge label="--" idref="s1437_10"/>
<edge label="--" idref="s1437_11"/>
<edge label="--" idref="s1437_518"/>
</nt>
<nt id="s1437_504" cat="MTC">
<edge label="--" idref="s1437_14"/>
<edge label="--" idref="s1437_15"/>
</nt>
<nt id="s1437_505" cat="NP">
<edge label="--" idref="s1437_19"/>
</nt>
<nt id="s1437_506" cat="NP">
<edge label="--" idref="s1437_22"/>
<edge label="--" idref="s1437_23"/>
</nt>
<nt id="s1437_507" cat="AP">
<edge label="--" idref="s1437_26"/>
</nt>
<nt id="s1437_508" cat="PP">
<edge label="--" idref="s1437_1"/>
<edge label="--" idref="s1437_500"/>
</nt>
<nt id="s1437_509" cat="PP">
<edge label="--" idref="s1437_18"/>
<edge label="--" idref="s1437_505"/>
</nt>
<nt id="s1437_510" cat="NP">
<edge label="--" idref="s1437_25"/>
<edge label="--" idref="s1437_507"/>
</nt>
<nt id="s1437_511" cat="PP">
<edge label="--" idref="s1437_3"/>
<edge label="--" idref="s1437_519"/>
</nt>
<nt id="s1437_512" cat="NP">
<edge label="--" idref="s1437_16"/>
<edge label="--" idref="s1437_17"/>
<edge label="--" idref="s1437_509"/>
</nt>
<nt id="s1437_513" cat="PP">
<edge label="--" idref="s1437_24"/>
<edge label="--" idref="s1437_510"/>
</nt>
<nt id="s1437_514" cat="S">
<edge label="--" idref="s1437_20"/>
<edge label="--" idref="s1437_21"/>
<edge label="--" idref="s1437_504"/>
<edge label="CD" idref="s1437_506"/>
<edge label="SUJ" idref="s1437_512"/>
<edge label="CC" idref="s1437_513"/>
</nt>
<nt id="s1437_515" cat="S">
<edge label="--" idref="s1437_9"/>
<edge label="--" idref="s1437_13"/>
<edge label="--" idref="s1437_27"/>
<edge label="SUJ" idref="s1437_502"/>
<edge label="CD" idref="s1437_503"/>
<edge label="CCT" idref="s1437_511"/>
<edge label="AO" idref="s1437_514"/>
</nt>
<nt id="s1437_516" cat="AP">
<edge label="--" idref="s1437_5"/>
</nt>
<nt id="s1437_517" cat="AP">
<edge label="--" idref="s1437_8"/>
</nt>
<nt id="s1437_518" cat="AP">
<edge label="--" idref="s1437_12"/>
</nt>
<nt id="s1437_519" cat="NP">
<edge label="--" idref="s1437_501"/>
<edge label="--" idref="s1437_508"/>
</nt>
</nonterminals>
</graph>
</s>
The problem is the node with id s1437_511, which immediately dominates s1437_3 and s1437_519. s1437_519, however, dominates not only terminals to the right of s1437_3 (s1437_4 and s1437_5) but also terminals to its left, i.e., s1437_1 and s1437_2 (via s1437_508), so its words are interrupted by s1437_3. This results in crossing branches, which cannot be represented in the standard bracketing format. If you do not want to deal with crossing branches, you will have to either omit this tree or resolve them. In this tree, you could, e.g., attach s1437_508 to s1437_511 instead of s1437_519.
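The check described above can be automated. A minimal sketch using only Python's standard `xml.etree.ElementTree` (an assumption; this is not the tool's own code): for each nonterminal, collect the positions of the terminals it dominates and flag it if those positions have a gap. `reattach` is a hypothetical helper for trying out a repair, and `FRAGMENT` is sentence s1437 trimmed to its first five words and the nodes above them.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Set

# Sentence s1437 from this thread, trimmed to "Sobre todo en tiempos difíciles"
# and the nonterminals above those words (root adjusted to s1437_511).
FRAGMENT = """
<s id="s1437">
 <graph root="s1437_511">
  <terminals>
   <t id="s1437_1" word="Sobre" pos="SP" morph="--"/>
   <t id="s1437_2" word="todo" pos="PI" morph="--"/>
   <t id="s1437_3" word="en" pos="SP" morph="--"/>
   <t id="s1437_4" word="tiempos" pos="NC" morph="--"/>
   <t id="s1437_5" word="difíciles" pos="AQ" morph="--"/>
  </terminals>
  <nonterminals>
   <nt id="s1437_500" cat="NP"><edge label="--" idref="s1437_2"/></nt>
   <nt id="s1437_501" cat="NP"><edge label="--" idref="s1437_4"/><edge label="--" idref="s1437_516"/></nt>
   <nt id="s1437_508" cat="PP"><edge label="--" idref="s1437_1"/><edge label="--" idref="s1437_500"/></nt>
   <nt id="s1437_511" cat="PP"><edge label="--" idref="s1437_3"/><edge label="--" idref="s1437_519"/></nt>
   <nt id="s1437_516" cat="AP"><edge label="--" idref="s1437_5"/></nt>
   <nt id="s1437_519" cat="NP"><edge label="--" idref="s1437_501"/><edge label="--" idref="s1437_508"/></nt>
  </nonterminals>
 </graph>
</s>
"""


def yields(sentence: ET.Element) -> Dict[str, Set[int]]:
    """Map each nonterminal id to the 0-based word positions it dominates."""
    position = {t.get("id"): i for i, t in enumerate(sentence.iter("t"))}
    children = {nt.get("id"): [e.get("idref") for e in nt.findall("edge")]
                for nt in sentence.iter("nt")}

    def collect(node_id: str) -> Set[int]:
        if node_id in position:  # a terminal
            return {position[node_id]}
        return set().union(*(collect(c) for c in children[node_id]))

    return {nt_id: collect(nt_id) for nt_id in children}


def discontinuous(sentence: ET.Element) -> List[str]:
    """Ids of nonterminals whose words have a gap, i.e. crossing branches."""
    return [nt_id for nt_id, span in yields(sentence).items()
            if max(span) - min(span) + 1 != len(span)]


def reattach(sentence: ET.Element, child: str, old_parent: str, new_parent: str) -> None:
    """Hypothetical helper: move the edge to `child` from `old_parent` to `new_parent`."""
    old = sentence.find(f".//nt[@id='{old_parent}']")
    new = sentence.find(f".//nt[@id='{new_parent}']")
    edge = old.find(f"edge[@idref='{child}']")
    old.remove(edge)
    new.append(edge)


if __name__ == "__main__":
    s = ET.fromstring(FRAGMENT.strip())
    print(discontinuous(s))  # ['s1437_519']
    reattach(s, "s1437_508", "s1437_519", "s1437_511")
    print(discontinuous(s))  # []
```

On the trimmed fragment the only flagged node is s1437_519, and moving s1437_508 under s1437_511 (one possible repair) leaves no gaps. On the full sentence the same check also flags only s1437_519, since every other node's words are contiguous.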
For the moment, I have added an option to skip discontinuous trees during bracket output instead of failing. Use --dest-opts brackets_skipdisco
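For a corpus-level version of the same idea, for example to keep a separate copy of a TIGER-XML corpus with only continuous trees, a rough pre-filter could drop every `<s>` that contains a node with a gap. This is a sketch under assumptions (standard-library ElementTree, sentences stored as `<s>` elements as in the tree above), not a description of how brackets_skipdisco works internally.

```python
import xml.etree.ElementTree as ET


def is_continuous(sentence: ET.Element) -> bool:
    """True if every nonterminal of a TIGER-XML <s> covers a gap-free run of words."""
    position = {t.get("id"): i for i, t in enumerate(sentence.iter("t"))}
    children = {nt.get("id"): [e.get("idref") for e in nt.findall("edge")]
                for nt in sentence.iter("nt")}
    cache = {}

    def span(node_id):
        if node_id in position:  # a terminal
            return {position[node_id]}
        if node_id not in cache:
            cache[node_id] = set().union(*(span(c) for c in children[node_id]))
        return cache[node_id]

    return all(max(s) - min(s) + 1 == len(s) for s in map(span, children))


def drop_discontinuous(root: ET.Element) -> int:
    """Remove every discontinuous <s> below `root` in place; return how many were removed.

    Sketch only: this is not how the converter's brackets_skipdisco option is implemented.
    """
    removed = 0
    for parent in list(root.iter()):  # snapshot, since we modify the tree
        for s in [c for c in parent if c.tag == "s" and not is_continuous(c)]:
            parent.remove(s)
            removed += 1
    return removed
```

For example: `tree = ET.parse(path)`, `drop_discontinuous(tree.getroot())`, then `tree.write(out_path, encoding="utf-8")`, where `path` and `out_path` are your own file names.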
Really cool program! The only problem is that it crashes when it can't parse a tree. It would be better if it could just skip illegal trees (unless there's a way to fix them).
I think that this is the illegal tree:
I'm new to treebanks, so I'm not sure what exactly is wrong. I don't know whether the tree is "fixable" or should be ignored.
Bests! Bruno