Open matyaskopp opened 1 month ago
Currently, the audio alignment follows this structure:
<anchor synch="#ps2013-001-01-000-999.u1.p1.s1.w1.ab"/>
<w xml:id="ps2013-001-01-000-999.u1.p1.s1.w1" lemma="vážený" pos="ADJ" msd="UPosTag=ADJ|Animacy=Anim|Case=Voc|Degree=Pos|Gender=Masc|Number=Sing|Polarity=Pos|VerbForm=Part|Voice=Pass" ana="pdt:AAFS5----1A----">Vážení</w>
<anchor synch="#ps2013-001-01-000-999.u1.p1.s1.w1.ae"/>
<anchor synch="#ps2013-001-01-000-999.u1.p1.s1.w2.ab"/>
<w xml:id="ps2013-001-01-000-999.u1.p1.s1.w2" lemma="paní" pos="NOUN" msd="UPosTag=NOUN|Case=Voc|Gender=Fem|Number=Sing|Polarity=Pos" ana="pdt:NNFS5-----A----">paní</w>
<anchor synch="#ps2013-001-01-000-999.u1.p1.s1.w2.ae"/>
<anchor synch="#ps2013-001-01-000-999.u1.p1.s1.w3.ab"/>
<w xml:id="ps2013-001-01-000-999.u1.p1.s1.w3" lemma="poslankyně" pos="NOUN" msd="UPosTag=NOUN|Case=Voc|Gender=Fem|Number=Sing|Polarity=Pos" ana="pdt:NNFS5-----A----" join="right">poslankyně</w>
<anchor synch="#ps2013-001-01-000-999.u1.p1.s1.w3.ae"/>
<pc xml:id="ps2013-001-01-000-999.u1.p1.s1.w4" lemma="," pos="PUNCT" msd="UPosTag=PUNCT" ana="pdt:Z:-------------">,</pc>
<anchor synch="#ps2013-001-01-000-999.u1.p1.s1.w5.ab"/>
<w xml:id="ps2013-001-01-000-999.u1.p1.s1.w5" lemma="vážený" pos="ADJ" msd="UPosTag=ADJ|Animacy=Anim|Case=Nom|Degree=Pos|Gender=Masc|Number=Plur|Polarity=Pos|VerbForm=Part|Voice=Pass" ana="pdt:AAMP5----1A----">vážení</w>
<anchor synch="#ps2013-001-01-000-999.u1.p1.s1.w5.ae"/>
<!-- ... -->
Every aligned token is wrapped in two anchors, which can currently be found only by position:
w/preceding-sibling::anchor[1][ends-with(@synch,'b')]
w/following-sibling::anchor[1][ends-with(@synch,'e')]
This is fragile because it relies on specific suffixes in @synch and on the anchors being placed adjacent to the token.
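To make the fragility concrete, here is a minimal Python sketch of the positional lookup that the two XPath expressions describe. It is only an illustration: the shortened IDs, the `<s>` wrapper, the missing TEI namespace, and the helper names `nearest_anchor` and `anchors_by_position` are all assumptions, not part of the corpus tooling.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Shortened, hypothetical IDs; real files also carry the TEI namespace.
SAMPLE = """<s>
  <anchor synch="#u1.w1.ab"/>
  <w xml:id="u1.w1">Vážení</w>
  <anchor synch="#u1.w1.ae"/>
  <pc xml:id="u1.w2">,</pc>
  <anchor synch="#u1.w3.ab"/>
  <w xml:id="u1.w3">vážení</w>
  <anchor synch="#u1.w3.ae"/>
</s>"""


def nearest_anchor(siblings, suffix):
    """Return @synch of the first anchor in `siblings` if it ends with `suffix`."""
    for el in siblings:
        if el.tag == "anchor":
            synch = el.get("synch", "")
            return synch if synch.endswith(suffix) else None
    return None


def anchors_by_position(parent, token_id):
    """Find the (begin, end) anchors of a token from placement and suffix alone."""
    children = list(parent)
    for i, child in enumerate(children):
        if child.get(XML_ID) == token_id:
            # Nearest preceding anchor must end in 'b', nearest following in 'e'.
            begin = nearest_anchor(reversed(children[:i]), "b")
            end = nearest_anchor(children[i + 1:], "e")
            return begin, end
    return None, None


root = ET.fromstring(SAMPLE)
print(anchors_by_position(root, "u1.w1"))  # aligned token
print(anchors_by_position(root, "u1.w2"))  # unaligned punctuation
```

In this sketch, the suffix check is the only thing that keeps the unaligned `pc` from picking up its neighbours' anchors, so renaming the suffixes or moving an anchor would silently change the result.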
So, the proposal is to add a @corresp attribute to the anchor that would point to the corresponding token:
<anchor synch="#ps2013-001-01-000-999.u1.p1.s1.w1.ab" corresp="ps2013-001-01-000-999.u1.p1.s1.w1"/>
<w xml:id="ps2013-001-01-000-999.u1.p1.s1.w1" lemma="vážený" pos="ADJ" msd="UPosTag=ADJ|Animacy=Anim|Case=Voc|Degree=Pos|Gender=Masc|Number=Sing|Polarity=Pos|VerbForm=Part|Voice=Pass" ana="pdt:AAFS5----1A----">Vážení</w>
<anchor synch="#ps2013-001-01-000-999.u1.p1.s1.w1.ae" corresp="ps2013-001-01-000-999.u1.p1.s1.w1"/>
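With @corresp in place, a consumer could select a token's anchors by attribute value instead of by position. The following Python sketch shows one way that might look; the shortened IDs, the `<s>` wrapper, the missing TEI namespace, and the helper name `anchors_by_corresp` are assumptions for illustration. Accepting the value both with and without a leading `#` is also an assumption, since the example above writes it without one.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical IDs; the anchors are deliberately not adjacent
# to their token to show that placement no longer matters.
SAMPLE = """<s>
  <anchor synch="#u1.w1.ab" corresp="u1.w1"/>
  <anchor synch="#u1.w2.ab" corresp="u1.w2"/>
  <w xml:id="u1.w1">Vážení</w>
  <w xml:id="u1.w2">paní</w>
  <anchor synch="#u1.w1.ae" corresp="u1.w1"/>
  <anchor synch="#u1.w2.ae" corresp="u1.w2"/>
</s>"""


def anchors_by_corresp(root, token_id):
    """Return the @synch values of all anchors whose @corresp names the token."""
    # Assumption: tolerate both "id" and "#id" as the @corresp value.
    wanted = {token_id, "#" + token_id}
    return [
        anchor.get("synch")
        for anchor in root.iter("anchor")
        if anchor.get("corresp") in wanted
    ]


root = ET.fromstring(SAMPLE)
print(anchors_by_corresp(root, "u1.w1"))
print(anchors_by_corresp(root, "u1.w2"))
```

The list comes back in document order. Note that @corresp alone does not say which anchor marks the start and which the end, so this sketch still relies on order for that distinction.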
Notes: