ufal / ParCzech

ParCzech is a project on compiling Czech parliamentary data into annotated corpora.
https://ufal.mff.cuni.cz/parczech

parczech2parlamint conversion #110

Closed matyaskopp closed 3 years ago

matyaskopp commented 3 years ago

It is not a no-break space; it is probably an issue in the conversion to ParlaMint.

https://www.psp.cz/eknih/2017ps/stenprot/076schuz/s076029.htm

/opt/tools/shared/ParlaMint/ParlaMint-CZ/ParlaMint-CZ_2020-12-09-ps2017-076-01-001-001.ana.xml:203969:71: error:text not allowed here; expected element "date", "email", "gap", "incident", "kinesic", "linkGrp", "name", "note", "num", "pc", "ref", "time", "unit", "vocal" or "w"

"s[ ]osobou"

204083                       <w xml:id="ParlaMint-CZ_2020-12-09-ps2017-076-01-001-001.u43.p4.s11.w20"
204084                          lemma="s"
204085                          msd="UposTag=ADP|AdpType=Prep|Case=Ins">s</w> <w xml:id="ParlaMint-CZ_2020-12-09-ps2017-076-01-001-001.u43.p4.s11.w21"
204086                          lemma="osoba"
204087                          msd="UposTag=NOUN|Case=Ins|Gender=Fem|Number=Sing|Polarity=Pos">osobou</w>
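
To find out which character actually sits between the two tokens, one option is to scan each sentence for text that is not plain XML whitespace (space, tab, CR, LF) and print its Unicode name. The following is only a rough sketch, not part of the conversion scripts: the function name `stray_chars`, the shortened `xml:id` values, and the thin space (U+2009) in the sample are all assumptions made for illustration; the real offending character is not identified in this issue.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Characters a validator treats as ignorable whitespace in element-only content.
XML_WHITESPACE = {" ", "\t", "\n", "\r"}
# ElementTree exposes the xml:id attribute under this expanded name.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def stray_chars(xml_text, container="s"):
    """Yield (preceding token id, character, Unicode name) for stray text.

    Hypothetical helper: reports every character inside a <s> element
    that lies between child elements and is not XML whitespace.
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Compare local names so this works with or without the TEI namespace.
        if elem.tag.rsplit("}", 1)[-1] != container:
            continue
        # Text before the first child, then the tail after each child.
        pieces = [(None, elem.text or "")]
        pieces += [(child.get(XML_ID), child.tail or "") for child in elem]
        for prev_id, text in pieces:
            for ch in text:
                if ch not in XML_WHITESPACE:
                    name = unicodedata.name(ch, f"U+{ord(ch):04X}")
                    yield prev_id, ch, name


# Assumed sample: a THIN SPACE between the tokens, chosen only as an example.
sample = (
    '<s xmlns="http://www.tei-c.org/ns/1.0">'
    '<w xml:id="w20">s</w>\u2009<w xml:id="w21">osobou</w>'
    "</s>"
)

for hit in stray_chars(sample):
    print(hit)
```

Running this over the failing `.ana.xml` file (reading it and passing its content to `stray_chars`) should list the id of the token preceding each stray character, which can then be traced back to the source page.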