ufal / ParlaMint-UA

Tools and samples of Ukrainian parliamentary proceedings encoded in ParlaMint format
https://ufal.github.io/ParlaMint-UA/

conversion of downloaded HTML files to a TEI/text file #2

Closed: matyaskopp closed this issue 1 year ago

matyaskopp commented 2 years ago
make html2tei-text

parameters of the script that the html2tei-text make target calls

encoding of this phase's result:

place the result in the Data/tei-text folder; filenames should follow the final naming convention: ParlaMint-UA_{original filename without the .htm suffix}
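A minimal Python sketch of the naming rule, assuming the downloaded transcripts keep their original names such as 20190829-1.htm. The helper name tei_text_path is hypothetical and not part of the repository; since the convention prescribes no extension, none is appended.

```python
from pathlib import Path


def tei_text_path(html_file: str, out_dir: str = "Data/tei-text") -> Path:
    """Map a downloaded transcript to its ParlaMint-UA_* output path.

    Hypothetical helper: only the folder and the name pattern come from the
    issue; no file extension is added because none is specified.
    """
    name = Path(html_file).name
    stem = name.removesuffix(".htm")  # drop only the .htm suffix (Python 3.9+)
    return Path(out_dir) / f"ParlaMint-UA_{stem}"
```

For example, tei_text_path("20190829-1.htm") gives Data/tei-text/ParlaMint-UA_20190829-1.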

matyaskopp commented 2 years ago

speaker

pairing speeches with members of parliament: https://data.rada.gov.ua/open/data/plenary_speech-skl9

date_speech,time_speech,id_mp,name_mp
2019-08-29T13:16:29,31,201,Разумков Д.О.
2019-08-29T13:17:53,66,336,Іоффе Ю.Я.
2019-08-29T13:22:50,22,412,Железняк Я.І.
2019-08-29T13:26:14,15,412,Железняк Я.І.
2019-08-29T14:28:33,2,201,Разумков Д.О.
2019-08-29T14:33:27,3,201,Разумков Д.О.
2019-08-29T14:46:57,75,412,Железняк Я.І.
2019-08-29T14:49:59,18,204,Арахамія Д.Г.

This corresponds to https://data.rada.gov.ua/ogd/zal/agenda/skl9/sten/20190829-1.htm, but a chair is missing, and so are some speeches, e.g.:

13:29:20

ГРОЙСМАН В.Б.

Дуже дякую, вельмишановний пане головуючий. ("Thank you very much, esteemed Mr. Chairman.")
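One way to spot such gaps, sketched in Python: collect the HH:MM:SS lines from the text extracted from the HTML transcript and report those with no CSV start time nearby. Both the one-timestamp-per-line layout and the 60-second tolerance are assumptions, and transcript_times and unmatched are hypothetical helpers.

```python
import re
from datetime import date, datetime, time

# Assumption: in the text extracted from the HTML transcript, each speech
# starts with a line holding only an HH:MM:SS timestamp.
TIME_LINE = re.compile(r"^\s*(\d{2}):(\d{2}):(\d{2})\s*$")


def transcript_times(text: str, day: date) -> list[datetime]:
    """Collect speech start times found in the transcript text."""
    found = []
    for line in text.splitlines():
        m = TIME_LINE.match(line)
        if m:
            h, mi, s = (int(g) for g in m.groups())
            found.append(datetime.combine(day, time(h, mi, s)))
    return found


def unmatched(transcript: list[datetime], csv_starts: list[datetime],
              tolerance_s: int = 60) -> list[datetime]:
    """Transcript speeches with no CSV start time within tolerance_s seconds."""
    return [t for t in transcript
            if all(abs((t - c).total_seconds()) > tolerance_s for c in csv_starts)]
```

With the CSV rows above, the 13:29:20 speech by ГРОЙСМАН В.Б. would be reported, since the nearest CSV start is 13:26:14.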

This looks like the same data in a different format (XML): https://data.rada.gov.ua/open/data/speech_ppz-skl9

<?xml version="1.0" encoding="Windows-1251"?>
<root>
<speech>
<date_speech>
<day_speech>29</day_speech>
<month_speech>08</month_speech>
<year_speech>2019</year_speech>
<hour_speech>13</hour_speech>
<min_speech>16</min_speech>
<sec_speech>29</sec_speech>
</date_speech>
<time_speech>31</time_speech>
<id_mp>201</id_mp>
<name_mp>Разумков Д.О.</name_mp>
</speech>
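A sketch of loading this XML variant with Python's standard library. Handing ElementTree the raw bytes lets the parser apply the declared Windows-1251 encoding, so name_mp is decoded to Cyrillic text. load_speech_ppz is a hypothetical name, and only the elements visible in the sample are read.

```python
import xml.etree.ElementTree as ET
from datetime import datetime


def load_speech_ppz(xml_bytes: bytes) -> list[tuple[datetime, int, int, str]]:
    """Parse speech_ppz-skl9 XML into (start, time_speech, id_mp, name_mp)."""
    # Pass raw bytes so the parser honours the encoding declaration.
    root = ET.fromstring(xml_bytes)
    rows = []
    for sp in root.iter("speech"):
        d = sp.find("date_speech")
        # The timestamp is split over six child elements; rebuild it in order.
        parts = ("year", "month", "day", "hour", "min", "sec")
        start = datetime(*(int(d.findtext(f"{p}_speech")) for p in parts))
        rows.append((start, int(sp.findtext("time_speech")),
                     int(sp.findtext("id_mp")), sp.findtext("name_mp")))
    return rows
```

Decoded this way, the first record gives 2019-08-29 13:16:29, 31, 201, Разумков Д.О., the same values as the first row of plenary_speech-skl9.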

agenda

https://data.rada.gov.ua/open/data/plenary_agenda-skl9

date_agenda,id_question,number_question,init_question,name_question
2019-08-29,201908291,0,0,"Урочисте засідання з нагоди складання присяги народними депутатами України, обраними 21 липня 2019 року"
2019-08-29,201908292,0,0,Реєстрація народних депутатів України
2019-08-29,201908293,0,0,Відкриття першої сесії Верховної Ради України дев'ятого скликання

In https://data.rada.gov.ua/ogd/zal/agenda/skl9/sten/20190829.htm, the agenda title is probably embedded in the text, and there is no way to link it to the data in plenary_agenda-skl9.

AnnaParla commented 2 years ago


Although the URL of 20190829.htm contains both "agenda" and "sten", the file is in fact a transcript of a ceremonial sitting at which the new MPs take the oath. The word "agenda" does not appear in this transcript.