welfare-state-analytics / riksdagen-corpus

Swedish parliamentary proceedings - Riksdagens protokoll 1867-today

Create an XML schema for the motions and parse the motions into the format #400

Open MansMeg opened 1 year ago

MansMeg commented 1 year ago

We have decided to base the motion format on Parla-CLARIN/TEI. We must create a schema/XML format for the motions and test it on several motions from across the whole period.

BobBorges commented 1 year ago

I'm starting to put together an example xml doc to use with motions. swerik-motion.xml.txt There are a couple points on the mind-map (here) that aren't clear to me:

@MansMeg @fredrik1984 : are either of you able to clarify what these points relate to?

MansMeg commented 1 year ago

I think the XML format looks good!

When Fredrik has given the green light, I think we can pass it on to Leif-Jöran to see if he has any suggestions. After that, I think you could take a random sample of, say, 40-50 motions (stratified by decade, say 3 per decade) and add them to the format manually, to check that it works well.
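
A minimal sketch of the stratified sample suggested above, assuming each motion can be represented as a `(motion_id, year)` pair. The function name `sample_by_decade`, the input shape, and the fixed seed are all hypothetical; nothing here comes from the riksdagen-corpus code.

```python
import random
from collections import defaultdict


def sample_by_decade(motions, per_decade=3, seed=0):
    """Draw up to `per_decade` motion ids at random from each decade.

    `motions` is an iterable of (motion_id, year) pairs (assumed shape).
    Returns a dict mapping decade start year -> list of sampled ids.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible

    # Group motion ids by the decade their year falls in (1867 -> 1860).
    by_decade = defaultdict(list)
    for motion_id, year in motions:
        by_decade[year // 10 * 10].append(motion_id)

    sample = {}
    for decade in sorted(by_decade):
        ids = sorted(by_decade[decade])  # sort first so the input order does not matter
        k = min(per_decade, len(ids))    # a sparse decade may hold fewer than per_decade
        sample[decade] = rng.sample(ids, k)
    return sample
```

Decades with fewer motions than `per_decade` simply contribute everything they have, so the total can fall slightly short of the target.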

fredrik1984 commented 1 year ago

Great!

Yes – the att-satser are at the end of the motion body text.

Yes – rubriker are headings. The most important heading is the one which presents the topic of the motion and who is the main MP signer of the motion.

The third is "motionshänvisning" – a reference between different motions. This was something that Lotta brought up. The same motion is to be submitted to both chambers (?), but if a motion is first submitted to the First Chamber, then the same motion in the Second Chamber is just a "reference motion", like a note pointing to the other motion in the First Chamber (I think). I guess this is an issue that we need to discuss with the library to decide how to handle it.

Also – I have a little bit of difficulty in reading/understanding the XML schema, but I am confident in what you are doing! And we can always discuss it on Wednesday and so on.

This also relates to #328

BobBorges commented 1 year ago

Thanks!

att-satser:

Do I understand correctly that it's this part of the motion (block indent): image

If so, I would consider this part of the body (not end matter) and label it as a type of \ element.

rubriker -- headings. The most important heading is the one which presents the topic of the motion and who is the main MP signer of the motion.

Do we consider this a single element per-motion? I've looked at a few dozen motions just now and didn't find any with multiple headings. If so, it's already in the proposal -- frontMatter div > head.

Probably frontMatter is a bad type value since there's also a \ tag.

motionshänvisning

already there too: text > front > div > linkedMotions
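
To make the nesting discussed above easier to picture, here is a rough sketch that builds a skeleton with Python's standard `xml.etree.ElementTree`. Only the paths `text > front > div > head` and `text > front > div > linkedMotions` come from this thread. Everything else is an assumption for illustration: the `ref`/`target` children, placing the att-satser in a typed `div` inside `body`, the type value `att-satser`, and the function name. It is not the content of swerik-motion.xml.txt.

```python
import xml.etree.ElementTree as ET


def build_motion_skeleton(title, linked_ids, paragraphs, att_satser,
                          proposal_type="att-satser"):
    """Build a bare-bones motion document tree (illustrative only)."""
    tei = ET.Element("TEI")
    text = ET.SubElement(tei, "text")

    # Front matter: the single heading plus references to linked motions.
    front = ET.SubElement(text, "front")
    front_div = ET.SubElement(front, "div", type="frontMatter")
    ET.SubElement(front_div, "head").text = title
    linked = ET.SubElement(front_div, "linkedMotions")
    for motion_id in linked_ids:
        # Hypothetical child element; the thread does not say how links are encoded.
        ET.SubElement(linked, "ref", target=motion_id)

    # Body: running text, then the att-satser as part of the body (not end matter).
    body = ET.SubElement(text, "body")
    for paragraph in paragraphs:
        ET.SubElement(body, "p").text = paragraph
    proposal_div = ET.SubElement(body, "div", type=proposal_type)  # assumed element
    for sats in att_satser:
        ET.SubElement(proposal_div, "p").text = sats
    return tei


if __name__ == "__main__":
    root = build_motion_skeleton(
        "Example heading", ["example-linked-motion"],
        ["Example body paragraph."], ["att riksdagen ..."],
    )
    print(ET.tostring(root, encoding="unicode"))
```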


slightly updated example:

swerik-motion.xml.txt

MansMeg commented 1 year ago
  1. Yes. I think that is a good simple start at least. We might want to change it, but you are right that it is a part of the motion.
  2. Let's skip additional headers for now, then.

Great! Maybe you could prepare a presentation about this for tomorrow? Then we could get some additional pairs of eyes on it before we start.

fredrik1984 commented 1 year ago

Sounds good!

BobBorges commented 1 year ago

Some updates:

open questions:

swerik-motion.xml.txt

fredrik1984 commented 1 year ago

Regarding changing att-satser – let's go with "proposal", which is actually the official term in the riksdag lingo.

Regarding motion type – I find these in the official English-Swedish riksdag dictionary: partimotion (party motion), enskild motion (individual private member's motion), flerpartimotion (multi-party motion), fristående motion (independent private member's motion), följdmotion (a private member's motion arising out of a Government bill). Hence, we can use English and refer to this dictionary.
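
One possible way to keep these terms in one place is a small lookup table. The Swedish and English terms are exactly the ones listed above; the names `MOTION_TYPES` and `motion_type_en` and the normalisation step are hypothetical, not part of any existing code.

```python
# Swedish motion type -> English term, as listed in the comment above.
MOTION_TYPES = {
    "partimotion": "party motion",
    "enskild motion": "individual private member's motion",
    "flerpartimotion": "multi-party motion",
    "fristående motion": "independent private member's motion",
    "följdmotion": "a private member's motion arising out of a Government bill",
}


def motion_type_en(swedish_term):
    """Return the English term for a Swedish motion type.

    Case and surrounding/repeated whitespace are ignored; unknown terms
    raise ValueError so that typos do not pass silently.
    """
    key = " ".join(swedish_term.lower().split())
    try:
        return MOTION_TYPES[key]
    except KeyError:
        raise ValueError(f"unknown motion type: {swedish_term!r}") from None
```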

BobBorges commented 1 year ago

But isn't "proposal" also a separate document type? We just got documents from the riksdag library with prop_ as a prefix... or is that 'proposition'? (not a poli-sci guy just yet :D)

fredrik1984 commented 1 year ago

Well, not really. You are probably referring to "proposition" which in the official translation is "Government bill". But this is indeed a bit tricky!