MansMeg closed this issue 10 months ago.
The Riksdagen Motion format with additions ... we would need to add additional tags to the current XML schema.
AFAIC, this is a significant con. I guess we would have to spend a significant amount of time deciding what gets included and exactly how.
The TEI ParlaClarin format
Can we use this for motions without adding to or otherwise fiddling with the specification? It seems to me that it's ideally suited, so +1 for TEI ParlaClarin and -1 for the Riksdagen Motion format.
I'm not sure whether TEI covers this in its specifications. @ninpnin might know. My guess is that both schemas require fiddling, just in different ways. @ninpnin?
TEI by itself is probably too generic, but ParlaClarin TEI has a lot of what we need -- example. Could you post the photo of the whiteboard from the project meeting where this was discussed? Then we could go item by item through the information we want to encode and see how it would look in ParlaClarin.
This is roughly how we prioritized the content to annotate in motions:
Here is a photo from the motion workshop in June:
The example you showed was for protocols, not for motions?
Sure. ParlaClarin as a whole is about protocols -- from their landing page:
a TEI customisation for annotating parliamentary debates
I still have to compare it point by point with the photo, but my point is that most, if not all, of what we want to annotate seems to be covered by that schema already.
Not only would we need to add elements, we would need to do everything ourselves. There are no schemas, no format for storing the actual content, no tests that we get for free, no knowledge from years of working with TEI, no documentation, and no community to rely on.
AFAIK, ParlaClarin doesn't add any elements of its own to TEI; it just defines what each element type means in the context of parliamentary debates. For debates, that is only one part of the work we would have to do for motions if we decided to extend the Riksdagen Motion format. The elements, the schemas, etc. come with the TEI package.
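To make the "no new elements" idea concrete, here is a minimal sketch of how a motion could be encoded with standard TEI elements only, with the motion-specific meaning carried by `@type` attributes instead of new tags. The structure (a `div` of type `motion`, a `list` of type `proposals`) and the attribute values are hypothetical conventions for illustration, not something ParlaClarin defines for motions. The helper `non_tei_elements` only checks that nothing outside the TEI namespace has crept in; it does not validate against the TEI schema.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical encoding of a motion using only standard TEI elements.
# The @type values ("motion", "proposals") are illustrative conventions,
# not part of ParlaClarin.
MOTION_TEI = f"""<TEI xmlns="{TEI_NS}">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Example motion</title></titleStmt>
      <publicationStmt><p>Unpublished example</p></publicationStmt>
      <sourceDesc><p>Illustration only</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <div type="motion">
        <head>Example motion</head>
        <p>Motivation text goes here.</p>
        <list type="proposals">
          <item n="1">First proposal.</item>
          <item n="2">Second proposal.</item>
        </list>
      </div>
    </body>
  </text>
</TEI>"""


def non_tei_elements(xml_text):
    """Return the tags of all elements that are not in the TEI namespace."""
    root = ET.fromstring(xml_text)
    prefix = "{" + TEI_NS + "}"
    return [el.tag for el in root.iter() if not el.tag.startswith(prefix)]


print(non_tei_elements(MOTION_TEI))  # → []
```

Any custom tag we introduced (in another namespace) would show up in that list, which is a cheap way to keep ourselves honest about not extending TEI.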
Are we bound to XML?
I would say yes. We should keep to as few formats as possible.
One more (last) point from me, which is pro-ParlaClarin / con-Riksdagen XML, is about consistency in our corpus -- to me it would look and feel very strange to have such wildly different formats (schema/no schema, tagsets, attributes, document organization) within the same corpus.
I'm interested in what the tech advisory board says.
Decision made: We go with the TEI ParlaClarin format, but we will test that the TEI format can be transformed back into the Riksdagen open data XML format, so that we don't lose any data.
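A round-trip test of the kind decided on could look roughly like the sketch below. It assumes a heavily simplified Riksdagen open-data layout (`dokumentstatus/dokument` with `dok_id`, `titel` and `datum` children); the real files contain many more fields, and where each field lands in the TEI header (`title`, `idno`, `date`) is our own hypothetical mapping. The sample values are placeholders, not a real motion.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI with a default namespace


def t(tag):
    """Qualify a tag name with the TEI namespace (Clark notation)."""
    return f"{{{TEI_NS}}}{tag}"


# Placeholder input in an assumed, simplified Riksdagen open-data layout.
RIKSDAGEN_SAMPLE = """<dokumentstatus>
  <dokument>
    <dok_id>EXAMPLE01</dok_id>
    <titel>Example motion title</titel>
    <datum>2020-01-01</datum>
  </dokument>
</dokumentstatus>"""

FIELDS = ("dok_id", "titel", "datum")


def read_riksdagen(xml_text):
    """Extract the tracked fields from the (simplified) Riksdagen XML."""
    doc = ET.fromstring(xml_text).find("dokument")
    return {name: doc.findtext(name) for name in FIELDS}


def to_tei(fields):
    """Hypothetical mapping of the fields into a TEI header."""
    tei = ET.Element(t("TEI"))
    file_desc = ET.SubElement(ET.SubElement(tei, t("teiHeader")), t("fileDesc"))
    title_stmt = ET.SubElement(file_desc, t("titleStmt"))
    ET.SubElement(title_stmt, t("title")).text = fields["titel"]
    pub = ET.SubElement(file_desc, t("publicationStmt"))
    ET.SubElement(pub, t("idno"), {"type": "dok_id"}).text = fields["dok_id"]
    ET.SubElement(pub, t("date"), {"when": fields["datum"]})
    return ET.tostring(tei, encoding="unicode")


def tei_to_riksdagen(tei_text):
    """Inverse mapping: rebuild the simplified Riksdagen XML from TEI."""
    file_desc = ET.fromstring(tei_text).find(f"{t('teiHeader')}/{t('fileDesc')}")
    pub = file_desc.find(t("publicationStmt"))
    root = ET.Element("dokumentstatus")
    doc = ET.SubElement(root, "dokument")
    ET.SubElement(doc, "dok_id").text = pub.findtext(f"{t('idno')}[@type='dok_id']")
    ET.SubElement(doc, "titel").text = file_desc.findtext(f"{t('titleStmt')}/{t('title')}")
    ET.SubElement(doc, "datum").text = pub.find(t("date")).get("when")
    return ET.tostring(root, encoding="unicode")


def round_trip_is_lossless(xml_text):
    """True if Riksdagen -> TEI -> Riksdagen preserves every tracked field."""
    original = read_riksdagen(xml_text)
    restored = read_riksdagen(tei_to_riksdagen(to_tei(original)))
    return original == restored


print(round_trip_is_lossless(RIKSDAGEN_SAMPLE))  # → True
```

The test only guarantees what is listed in `FIELDS`, so in practice that tuple would have to grow to cover every field in the open data files before we can claim nothing is lost.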
We currently have two different proposals for how to store motions, and it is not clear which approach is best. Below are the two candidate solutions.
The TEI ParlaClarin format
ParlaClarin is a TEI XML format for structuring and annotating parliamentary protocols. https://github.com/clarin-eric/parla-clarin/tree/master
Pro:
Con:
The Riksdagen Motion format with additions
The Swedish parliament already has a format for storing motions, which is used by Riksdagens Öppna data. https://www.riksdagen.se/sv/dokument-och-lagar/riksdagens-oppna-data/dokument/ Motions can be downloaded as XML in zip files.
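Reading the downloaded zip files needs nothing beyond the Python standard library. The sketch below iterates over the XML files in one archive and pulls out a few fields; the element names (`dokumentstatus/dokument`, `dok_id`, `titel`) are assumptions about the Riksdagen layout that should be checked against the actual files, and `iter_motions` is a hypothetical helper.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def iter_motions(zip_source):
    """Yield (file name, dok_id, titel) for each XML file in a zip archive.

    `zip_source` can be a path or a binary file-like object. The element
    names below are assumed, not taken from an official schema.
    """
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".xml"):
                continue
            with archive.open(name) as fh:
                doc = ET.parse(fh).getroot().find("dokument")
            if doc is None:
                continue  # not in the assumed layout; skip it
            yield name, doc.findtext("dok_id"), doc.findtext("titel")


# Usage with an in-memory archive standing in for a downloaded zip file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "example.xml",
        "<dokumentstatus><dokument><dok_id>EXAMPLE01</dok_id>"
        "<titel>Example motion title</titel></dokument></dokumentstatus>",
    )
buffer.seek(0)
print(list(iter_motions(buffer)))  # → [('example.xml', 'EXAMPLE01', 'Example motion title')]
```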
Pro:
Con: