popolo-project / popolo-spec

International legislative data specifications
http://www.popoloproject.com/

How to best record number of seats in a Legislative Organization #99

Open tmtmtmtm opened 9 years ago

tmtmtmtm commented 9 years ago

A very common thing to want to know about a legislature/chamber is how many seats it has.

And whilst it would be simple enough to shove a one-off property into the relevant Organization on a case-by-case basis, it would be nicer if there were a consistent way of doing this (especially as this is the sort of thing that can change over time, so it really needs to be more than just a simple string to allow for dates, etc.).

jpmckinney commented 9 years ago

The "full-fledged" approach is to create Posts, and to then count the number of Posts that are valid for a given date. Are you looking for a non-Post solution?

tmtmtmtm commented 9 years ago

Yes: many of the legislatures currently modelled with Popolo aren't using Posts, and I think it would be useful to be able to get this sort of meta-information without making Posts a requirement.

Or, even if Posts are being used, you might want to note that the 17th Assembly had 115 seats, rather than the 110 there currently are, but not know what those extra seats were.

jpmckinney commented 9 years ago

Since the number of seats is more closely tied to the legislative period than to the organization, we might consider reflecting that in the model. What do you think of a legislative period having a post_counts property (similar to a vote event's counts property), that looks like:

{
  ... some other event properties ...
  "post_counts": [
    {
      "organization_id": "house-of-commons",
      "value": 338
    },
    {
      "organization_id": "senate",
      "value": 105
    }
  ]
}
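A consumer of this proposed structure could answer "how many seats does the Senate have in this period?" by scanning post_counts for the matching organization_id. A minimal sketch, assuming the post_counts shape proposed above (it is a proposal in this thread, not part of the published spec); the helper name seat_count is hypothetical.

```python
from typing import Optional

# A legislative-period event using the post_counts shape proposed above.
event = {
    "post_counts": [
        {"organization_id": "house-of-commons", "value": 338},
        {"organization_id": "senate", "value": 105},
    ]
}


def seat_count(event: dict, organization_id: str) -> Optional[int]:
    """Return the count for organization_id in event["post_counts"], or None if absent."""
    for count in event.get("post_counts", []):
        if count.get("organization_id") == organization_id:
            return count["value"]
    return None


print(seat_count(event, "senate"))  # 105
```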
tmtmtmtm commented 9 years ago

At first glance I like the idea of having something similar to the count on votes, and I'm probably in favour of connecting this with the period.

I'm slightly confused, though, about what this looks like in a unicameral legislature, where it seems this would require duplicating the organization_id — for example with Greenland, below (which is a great example, as the number of seats changed in every term for almost 20 years...):

[
  {
    "id": "term/1979-05-01",
    "name": "Inatsisartut 1",
    "classification": "legislative period",
    "organization_id": "inatsisartut",
    "post_counts": [{
      "organization_id": "inatsisartut",
      "value": 21
    }],
    "start_date": "1979-05-01",
    "end_date": "1983-04-11"
  },
  {
    "id": "term/1983-04-12",
    "name": "Inatsisartut 2",
    "classification": "legislative period",
    "organization_id": "inatsisartut",
    "post_counts": [{
      "organization_id": "inatsisartut",
      "value": 26
    }],
    "start_date": "1983-04-12",
    "end_date": "1984-06-05"
  },
  {
    "id": "term/1984-06-06",
    "name": "Inatsisartut 3",
    "classification": "legislative period",
    "organization_id": "inatsisartut",
    "post_counts": [{
      "organization_id": "inatsisartut",
      "value": 25
    }],
    "start_date": "1984-06-06",
    "end_date": "1987-05-25"
  },
  {
    "id": "term/1987-05-26",
    "name": "Inatsisartut 4",
    "classification": "legislative period",
    "organization_id": "inatsisartut",
    "post_counts": [{
      "organization_id": "inatsisartut",
      "value": 26
    }],
    "start_date": "1987-05-26",
    "end_date": "1991-03-04"
  },
  {
    "id": "term/1991-03-05",
    "name": "Inatsisartut 5",
    "classification": "legislative period",
    "organization_id": "inatsisartut",
    "post_counts": [{
      "organization_id": "inatsisartut",
      "value": 27
    }],
    "start_date": "1991-03-05",
    "end_date": "1995-03-03"
  },
  {
    "id": "term/1995-03-04",
    "name": "Inatsisartut 6",
    "classification": "legislative period",
    "organization_id": "inatsisartut",
    "post_counts": [{
      "organization_id": "inatsisartut",
      "value": 31
    }],
    "start_date": "1995-03-04",
    "end_date": "1999-02-15"
  }
]
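With the count attached to each term, "how many seats were there on a given date?" becomes: find the term whose start_date/end_date covers that date, then read its post_counts. A minimal sketch using a subset of the Greenland terms above; it assumes inclusive dates and treats a missing end_date as an ongoing term, and the helper name seats_on is hypothetical.

```python
from datetime import date
from typing import Optional

# A subset of the Greenland terms above, trimmed to the fields this sketch needs.
terms = [
    {"id": "term/1979-05-01", "start_date": "1979-05-01", "end_date": "1983-04-11",
     "post_counts": [{"organization_id": "inatsisartut", "value": 21}]},
    {"id": "term/1983-04-12", "start_date": "1983-04-12", "end_date": "1984-06-05",
     "post_counts": [{"organization_id": "inatsisartut", "value": 26}]},
    {"id": "term/1984-06-06", "start_date": "1984-06-06", "end_date": "1987-05-25",
     "post_counts": [{"organization_id": "inatsisartut", "value": 25}]},
]


def seats_on(terms: list, organization_id: str, on: date) -> Optional[int]:
    """Find the term covering `on` (inclusive) and return its count for organization_id."""
    for term in terms:
        start = date.fromisoformat(term["start_date"])
        # A missing end_date is treated as a term that is still running.
        end = date.fromisoformat(term["end_date"]) if term.get("end_date") else date.max
        if start <= on <= end:
            for count in term.get("post_counts", []):
                if count["organization_id"] == organization_id:
                    return count["value"]
    return None


print(seats_on(terms, "inatsisartut", date(1984, 1, 1)))  # 26
```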
jpmckinney commented 9 years ago

We could make the organization_id optional when embedding PostCounts inside an Event, if it's identical to the Event's organization_id, which will only be the case in a unicameral situation. We similarly make organization_id optional when embedding Posts inside an Organization: http://www.popoloproject.com/specs/#embedded-json-documents
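If organization_id were optional on embedded PostCounts, a consumer would fill it in from the containing Event before using the data. A minimal sketch of that defaulting rule as described here (it is a proposal, not current spec behaviour); the helper name expand_post_counts is hypothetical.

```python
def expand_post_counts(event: dict) -> list:
    """Return the event's post_counts, filling any missing organization_id from the event."""
    return [
        {**count, "organization_id": count.get("organization_id", event["organization_id"])}
        for count in event.get("post_counts", [])
    ]


event = {
    "id": "term/1995-03-04",
    "organization_id": "inatsisartut",
    "post_counts": [{"value": 31}],  # organization_id omitted: the unicameral case
}

print(expand_post_counts(event))
# [{'value': 31, 'organization_id': 'inatsisartut'}]
```

An explicitly given organization_id (the bicameral case) is left untouched, so both forms end up in the same shape.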

tmtmtmtm commented 9 years ago

Would that be a more general extension of the implied-organization rule from "the ID of the containing document" to "the ID of the organization within the containing document"?

That feels like it could have wider repercussions[1] if it's a general thing, but if it's a one-off thing for here, what about going even simpler, and having the basic case reduce to:

   "post_count": 31

rather than

  "post_counts": [
    {
      "value": 31
    }
  ]

Or are you expecting that there would usually/often be other properties on the count too?


[1] Not necessarily bad ones; it's just the sort of thing that gives me pause.

tmtmtmtm commented 9 years ago

Actually, thinking about that a little more, I suspect that might not really be a good idea. It would make it easier for humans to read and write, but would make both creating and consuming it more difficult for tools. (I don't actually know where Popolo aims to fall along that spectrum...)

jpmckinney commented 9 years ago

There shouldn't be two entirely different structures depending on whether it's bicameral or unicameral. Personally, I don't mind repeating organization_id; it's only repetitive in the unicameral case, and repetition isn't really a problem. Popolo data is more for machines than for humans, but for humans to implement Popolo, they should be able to read it.

Also, a PostCount, on its own outside the context of an Event, would have an event_id. That way you can look up a PostCount by event_id and organization_id and get the value, which matches the question, "how many seats were there in the legislature during the XIVth term?" event_id would be optional when embedded according to the current rules. I think organization_id should not be optional.
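Read this way, a standalone PostCount is effectively keyed by (event_id, organization_id). A minimal sketch of flattening embedded PostCounts into that form, taking event_id from the containing Event when it is omitted; the helper name index_post_counts is hypothetical.

```python
from typing import Dict, Iterable, Tuple


def index_post_counts(events: Iterable[dict]) -> Dict[Tuple[str, str], int]:
    """Flatten embedded post_counts into values keyed by (event_id, organization_id)."""
    index = {}
    for event in events:
        for count in event.get("post_counts", []):
            # Embedded PostCounts may omit event_id; take it from the containing Event.
            event_id = count.get("event_id", event["id"])
            index[(event_id, count["organization_id"])] = count["value"]
    return index


events = [
    {"id": "term/1991-03-05", "post_counts": [{"organization_id": "inatsisartut", "value": 27}]},
    {"id": "term/1995-03-04", "post_counts": [{"organization_id": "inatsisartut", "value": 31}]},
]
counts = index_post_counts(events)
print(counts[("term/1995-03-04", "inatsisartut")])  # 31
```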

tmtmtmtm commented 9 years ago

repetition isn't really a problem

Yes: though it's often a symptom that a concept is wrong or missing.

Here it's not so much the repetition itself that concerns me as the reason for needing it: i.e. that for "Event X" with organization_id of "O", the PostCount with event_id "X" can have information about different Organizations.

Presumably those would be restricted to Organizations that have "O" as a parent (perhaps through multiple levels), but there's still something about that that seems a little odd. I've nothing better to add at the minute, though, other than a gnawing sense that this isn't quite right. I'll think about it some more.

tmtmtmtm commented 9 years ago

A further thought on this: I used Greenland earlier as an example of somewhere that changed the number of seats fairly regularly for a while — but in most countries (including Greenland after that flurry of early changes), this is something that changes very infrequently, so having to repeat it for every legislative period is another bad smell.

Is there the concept of things having a start_event or end_event (or equivalent), rather than just start_date and end_date?

tmtmtmtm commented 9 years ago

I'm also a little concerned about post_count being the name here, as that implies that it's simply a count of how many Posts there are in the Organization. But a legislature can have other Posts that aren't Seats.

jpmckinney commented 9 years ago

I think post_count (or whatever name is chosen) should then be a property of the Organization, and the way of solving the "changes over time" issue is to solve that issue in general in #47.

One way to avoid going down that rabbit hole is to allow for different kinds of counts, so it would again require a full object and look like a VoteCount, e.g.

  [
    {
      "role": "Member of Parliament",
      "value": 308,
      "valid_from": "2004-06-08",
      "valid_until": "2015-10-18"
    },
    {
      "role": "Member of Parliament",
      "value": 338,
      "valid_from": "2015-10-19"
    },
    {
      "role": "Clerk",
      "value": 1
    }
  ]
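With role-specific counts, a lookup needs both the role and a date, because one role can have several counts with non-overlapping validity windows. A minimal sketch over the example counts above; it assumes valid_from/valid_until are inclusive and that a missing bound means open-ended, and the helper name count_for_role is hypothetical.

```python
from datetime import date
from typing import Optional

counts = [
    {"role": "Member of Parliament", "value": 308,
     "valid_from": "2004-06-08", "valid_until": "2015-10-18"},
    {"role": "Member of Parliament", "value": 338, "valid_from": "2015-10-19"},
    {"role": "Clerk", "value": 1},
]


def count_for_role(counts: list, role: str, on: date) -> Optional[int]:
    """Return the value of the first count for `role` whose validity window contains `on`."""
    for c in counts:
        if c["role"] != role:
            continue
        # Missing bounds are treated as open-ended.
        start = date.fromisoformat(c["valid_from"]) if "valid_from" in c else date.min
        end = date.fromisoformat(c["valid_until"]) if "valid_until" in c else date.max
        if start <= on <= end:
            return c["value"]
    return None


print(count_for_role(counts, "Member of Parliament", date(2010, 1, 1)))  # 308
```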

If we only ever expect to count one role per organization, then we shouldn't take this option.

kaerumy commented 9 years ago

Just some inputs here:

We use full fledged posts approach in Malaysia for MPs in Parliament: http://sinar-malaysia.popit.mysociety.org/api/v0.1/search/posts?q=organization_id:53633b5a19ee29270d8a9ecf

@tmtmtmtm's point is valid here: the Speaker may not be an MP. There is a Seat count and a Post count, which mean different things for us.

I like the role option.

Interestingly we have a very different use case for our Senate. We only have three posts (Senator, President, Deputy President).

So we have a lot of Senators who are appointed for individual terms, as they're not elected, and who are added or dropped quite often. A role count here would be very helpful for us:

role: Senator
role: Senator Appointed by King
role: Senator Appointed by State of Penang
role: President

There are a lot of different role counts, so we could check if any positions are missing at any time.
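Checking for missing positions then amounts to comparing the expected count for each role against the number of current memberships holding that role. A minimal sketch; the expected numbers and memberships below are illustrative only (not actual Malaysian Senate figures), and the helper name vacancies is hypothetical.

```python
from collections import Counter

# Illustrative expected counts per role (NOT real figures).
expected = {
    "Senator Appointed by King": 3,
    "Senator Appointed by State of Penang": 2,
    "President": 1,
}

# Illustrative current memberships; only the role field matters here.
memberships = [
    {"person_id": "p1", "role": "Senator Appointed by King"},
    {"person_id": "p2", "role": "Senator Appointed by King"},
    {"person_id": "p3", "role": "Senator Appointed by State of Penang"},
    {"person_id": "p4", "role": "Senator Appointed by State of Penang"},
    {"person_id": "p5", "role": "President"},
]


def vacancies(expected: dict, memberships: list) -> dict:
    """Return {role: number of unfilled positions} for every role below its expected count."""
    held = Counter(m["role"] for m in memberships)
    return {role: n - held[role] for role, n in expected.items() if held[role] < n}


print(vacancies(expected, memberships))  # {'Senator Appointed by King': 1}
```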

girogiro commented 9 years ago

What about adding a count property to the Post class? That would allow modelling not only the total number of posts in an organization but also the cardinality of each individual post.

A use case that would benefit from that approach:

There are 200 MPs in the Czech parliament, elected in 14 areas with about 15-25 MPs per area. The "full-fledged" model is to create 200 posts, with the proper number of posts for each area. A simpler model is to create only 14 simultaneous posts. However, that carries no information about the cardinality of those simultaneous posts; the count property would add that information.

The total number of posts in an organization is available as sum of the posts' cardinalities. If you need to model only the total number of posts in an organization, you use a single post (e.g. "Member of parliament") with the proper total count.
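Under this proposal, the organization's total is the sum of its posts' counts, with a missing count treated as 1. A minimal sketch; the post ids, organization id and counts are illustrative, and count is the property proposed here, not an existing Popolo property.

```python
# Illustrative posts, one per electoral area; "count" is the proposed property.
posts = [
    {"id": "area-a", "organization_id": "chamber", "count": 25},
    {"id": "area-b", "organization_id": "chamber", "count": 15},
    {"id": "area-c", "organization_id": "chamber"},  # no count: defaults to 1
]


def total_seats(posts: list, organization_id: str) -> int:
    """Sum each post's cardinality, treating a missing count as 1 (like a Vote's weight)."""
    return sum(p.get("count", 1) for p in posts if p["organization_id"] == organization_id)


print(total_seats(posts, "chamber"))  # 41
```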

Changes in the number of posts in an organization over time would be modeled by creating a new post for each period:

Posts:

[
  {
    "id": "mp-in-term/1979-05-01",
    "name": "Member of Inatsisartut 1",
    "organization_id": "inatsisartut",
    "count": 21,
    "start_date": "1979-05-01",
    "end_date": "1983-04-11"
  },
  {
    "id": "mp-in-term/1983-04-12",
    "name": "Member of Inatsisartut 2",
    "organization_id": "inatsisartut",
    "count": 26,
    "start_date": "1983-04-12",
    "end_date": "1984-06-05"
  }
]
jpmckinney commented 9 years ago

@girogiro Hmm, I considered that option in this message: https://groups.google.com/d/msg/poplus/FAAmhwOosns/tVtKqFRexC8J

girogiro commented 9 years ago

IMO, there is no need for a separate PostSet class when simultaneous posts are allowed. An optional count property defaulting to 1 (like weight in Vote) would do the job of modeling the maximum number of persons holding the post.

jpmckinney commented 9 years ago

The problem isn't limited to creating a new class. The post goes into detail on 5 or so use cases, for example, counting vacant seats, which requires new logic if count is added.

girogiro commented 9 years ago

Isn't the logic identical, with a Post being a special case of a PostSet with count=1? Older software that doesn't know about the count property treats the Post as one seat (empty or occupied depending on whether a current membership exists), while newer software will use the new count property (or default to 1 if it doesn't exist) and will check all existing memberships when counting the seats (total or vacant).

Actually, the above simultaneous post example contradicts use case 4:

  4. Enforcing a maximum number of concurrent memberships: A post would never have two memberships that overlap in time, but a PostSet would allow up to the maximum number of overlapping memberships (though the memberships must be checked to refer to unique people).

IMO, there is no need to distinguish between Post and PostSet. Adding the count property introduces no different logic, only a generalization of the current one.
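The generalization argued here can be seen in a vacant-seat count: capacity is the post's count (default 1), and occupancy is the number of distinct people with a membership on that post covering the date. A minimal sketch; count is the proposed property, memberships are assumed to carry person_id, post_id, start_date and an optional end_date (inclusive), and the helper name vacant_seats is hypothetical.

```python
from datetime import date


def vacant_seats(post: dict, memberships: list, on: date) -> int:
    """Seats on `post` with no holder on `on`; a missing count means a plain one-seat Post."""
    capacity = post.get("count", 1)
    holders = {
        m["person_id"]
        for m in memberships
        if m["post_id"] == post["id"]
        and date.fromisoformat(m["start_date"]) <= on
        and ("end_date" not in m or on <= date.fromisoformat(m["end_date"]))
    }
    # Counting distinct person_ids is the "unique people" check from use case 4.
    return max(capacity - len(holders), 0)


area = {"id": "area-a", "count": 3}
memberships = [
    {"person_id": "p1", "post_id": "area-a", "start_date": "2013-10-26"},
    {"person_id": "p2", "post_id": "area-a", "start_date": "2013-10-26"},
    {"person_id": "p2", "post_id": "area-a", "start_date": "2014-01-01"},  # same person again
    {"person_id": "p3", "post_id": "area-a", "start_date": "2013-10-26", "end_date": "2013-12-31"},
]
print(vacant_seats(area, memberships, date(2015, 1, 1)))  # 1
```

For a plain Post without count, the same function reduces to the existing rule: one seat, vacant unless a current membership exists.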

jpmckinney commented 9 years ago

Yes, I was just thinking that after my last message. I need to revisit this issue when I have an opportunity to go through all the discussions and find the consensus.

tmtmtmtm commented 9 years ago

There are definitely advantages to @girogiro's suggestion here, but I'm not convinced that this approach really works that well in the case where the number of seats for an area can change over time: e.g. if Eastville had 7 seats in the 18th, 19th, and 21st Assemblies, but 8 seats in the 20th. Having that be either 2 or 4 different Posts seems quite awkward, particularly when it comes to the sorts of queries to do with tracking membership of a Post over time. If we were taking this route, I'd like to see a solution that can leave "Member of Eastville" as a single post, with a way of tracking how many of those there were across different periods.
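One possible shape for that, purely as an illustration: keep a single "Member of Eastville" post and hang a list of per-period counts off it, so memberships always point at the same post id. The counts and legislative_period_id properties and the helper seats_in_period are hypothetical, not Popolo properties.

```python
from typing import Optional

# Hypothetical shape: one post, with a per-period list of seat counts.
post = {
    "id": "member-of-eastville",
    "label": "Member of Eastville",
    "counts": [
        {"legislative_period_id": "assembly-18", "value": 7},
        {"legislative_period_id": "assembly-19", "value": 7},
        {"legislative_period_id": "assembly-20", "value": 8},
        {"legislative_period_id": "assembly-21", "value": 7},
    ],
}


def seats_in_period(post: dict, period_id: str) -> Optional[int]:
    """Return the post's seat count for the given legislative period, or None if unrecorded."""
    for c in post.get("counts", []):
        if c["legislative_period_id"] == period_id:
            return c["value"]
    return None


print(seats_in_period(post, "assembly-20"))  # 8
```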