popolo-project / popolo-spec

International legislative data specifications
http://www.popoloproject.com/

Unknown expiry date on Membership #111

Closed: tmtmtmtm closed this issue 7 years ago

tmtmtmtm commented 9 years ago

(This may be related to #98, but splitting it out to start with, as there may be a better solution without requiring a new property.)

When gathering data for EveryPolitician, I've encountered an issue a few times where I know that someone was a legislator during a particular legislative_period, and ceased to be so, but at an unspecified date (e.g. the official site lists that Fred Bloggs replaced John Doe, but with no other information).

There are a variety of different ways this could be modelled, but it seems like something that's important enough to have a degree of consistency, so that tool authors don't have to write umpteen different cases for is_current? (or was_active_at(date)) types of queries (which are already quite complex, due to things like the dissolution date of an Organization being an implied end_date for all Memberships of it).

jpmckinney commented 9 years ago

For the "Is this membership current?" use case, a status flag could be sufficient - but I'm not expecting people to add a "status": "active" property to every membership (which requires some effort to maintain). In the case of legislative memberships, if a status isn't set, you'd have to compare the legislative period's end date to today, and compare the organization's dissolution date to today.

For the "Is this membership current as of ___?" use case, if you don't have a specific date and the legislative period and organization are both current as of that date, then you can answer "Yes", "No" or "Maybe" - and the "Maybe" would be due to either the membership having a vague end date like "2015" or an inactive status without a date.
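The "Yes", "No" or "Maybe" answer could be sketched roughly as follows. This is only an illustration: the function names `date_bounds` and `active_as_of` are hypothetical, memberships are assumed to be plain dicts with `end_date` and `status` keys, and the legislative period and organization are assumed to be current on the date in question. It also assumes a membership still counts as active on its end date, and it ignores start dates.

```python
import calendar
from datetime import date

def date_bounds(value):
    """Return the (earliest, latest) day covered by 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD'."""
    parts = [int(p) for p in value.split("-")]
    year = parts[0]
    if len(parts) == 1:
        return date(year, 1, 1), date(year, 12, 31)
    month = parts[1]
    if len(parts) == 2:
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    day = date(year, month, parts[2])
    return day, day

def active_as_of(membership, when):
    """Answer 'yes', 'no' or 'maybe' for a membership (a dict) on the date `when`."""
    end = membership.get("end_date")
    if end:
        earliest, latest = date_bounds(end)
        if when > latest:
            return "no"      # ended before `when`, however vague the date
        if when <= earliest:
            return "yes"     # cannot have ended yet (end date treated as inclusive)
        return "maybe"       # `when` falls inside the vague range
    if membership.get("status") == "inactive":
        return "maybe"       # known to have ended, but not when
    return "yes"             # assumption: no end information means still active
```

With an end date of "2015", a query for mid-2015 gives "maybe", while queries for 2014 and 2016 give "yes" and "no" respectively.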

In terms of semantics for the first use case, we could use some rules:

  1. If a status is set on the Membership, trust it (even if it conflicts with dates)
  2. Otherwise, if an end date is set on the Membership, trust it
  3. Otherwise, if a legislative period end date is set, trust it
  4. Otherwise, if an organization dissolution date is set, trust it

For the second use case, just start at step 2. What do you think?
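To make the precedence concrete, here is a minimal Python sketch of the four rules. It is an illustration under several assumptions, not a proposed implementation: the function name `is_current` and the `use_status` flag are hypothetical, the legislative period and organization are assumed to be embedded in the membership dict under `legislative_period` and `organization` keys, all dates are assumed to be full ISO 8601 dates, a membership is assumed to still be current on its end date, and a membership with no end information at all is assumed to be current.

```python
from datetime import date

def is_current(membership, today=None, use_status=True):
    """Apply rules 1-4 in order; pass use_status=False to start at rule 2."""
    today = today or date.today()

    # Rule 1: an explicit status wins, even if it conflicts with dates.
    status = membership.get("status")
    if use_status and status is not None:
        return status == "active"

    # Rules 2-4: trust the first end date found, in order of precedence.
    period = membership.get("legislative_period") or {}
    organization = membership.get("organization") or {}
    candidates = (
        membership.get("end_date"),            # rule 2
        period.get("end_date"),                # rule 3
        organization.get("dissolution_date"),  # rule 4
    )
    for end in candidates:
        if end:
            return date.fromisoformat(end) >= today

    # Assumption: nothing says the membership ended, so treat it as current.
    return True
```

For the "current as of a date" case, the same function could be called with that date as `today` and `use_status=False`, which skips rule 1.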

Update: I'm also thinking that the status field should only be used if you can't use dates, because the status field only represents current status. If you have dates to support the history of status changes, then you should just use dates, not status fields.

tmtmtmtm commented 9 years ago

In general I'm not a massive fan of things like status flags. Partially that's philosophical because I prefer 'append-only' formats, where (other than needing to correct false data that should never have been there in the first place), you only ever add new data, rather than editing existing data{1}, and partially it's practical in that a consumer of the data then also needs to know when the file itself was generated (a flag in a 3-year-old file is less likely to still be true than a flag in a 3-week-old file{2}).

However, here I think there's a deeper issue in that such a flag wouldn't be enough for historic data. For example, suppose it's currently the 35th Parliament, and I know that at some point in the middle of the 33rd Parliament Fred Bloggs resigned and was replaced by John Doe. Neither of those memberships is currently active, so a simple status flag would lose the information of that transition, and without some other way of capturing that, there would be no way to, for example, produce a list of the final members.

{1} Popolo currently isn't quite this, but it's almost possible to treat it as such if you ignore most of the short-cut fields like name and email that can be expressed in dated forms elsewhere.

{2} This is also true to a certain degree with any old data file, but there's a subtle difference between information simply being missing, and being no longer true.

jpmckinney commented 9 years ago

Satisfying (1) would require dates on a lot of things, and we decided early on (between James Turk and Edmund) that some properties were just not worth the trouble of attaching dates to. I suppose an alternative to objects with dates would be a stack of values, where the top value is the current value, but I've never seen it.

Anyway, as long as all the status flag values are "negative" (one form or another of saying "inactive"), it's not data that needs to be overwritten, and it's not data that produces a more ambiguous interpretation as time passes.

In fact, we already have status flags proposed in a different way with end_event #94, where you can have an end_event without an end date, and the event itself can be without dates. Despite that, it still tells you that the membership has ended.

tmtmtmtm commented 9 years ago

> we decided early on … that some properties were just not worth the trouble of attaching dates to.

Is that captured somewhere, or was it in private discussions? I'd be interested in reading back through that if it's public. Further discussion on this point should probably move elsewhere though, as that was more of a philosophical aside than my key objection.

I'd definitely be much more in favour of the approach of saying that a Membership can end with an end_event (although it does require adding another step to the logic of https://github.com/popolo-project/popolo-spec/issues/111#issuecomment-136769516) and that that event can in turn be un-dated.
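As a rough sketch of what that extra step might look like: the function name `has_ended` is hypothetical, `end_event` is the property proposed in #94 (assumed here to be embedded in the membership dict), the end date is assumed to be a full ISO 8601 date, and placing the new step after the membership's own end date is just one possible ordering.

```python
from datetime import date

def has_ended(membership, today):
    """True if the membership (a dict) is known to have ended by `today`."""
    end = membership.get("end_date")
    if end:
        # An explicit end date on the membership is compared directly.
        return date.fromisoformat(end) < today
    # Extra step: an end_event, even an un-dated one, marks the membership as over.
    return membership.get("end_event") is not None
```

So a membership like `{"end_event": {"name": "Resignation"}}` would count as ended even though no date is recorded anywhere.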

jpmckinney commented 9 years ago

It was on a phone call. Sounds good to me. Adding a link from #98.

jpmckinney commented 7 years ago

Rolled into #94, taking @tmtmtmtm's last comment as the way to move forward on this.