Closed jamesturk closed 3 years ago
If people have thoughts or comments, any input on this would be helpful. There isn't a perfect solution IMO, so the trade-offs we pick here will have lasting repercussions depending on people's desired usage.
Thanks for a thorough investigation of these issues, James. I certainly see the appeal of approach A from the implementation standpoint. Having an accurate record of committee history is a value that accumulates over time (as well as accumulating costs over time), so it's tough to swallow handling all of the complexity up front when at best that will buy an incremental value a year from now.
D, F, B are my favorite alternatives of the above.
I'm trying to think: what are the history-based insights that we might lose if we go with approach A? And can we think of ways to mitigate that loss, or is it just a hard trade-off? Two examples that I've thought of so far:
Current committee assignments are an important source of insight about what policy areas/bills a particular legislator has influence over and interest in. Historical assignments probably add value there as well. This overlaps with seniority on a committee (is cmte membership history the only source for seniority knowledge in some jurisdictions?). But I also wonder about times when partisan control of a chamber might cause legislators to get booted from committees, even though those legislators are still interested in (and maybe somewhat influential in) the policy area(s) of the committee.
Mitigation? If a downstream consumer of Open States has a taxonomy of policy areas and maps those to committees, then they could record the relationship between the legislator and policy areas (based on current committee data). Something like { policyArea: 'environment', source: 'committeeMembership', sourceDescription: 'Environment Committee 2020' }. That relationship could stay around even as committees are destroyed/re-ingested. So basically the tactic is persisting the inference/insight that is based on the committee data, and keeping that.
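A rough sketch of that mitigation, purely illustrative: the committee-to-policy-area mapping, the `PolicyAreaLink` record, and the `derive_links` helper are all made up here and are not part of Open States. The point is only that the downstream store keeps the inferred links even after the committee disappears from a later ingest.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyAreaLink:
    person_id: str
    policy_area: str
    source: str
    source_description: str


# Hypothetical downstream taxonomy: committee name -> policy area.
COMMITTEE_TO_POLICY_AREA = {"Environment Committee": "environment"}


def derive_links(memberships, year):
    """Turn (person_id, committee_name) pairs into persistent policy-area links."""
    links = set()
    for person_id, committee_name in memberships:
        area = COMMITTEE_TO_POLICY_AREA.get(committee_name)
        if area is None:
            continue  # committee not covered by the taxonomy
        links.add(
            PolicyAreaLink(
                person_id, area, "committeeMembership", f"{committee_name} {year}"
            )
        )
    return links


# The downstream store only grows; it is never rebuilt from current committees.
store = set()
store |= derive_links([("person-1", "Environment Committee")], 2020)
store |= derive_links([], 2021)  # committee destroyed/re-ingested, nothing new
```

After the second ingest the 2020 link for `person-1` is still in `store`, which is the whole trick.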
Less sure about this one. If a bill in a past session was referred to committee A, and then the text of the bill is re-introduced in a current session, it could be useful to recognize that relationship to be able to predict what committee it will be referred to (or just to pre-classify the policy area of the bill).
Mitigation? Similar to above, perhaps downstream consumers could persist the relationship between the past bill and policy areas, rather than relying on a relationship between a past bill and a past committee to still exist in Open States data.
My struggle with D (which I otherwise think is the best option) is how we maintain the list of sessions that a committee is valid for. I'm wondering if you have any thoughts on how you'd want to see us handle that.
The two main challenges I see for D are:
1) how/where/when do we update the mapping? It ideally doesn't belong in a scraper, and there are a couple thousand committees across jurisdictions that will now need to be manually kept up to date.
2) what kinds of guidelines do we need to decide when we use a new committee instead of adding a new session to an old committee? The simplest answer would be that we typically retire all committees every regular session and create new ones, so only special sessions need to be updated, but I wasn't sure if that was in line with what you were aiming at.
Chiming in from the peanut gallery. Full disclosure I'm not using the OS data for this, so take it as just opinions.
We run committees as one entity across sessions, but with an optional end date. We assume all committees are permanent. If a committee data-load from a particular time comes back without data for that committee, it's flagged for manual review for potential retirement. We spit out a web page with the unmatched committees and their links, and have someone go look up if it's still a live committee. If it's really a dead committee and not a data error, we just end it 12/31/previous_session. This is not perfect data but it's effective in practice, though I suppose you could also use the last session's end_date.
This is generally not hugely cumbersome, it just means a bit of work for each state when the new sessions start up. If a committee has just changed names, we alias it. Whether a "Health" -> "Health & Housing" is a continuation or a new committee is a judgement call.
The biggest downside here for OS is that if a state renames a bunch of committees, there's manual work aliasing them, and trying to keep consistent about continuations.
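For what it's worth, the review loop described above could look roughly like this. It is only a sketch under my own assumptions: the dict shapes and the `committees_to_review` and `retire` helpers are invented for illustration and don't reflect any existing tooling.

```python
from datetime import date


def committees_to_review(known, scraped_names, aliases):
    """Return live committees that did not show up in the latest data load.

    known: dict of canonical name -> {"end_date": date or None}
    scraped_names: names found in the latest load
    aliases: dict of new/alternate name -> canonical name
    """
    canonical = {aliases.get(name, name) for name in scraped_names}
    return sorted(
        name
        for name, info in known.items()
        if info["end_date"] is None and name not in canonical
    )


def retire(known, name, previous_session_year):
    """End a committee on 12/31 of the previous session's year."""
    known[name]["end_date"] = date(previous_session_year, 12, 31)


known = {
    "Health": {"end_date": None},
    "Budget": {"end_date": None},
    "Fisheries": {"end_date": None},
}
aliases = {"Health & Housing": "Health"}  # judgement call: treated as a rename
flagged = committees_to_review(known, ["Health & Housing", "Budget"], aliases)

# A human checks the flagged list; here we pretend Fisheries really is gone.
retire(known, "Fisheries", 2020)
```

Only `Fisheries` ends up flagged, because the renamed Health committee is matched through the alias.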
If I had it to do over again, I might make a committee_session one-to-many table, with an easy tool to check/uncheck sessions for a given committee, to avoid all the headaches around date maintenance.
As far as membership, I'd have a committee_person_session table for the same reason, but with an extra end_date date field for deaths/removals/etc.
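To make the two tables concrete, here is one possible shape using SQLite. The column names beyond committee_session, committee_person_session, and end_date are my own guesses, and the rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE committee (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- One committee, many sessions it is valid for.
    CREATE TABLE committee_session (
        committee_id INTEGER NOT NULL REFERENCES committee(id),
        session TEXT NOT NULL,
        PRIMARY KEY (committee_id, session)
    );

    -- Membership per session; end_date is only set for deaths/removals/etc.
    CREATE TABLE committee_person_session (
        committee_id INTEGER NOT NULL REFERENCES committee(id),
        person TEXT NOT NULL,
        session TEXT NOT NULL,
        end_date TEXT,
        PRIMARY KEY (committee_id, person, session)
    );
    """
)

conn.execute("INSERT INTO committee (id, name) VALUES (1, 'Budget')")
conn.executemany(
    "INSERT INTO committee_session VALUES (1, ?)", [("2019",), ("2020",)]
)
conn.executemany(
    "INSERT INTO committee_person_session VALUES (1, ?, ?, ?)",
    [("Rep. A", "2020", None), ("Rep. B", "2020", "2020-05-05")],
)

roster_2020 = conn.execute(
    "SELECT person, end_date FROM committee_person_session "
    "WHERE committee_id = 1 AND session = '2020' ORDER BY person"
).fetchall()
```

Checking or unchecking a session for a committee is then just inserting or deleting a committee_session row, with no dates to maintain.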
There's a big 80/20 here around the accuracy: is the key fact that Rep. Smith was on Budget for the 2020 session, or that they started on Jan 11 and ended on May 5?
If they stopped halfway through, I think the ideal is to have the committee roster for the 2020 session show all members, but with a special section for anyone who retired/died/etc. mid-session, which equates to "anybody we didn't find in this month's data that was there last month, then googled and found out they died."
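The "there last month, missing this month" check is simple enough to sketch. The function name and the returned dict layout are hypothetical, and the manual lookup of why someone left still has to happen outside the code.

```python
def roster_with_departures(previous_month, current_month):
    """Build a session roster that keeps everyone, plus a mid-session departures list."""
    current = set(current_month)
    # Anyone seen last month but not this month is a candidate for manual review.
    left_mid_session = [m for m in previous_month if m not in current]
    return {
        "members": sorted(set(previous_month) | current),
        "left_mid_session": left_mid_session,
    }


roster = roster_with_departures(
    previous_month=["Rep. A", "Rep. B", "Rep. C"],
    current_month=["Rep. A", "Rep. C", "Rep. D"],
)
```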
what kinds of guidelines do we need to decide when we use a new committee instead of adding a new session to an old committee?
I agree that's the key question. Depending on the guidelines, some automation may be possible. Brainstorming some scenarios (and what my first instinct is on how to handle it), assume same legislative chamber in all these cases:
how/where/when do we update the mapping?
Obviously this would carry the cost of writing more tooling, but I wonder if there's a way to do a staging process, something like:
(steps not recoverable; surviving fragments: main / OS DB)
Doing thousands of committees manually would be pretty painful. Even with the above there will be some manual review and manipulation. What percentage of committees do you wager carry on from session to session with the exact same name? 60%? 80%? If the number is high enough maybe we're only talking about manual review/modify of 10 committees per session per jurisdiction.
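One cheap way to answer the percentage question for a given jurisdiction would be to compare committee names between two sessions. This is a back-of-the-envelope sketch with invented names and a hypothetical helper; exact-name matching is the only rule applied.

```python
def carry_over_rate(previous_names, current_names):
    """Fraction of last session's committees that reappear with the exact same name."""
    previous = set(previous_names)
    if not previous:
        return 0.0
    return len(previous & set(current_names)) / len(previous)


previous = ["Health", "Budget", "Education", "Rules", "Fisheries"]
current = ["Health", "Budget", "Education", "Rules", "Health & Housing"]

rate = carry_over_rate(previous, current)
# Names on only one side are what a human would need to look at.
needs_review = sorted(set(previous) ^ set(current))
```

In this toy data four of five names carry over, leaving two names for manual review.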
@jessemortenson re: automation I think that this misses the desired outcome regarding historical data that Ruby is after, but I'll let her chime in to be sure. In practice, most of the time committee membership between sessions would mostly maintain continuity (like scenarios 1-3 in your examples) but there's a desire/need to have a frozen snapshot of what the committee looked like at the end of a session, so we wouldn't want to mark them as the same committee in those cases.
And I believe it was already decided that if the name changes, it is a new committee.
@showerst thanks for this! In particular
"If I had it to do over again, I might make a committee_session one-to-many table, with an easy tool to check/uncheck sessions for a given committee, to avoid all the headaches around date maintenance."
is good to hear, as I think that's where we're likely headed
but there's a desire/need to have a frozen snapshot of what the committee looked like at the end of a session
Gotcha, I think I was assuming under scenario D we weren't accounting for that need. Is the goal to have those committee-per-session snapshots maintained in Open States data (I mean in the head of main, not just in commit history)? Or to provide the ability for downstream consumers to maintain them? It seems like downstream consumers could use something like a committee_person_session table, as Tim mentioned, to keep track of the historical snapshots of when people were on a given committee, even as that committee continues from session to session.
After some deliberations it has been decided that for now we'll focus on the present, and see what use cases arise that necessitate other options. So essentially option A. We'll have git history to rebuild the past if we want historical data, and Open States will hide legacy committees (in API/os.org) by default, but they may resurface later.
Problem: We want links to committee data to be somewhat consistent, so that a link to a committee on a 2021 bill will still be useful when clicked in 2022, even if the committee has since been dissolved or completely overhauled.
It might be important to keep in mind 4 potential places committee data would appear:
Case 1. On a bill page, linked from a bill action. In this case we'd really want the committee link to bring the user to the version of the committee closest to what existed at the time of the action.
Case 2. On a legislator's page. In this case we'd want to link to the current version of the committee or last version of the committee that the legislator served on. (This distinction is actually pretty hard.) In the multi-committee approaches below this would probably be a lot messier as they'd have multiple (maybe dozens) iterations of the committee tied to them in the database.
Case 3. On a standalone committee page. In this case we'd want to be clear about which version of the committee we're showing, so that if a person were looking at an old version that was at least clear. Multiple committee entries here will get complicated as people may be looking at the "wrong" version and not see the data they expect.
Case 4. On a page with hearings/etc. This is similar to 3: if a person is looking for hearings but looking at the 2020 version of a committee, that'll be a problem for them.
Considered solutions:
A: Focus on the Present
The current approach.
Committees will not be tied to any time; at any given time, looking up a committee would provide the current members. If a committee no longer exists, it will no longer come back in API results/etc. We'd focus on the present, and be limited in what we can show for the past.
Pros:
Cons:
Use Cases: Pretty good for cases 2-4, bad for 1.
B: Tie Committees to Sessions
Committees would be tied to individual sessions. This means we'd need to re-scrape committees every session turnover (which we should already be doing) and have code to automatically expire the old ones. Old committees would be left in place but marked as expired.
Committee membership of expired committees would be frozen in time with whatever the membership was at the final date.
Pros:
Cons:
N*S where S is the number of sessions. That is more entities to track/more data to worry about.
Use Cases: Pretty good for case 1, quite bad for 2-4.
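As a rough illustration of option B, session turnover could be handled along these lines. The `Committee` class and `turn_over_session` function are hypothetical names for this sketch, not existing Open States code.

```python
from dataclasses import dataclass, field


@dataclass
class Committee:
    name: str
    session: str
    members: list = field(default_factory=list)
    expired: bool = False


def turn_over_session(committees, new_session, scraped):
    """Expire committees from older sessions, then add the freshly scraped ones.

    scraped: dict of committee name -> list of member names for new_session.
    """
    for committee in committees:
        if committee.session != new_session:
            committee.expired = True  # membership stays frozen as last seen
    for name, members in scraped.items():
        committees.append(Committee(name, new_session, list(members)))
    return committees


committees = [Committee("Education", "2020", ["Rep. A", "Rep. B"])]
turn_over_session(committees, "2021", {"Education": ["Rep. A", "Rep. C"]})
```

After one turnover there are two Education entries, one expired with its frozen roster and one current, which is the entity growth noted in the cons.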
C: Tie Committees to Regular Sessions via dates
A minor iteration on B, we would tie committees to regular sessions using begin & end dates on the committee.
Pros:
Cons:
Use Cases: Pretty good for case 1, good for 2 in regular sessions, bad for 2 in special sessions, still bad for 3-4.
D: Multiple Sessions Per Committee
An alternative to B & C in the same vein, but a committee would now have a list of sessions it was valid for.
Pros:
Cons:
Use Cases: Pretty good for case 1, OK for 2, still fairly bad for 3-4.
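A minimal sketch of how option D might serve case 1, assuming each committee record carries a list of session identifiers. The record layout, the ids, and `committee_for_session` are all invented for illustration.

```python
def committee_for_session(committees, name, session):
    """Find the committee entry with this name that was valid for the given session."""
    for committee in committees:
        if committee["name"] == name and session in committee["sessions"]:
            return committee
    return None


committees = [
    {"id": "edu-1", "name": "Education", "sessions": ["2019", "2020", "2020S1"]},
    {"id": "edu-2", "name": "Education", "sessions": ["2021"]},
]

# A bill action from the 2020 special session resolves to the older entry.
match = committee_for_session(committees, "Education", "2020S1")
missing = committee_for_session(committees, "Education", "2022")
```

The lookup is trivial; the hard part raised in the thread is keeping each `sessions` list up to date.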
E: Year Based Committees
An alternative to the above that would use years as the boundary for committees for clarity.
Pros:
Cons:
Use Cases: Pretty good for case 1, OK but not great for 2-4?
F: Keep history on committees by member, not by having multiple entries for the same committee
A completely different approach from B-E. We could instead add a begin/end date to each membership, as well as created/dissolved dates on committees. This would allow for a few things:
We can get a point in time view of any committee. So instead of having 2020 and 2022 versions of the Education committee, we'd have one committee, and display of members would vary based upon
This was another approach Open States took (2014-2017ish). It was a ton of work, since the begin/end dates are not published, but instead we rely upon detecting when members change/etc.
There's also the problem of a committee changing its name and whether we consider that the same committee. That leads to unexpected outcomes if a committee slowly changes name/purpose over time. (Committee on Education becomes Committee on Education & Health, which is then renamed to Committee on Health as a new Committee on Education is formed, etc.)
Pros are that we can represent the true history of a committee, while cons are that working with the data becomes so cumbersome that it is nearly impossible to figure out if anything is correct & code complexity is significant.
Including this here for further discussion, but IMO the least viable option with our current plans/team.
Use Cases: If it worked, satisfies all 4 the best since complete data would be stored and we could figure out different ways to display the data for all 4. Complexity is through the roof though.
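For reference, the point-in-time lookup that option F enables is easy to sketch once begin/end dates exist. The membership records below are invented sample data, `members_on` is a hypothetical helper, and the hard part in practice is obtaining reliable dates at all.

```python
from datetime import date


def members_on(memberships, when):
    """Return the people whose membership covers the given date."""
    return sorted(
        m["person"]
        for m in memberships
        if m["begin"] <= when and (m["end"] is None or when <= m["end"])
    )


memberships = [
    {"person": "Rep. A", "begin": date(2020, 1, 11), "end": date(2020, 5, 5)},
    {"person": "Rep. B", "begin": date(2020, 1, 11), "end": None},
    {"person": "Rep. C", "begin": date(2020, 5, 6), "end": None},
]

march = members_on(memberships, date(2020, 3, 1))
june = members_on(memberships, date(2020, 6, 1))
```

One committee record serves both views: Rep. A and Rep. B in March, Rep. B and Rep. C in June.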