gordonje commented 7 years ago

Third time's the charm!

My first PR was on datamade's fork of this repo. This second one was on this repo, but from datamade's fork which included some commits to files I never intended to change (probably as a result of my shoddy attempt to squash the commits).

We've essentially borrowed everything from the VIP XML spec that feels useful for the campaign finance use cases. Differences between proposed OCD data types and VIP elements are noted below.

This proposal doesn't include modeling election results (which also aren't in VIP). Will cover that in a future proposal.

@aepton: Saw your comments on the original PR. I've made a few of the changes you suggested and will run through each of your comments. After that, if you feel like I haven't adequately addressed your questions/concerns, do you mind moving those comments over to this PR? Sorry for the redundant work!

@fgregg, @palewire and @dwillis and whoever else: Please give it a look!

palewire commented 7 years ago

I think this is a great start. As we await feedback from the OCD crew, we're going to push ahead with a rough implementation in our django-calaccess-processed-data repository. That's where we are currently working to transform and simplify data from CAL-ACCESS, the state of California's jumbled, dirty and difficult campaign-finance database.

aepton commented 7 years ago

This looks really good to me.

gordonje commented 7 years ago

Realized I had a couple of redundant post_id fields, which I have just cleared up. I think we all agree that an OCD Post is an analog to a VIP <Office>, but I'm also thinking now that a RetentionContest should reference a particular OCD Membership, which I take to represent a particular person's tenure in a public office (@fgregg, @jpmckinney, and whoever, lemme know if I have that right).

Seems better if Candidacy contains one record for each time a person ran for a public office, not including the times they were in a recall election. That's how I believe I have things modeled currently.

jpmckinney commented 7 years ago

@gordonje Yes, you've got the correct interpretations of Popolo's (OCD's) Post and Membership :)

jpmckinney commented 7 years ago

I'm assuming the PR isn't ready for review, but let me know if it is.

gordonje commented 7 years ago

@jpmckinney It's ready to review. I've added some sample json pulled out of data CCDC has scraped from CAL-ACCESS. We've come pretty far along with a draft of django models implementing what's proposed here and, as a result, I've made a few more minor tweaks to this spec. Looking forward to some feedback.

jpmckinney commented 7 years ago

General notes

OCD is written for international use, but VIP is written for the US only. It will require a fair bit more work to transform this proposal into something for international use.

If you just want to have US models, then I propose implementing this as a new Python package, that looks like and interoperates with python-opencivicdata-django, in which classes are prefixed with VIP, to avoid clashing with any eventual internationally-usable elections models.

Prior to realizing this, I noted below several changes I would make to achieve a closer alignment between this proposal and existing relevant international standards. Classes I still have to review are CandidateContest and Candidacy.

Anyway, let me know if all you want is a straight Python/JSON version of VIP, in which case most of my comments may be irrelevant.

Separately, I'll also note here that OCD's Event class probably has too many properties that not all subclasses should share (e.g. contests don't have agenda items).

Small edits

Add definitions for all classes, e.g. to explain what is a contest.
Rename ContestBase to Contest – same for BallotSelectionBase. In Python it might become an abstract ContestBase class, but a public schema wouldn't normally have a Base suffix.
Cut the content like "No important differences between corresponding fields." "No other OCD fields not implemented in VIP." "No other VIP fields not implemented in this OCDEP." The double negatives are hard to parse. The default assumption is that, if you're reusing an upstream standard, then you haven't changed it. These sections should only note the changes. Readers will assume that anything not mentioned is unchanged. See OCDEP 5 for example.

Election

administrative_org_id: Avoid abbreviations. Schema.org has an organizer property with the same semantics.
Countries without states have elections. Why not use division_id instead of state? We may want to add division_id to Event itself – or relate Events to full objects like Areas in Popolo, which can hold the division ID.
Similarly, is_statewide would need to be renamed.

BallotMeasureContest

Why pro and con instead of the more readable yes and no?
ballot_measure_type: We typically use the term classification, and we never prefix property names with class names, because it's redundant. We don't have Event.event_name for example.
other_type: I suggest removing it and just allowing ballot_measure_type to be an open codelist. I don't see an advantage to pushing the non-standard codes into a new field.

PartySelection

Can you provide the justification for the change to PartySelection?

Big edits

We shouldn't be advising people to throw specific things into extras. If we later decide to promote one of those fields to the main class, it will be a nightmare to transition. Either don't mention the field at all, or put it on the main class.
Can you merge the VIP differences discussion into each class' documentation? It's hard to keep jumping up and down the page or scrolling two tabs, and it just makes sense for readers to get all the docs about a class in one place.

Party

Parties are already modeled as organizations in OCD (and Popolo). We can create a subclass if appropriate.

Contest

Shouldn't Contest also subclass Event (and use the same id prefix)? They are events just like elections. In which case, Event should gain super_event from Popolo (superEvent in Schema.org), which would replace election_id.
These contests are starting to look similar to VoteEvent, which OCD adopted from Popolo, and Motion from Popolo. We should at a minimum re-use the same properties where possible. passage_threshold is requirement and full_text is text in Motion, for example; summary_text is summary in Popolo's Event (and in iCal).
effect_of_abstain may need to be modeled out differently. Some countries have multiple types of abstentions with different effects. We can model this as a list of objects, one for each vote option, with another property to describe the effect. We can also store 'statements' here, though we will usually only have statements for yes and no.
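
A minimal sketch of the list-of-objects idea in the last point, in Python. The field names (`option`, `effect`, `statement`) and the sample values, including the two abstention types, are hypothetical and are not taken from OCD, Popolo or VIP.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VoteOption:
    """One choice available to a voter in a contest (illustrative names only)."""
    option: str                      # e.g. "yes", "no", or a kind of abstention
    effect: Optional[str] = None     # how this choice counts toward the outcome
    statement: Optional[str] = None  # argument text, usually only for yes/no


# Hypothetical contest with two made-up abstention types.
options: List[VoteOption] = [
    VoteOption("yes", statement="Argument in favor."),
    VoteOption("no", statement="Argument against."),
    VoteOption("abstain_type_1", effect="counts toward quorum only"),
    VoteOption("abstain_type_2", effect="not counted at all"),
]

# Options whose effect has to be spelled out explicitly.
with_effect = [o.option for o in options if o.effect is not None]
```

Compared with a single `effect_of_abstain` string, this shape lets a contest carry any number of abstention types without a schema change.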

palewire commented 7 years ago

@jpmckinney Thank you for this thoughtful and thorough comment. We will review your requests and be back soon with an update.

gordonje commented 7 years ago

@jpmckinney Thanks again for your thorough notes. You've picked up on several of the key challenges we ran into during the drafting process.

We want to continue pushing toward merging this spec into the core OCD API. To that end, we'll work to remove any U.S.-specific features or cast them in more general terms. However, the CCDC and, I believe, several other collaborators here mostly have U.S. use cases in mind. So we might require ongoing guidance and/or other collaborators to keep us in check on that.

I've just committed changes that address the majority of your notes, plus a few other small related changes, specifically:

[x] Added a Definitions section for key terms and more detailed definitions for each data type.
[x] Merged the VIP differences discussions into the Implementation section, edited out unnecessary content and suggestions about throwing specific stuff into extras.
[x] Removed Base suffix from class names.
[x] Removed abbreviations from property names (e.g., administrative_organization_id).
[x] On Election:
- [x] Removed subclassing of Event and added simple name and date properties (discussed below).
- [x] Replaced state and is_statewide on Election with division_id.
[x] Renamed pro_statement and con_statement on BallotMeasureContest to support_statement and oppose_statement. I think this is an improvement even over yes and no since the text of the ballot measure selections might also be something like recall and don't recall.
[x] Replaced the enum field ballot_measure_type with the simple string classification. Removed other_type.
[x] On BallotMeasureSelection:
- [x] Renamed passage_threshold to requirement
- [x] Renamed summary_text to summary
- [x] Renamed full_text to text
[x] Removed is_top_ticket from Candidacy.

Here are the points which require further discussion, which I have also added to the "Questions" section:

Should `Election` subclass `Event`?

This was one of the original premises for this proposal, having been previously floated in this thread. What's intuitive about this is that we're used to thinking of elections in terms of when they will or did occur, and that mental picture looks like one or more specific dates on a calendar.

The further I've gone into this, however, the less elections seem to fit the mold of events as described in OCD, Popolo or Schema.org. These existing specifications strike me as being more focused on an event as a meeting or some other appointment on one's calendar, with a specific start and end time, location and attendee list.

To me, though, an election is more like an observed date, holiday or other calendar item one would set as "all day" because there is no specific start or end time ("all day" is one of the properties proposed for OCD Events, which is why we preferred to subclass that data type). Elections also don't happen in one specific location, but rather lots of places at once.

If we were inclined to model all the fine details of when and where people can vote in a specific election, perhaps modeling an election as a collection of related events would work. But I, for one, am daunted by the potential requirement of mapping out those relationships/hierarchies of events while juggling references to contests and divisions and accommodating edge cases in places like Washington State with vote-by-mail only.

That level of detail about where and when someone can vote might be more important if our goal were merely to represent the next election (similar to VIP), as opposed to historical elections for which that data might be difficult to come by.

Rather, a general date field coupled with a division seems quite sufficient for our use cases while avoiding ambiguity about what belongs where. The use cases for campaign finance and (eventually) election results are more focused on the potential and actual outcomes of each election than on the specific times and places people might have voted.

If/when this specification does need to allow more finely detailed modeling, then perhaps the date field on election could be swapped out in favor of a reference to an Event or Calendar object that could contain subevents for each place and time that a voter might have voted, if needed.

Should `Contest` subclass `VoteEvent`?

All the reasons Election might not want to subclass Event also apply to having the contest-related classes subclass VoteEvent. It would also introduce additional irrelevant properties like legislative session and vote (because we can't know exactly when or how voters voted in any contest). It also would allow varying dates for each contest within an election, which shouldn't happen.

Should `Party` subclass `Organization`?

This makes a lot of sense, but our concern (born mostly of ignorance) is how internal structures of political parties tend to be modeled using OCD Organization. Forgive the topical humor, but I feel strongly that Republican legislators in Missouri should be lumped in with Donald Trump.

OCD allows precision in modeling, for example, the RNC organization distinct from the state parties and county-level organizations and associating persons as members to any of those organizations. But our concern is that allowing Candidacy and CandidateContest objects to reference any level of the party structure will frustrate the most common forms of analysis, which expect all the Republicans to be grouped with all the Republicans and all the Democrats to be grouped with all the Democrats, etc.

Would it make sense for OCD to have, as it does with divisions, a central repository of political parties that users are encouraged to use instead of rolling their own? That would certainly alleviate this sort of grouping problem that might arise, for example, in a cross-state analysis of election results. And if we had that centralized tightly controlled repo, we would be less concerned about modeling parties as orgs or some subclass of orgs.

jpmckinney commented 7 years ago

Thanks!

Party

I'm not familiar enough with party systems in the US to know whether a state Republican party is the same as federal, county, city, etc. - but let's say they are distinct (like they are in Canada, except for the NDP which is fully integrated). In that case, what you describe isn't tracking party membership as much as political alignment. We could model political alignment as a new property/class, which is the cleanest solution I can see. The alternative is to have a world that gets modeled differently depending on what sort of analysis you want to perform - which is strange. But, you could just make the decision in your data that there is only one Republican party and all Republicans are members of it - but your data won't be interoperable with another system whose model is closer to reality.

But if the Republican and Democratic parties are the same at all levels, then they should be modeled as being the same and not jurisdiction-specific.

All that said, having a Party class distinct from Organization won't save you - there's no central authority preventing people from creating lots of Democratic parties :)

Election

I don't think making an Election an Event commits us to creating objects for every voting place. Independence Day is an event that appears in many people's calendars, yet it is all day (which is not the same start/end across time zones) and it happens nowhere or everywhere.

There are relatively few primitives in the world. I don't think elections are so fundamental that they ought to be a root class. In Schema.org (and Popolo), by virtue of reusing RDF vocabularies, every class falls into a hierarchy, the root of which is Thing. OCD doesn't define all those superclasses, because there are no use cases for them. But considering an Election is an Event (in the sense of Schema.org and the standards that Popolo reuses), we should put it into that hierarchy. Popolo is considering properties like start_event and end_event on Membership, which can point to an Election, Resignation, Nomination, etc. It wouldn't make sense for the range of those properties to be the union of various event-like classes; semantically, it makes more sense for them all to be descendants of Event.

Now, in OCD, we have an Event class that, as mentioned, is more like a Meeting because it has an agenda. Schema.org Event (which Popolo reuses) is more generic. So, in OCD, to preserve backwards-compatibility, we could have an abstract base class with descendants Event and Election.

My main concern is if we instead add more primitives, we risk having schemas drift instead of reusing the same properties. As the modeling progressively covers more objects in the real world, it will also become harder to learn the models if they aren't building on each other through class inheritance.

Contest

Yeah, on second thought, Contest is not even an Event. It's closer to a Motion in Popolo, but in any case it'd be a sibling of Motion, not a parent/child.

gordonje commented 7 years ago

Great dialogue, @jpmckinney

Party

I feel like the modeling of political alignment you mentioned adding is exactly what Candidacy.party_id is meant to represent. That is, which party the candidate was affiliated with/endorsed by while campaigning for office, which may differ from the party with which that person caucuses as an office holder or the party they've joined as a registered member (presumably these are cases for making use of Membership).

So we'll end up having several different ways that a person can be related to an organization that is a party, and that the party might actually be multiple objects in the database with varying levels of specificity. Which is probably the right level of flexibility, and I shouldn't be too worried that smart people won't know what they're doing.

In case it's illustrative, I think our project would end up with, for example, many different versions/levels of Democratic Party (as national, state and county-level organizations) which would be distinct campaign finance organizations and an additional "Democratic" political party organization to which candidates would be connected. Not sure whether or not we would need to map out the relationship/hierarchies between these organizations.

All of that to say, I think we'll make Party a subclass of Organization with the following additional properties:

abbreviation
color
is_write_in
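
A rough Python sketch of that subclassing, for illustration only. The `Organization` class here is a two-field stand-in rather than OCD's actual Organization, the three extra properties come from the list above, and the types, defaults and sample ID are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Organization:
    """Stand-in for OCD's Organization; only two fields are shown."""
    id: str
    name: str


@dataclass
class Party(Organization):
    """Organization plus the three party-specific properties listed above."""
    abbreviation: Optional[str] = None  # assumed optional
    color: Optional[str] = None         # assumed optional
    is_write_in: bool = False           # assumed to default to False


# The ID below is a made-up placeholder, not a real OCD identifier.
party = Party(id="example-party-id", name="Democratic", abbreviation="DEM")
```

Because `Party` is still an `Organization`, anything that already accepts an organization reference (Membership, for instance) would accept a party without changes.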

Election

If adding an abstract base Event class to OCD is on the table, then maybe we can make this work. Here are the properties I think each class should have:

EventBase
- required
  - id
  - name
  - classification
  - start_time
  - all_day
  - timezone
  - created_at
  - updated_at
- optional
  - end_time
  - sources
  - extras
Event
- optional
  - description
  - jurisdiction
  - location
  - participants
  - documents
  - media
  - links
Election
- required
  - division_id (or should jurisdiction be on the base class?)
- optional
  - administrative_organization_id (or should we add organizer_id to the base class and allow person or org IDs?)
  - identifiers
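
To make the inheritance in the outline above concrete, here is a small Python sketch. Only the property names come from the outline; the `fields` and `missing` helpers and the set-based representation are hypothetical, and this is not how python-opencivicdata-django defines its models.

```python
class EventBase:
    """Abstract base: properties shared by every event-like class."""
    required = {"id", "name", "classification", "start_time", "all_day",
                "timezone", "created_at", "updated_at"}
    optional = {"end_time", "sources", "extras"}

    @classmethod
    def fields(cls, kind):
        """Union of the `required` or `optional` sets up the class hierarchy."""
        names = set()
        for klass in cls.__mro__:
            names |= vars(klass).get(kind, set())
        return names

    @classmethod
    def missing(cls, record):
        """Required property names that are absent from a dict-like record."""
        return cls.fields("required") - set(record)


class Event(EventBase):
    """Meeting-like event: adds only optional properties."""
    optional = {"description", "jurisdiction", "location", "participants",
                "documents", "media", "links"}


class Election(EventBase):
    """Election: adds a required division and two optional properties."""
    required = {"division_id"}
    optional = {"administrative_organization_id", "identifiers"}


# An Election record with only id and name still lacks division_id (and more).
gaps = Election.missing({"id": "x", "name": "General Election"})
```

With this layout, `Election` picks up `start_time`, `all_day` and the rest from `EventBase` but none of the meeting-oriented properties on `Event`.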

Contest

The only overlap I'm seeing between Motion and what's currently proposed for Contest is that they both need a requirement property to describe the threshold of votes needed. But I also see potential overlap in terms of representing the results and linking to counts of votes as we get into handling election results as well.

I also note that OCD's Votes proposal hasn't adopted Motion yet and has a Bill data type that isn't part of Popolo.

Maybe we could follow the same tack as described above for Election and add a new base abstract class from which both election contests and legislative vote occurrences (aka, VoteEvent instances) should inherit.

VoteEvent is described in Popolo as being a subclass of Event, but doesn't appear to be implemented that way in OCD. Seems the closest analog to VoteEvent I can find in schema.org is VoteAction which is part of the Action class hierarchy, rather than the Event class hierarchy.

Here's a stab in the dark:

VoteContest as a base abstract class with properties like

name
event_id instead of start_date and end_date, and this could reference an Election, legislative committee meeting or whatever
counts
result (or possibly each subclass will have their own variant of this)

Then MotionContest and the other proposed subclasses for BallotMeasureContest, CandidateContest, etc. would all inherit from this.
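
Sketched in Python, that hierarchy might look something like the following. The class and property names are the ones floated above; the shape of `counts` and the outcome rules in each `result` method (simple majority, plurality) are placeholder assumptions.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class VoteContest(ABC):
    """Abstract base for anything decided by counting votes."""
    name: str
    event_id: str  # an Election, a committee meeting, or whatever
    counts: Dict[str, int] = field(default_factory=dict)  # option -> votes

    @abstractmethod
    def result(self) -> str:
        """Each subclass supplies its own variant of the outcome."""


@dataclass
class MotionContest(VoteContest):
    def result(self) -> str:
        # Placeholder rule: more "yes" than "no" passes.
        yes, no = self.counts.get("yes", 0), self.counts.get("no", 0)
        return "pass" if yes > no else "fail"


@dataclass
class CandidateContest(VoteContest):
    def result(self) -> str:
        # Placeholder rule: plurality winner, empty string if no counts yet.
        return max(self.counts, key=self.counts.get) if self.counts else ""


motion = MotionContest("Motion 1", "event-1", {"yes": 3, "no": 1})
race = CandidateContest("Mayor", "event-2", {"Candidate A": 10, "Candidate B": 7})
```

`VoteContest` itself cannot be instantiated, which matches the idea of it being purely a shared base.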

Feel like I'm coming in here tossing around drastic changes to things. Didn't originally intend that. I'm sure there's plenty of important background I'm lacking. For example, maybe you could say more about the rationale, in Popolo, for VoteEvent to subclass Event.

gordonje commented 7 years ago

Also with respect to contest: Maybe we should also stick with the term "option" which Popolo and OCD already use instead of adopting VIP's term "selection". I think these are one-and-the-same, especially if at some level we're trying to conflate election contests and legislative votes.

gordonje commented 7 years ago

I've reverted Election to be a subclass of OCD Event. I think the only thing that's not ideal about this approach is that Election inherits the following properties that aren't likely to be used:

description
jurisdiction
location
participants
documents
media
links
status

However, this seems like a problem to be handled in another OCDEP focused more narrowly on the Event data type and customizations for different subclasses. What I think we all agree on right now is that Election should inherit from whatever is the current base Event class. Let me know if that needs to be spelled out more explicitly somewhere in this proposal.

That leaves one last major sticking point: Whether what's currently named Contest and its subclasses can be based on something that already exists in Popolo or OCD. I suspect the overlap between Contest and VoteEvent (or something like it) will become more clear as we start to deal with the results of the election contests (i.e., the counts for each selection/option and the outcome). That's something we're committed to continue speccing out, but we previously decided to hold it over for a future supplemental proposal.

fgregg commented 7 years ago

This is tremendous work @gordonje!

Things are looking very good to me.

The only thing that's a little puzzling to me are the BallotSelection objects. It seems that we will need this when we want to model concrete ballots that voters see. We do not need them for campaign finance models.

Let's wait until a user emerges who wants to model actual ballots.

jdmgoogle commented 7 years ago

Hi, I'm Justin Moore, the engineer from Google who oversaw the creation of the VIP 5.x release and worked with NIST on the 1500-100 election spec on which some of the VIP 5.x elements are based. I just became aware of this thread and was hoping to add a bit of background information and context to some of the discussions.

> OCD is written for international use, but VIP is written for the US only.

It's true that VIP is only used within the US, but the elements describing elections, contests, candidates, ballot measures, etc, come from the NIST spec, which should be compatible with international elections. If there are certain situations where the schema does not work for a particular election, please let us know.

> Why pro and con instead of the more readable yes and no?

Many jurisdictions don't use "yes" or "no" and have more generic or alternative ways of indicating support or opposition. These include yes/no, support/oppose, pro/con, approve/reject, and a few others. We settled on "pro" and "con" since sometimes voting "yes" on something is actually voting against a certain proposal (Florida, California, and a few others are fun that way).

> other_type: I suggest removing it and just allowing ballot_measure_type to be an open codelist. I don't see an advantage to pushing the non-standard codes into a new field.

> Replaced the enum field ballot_measure_type with the simple string classification. Removed other_type

Our experience indicates that a free-form string renders the field effectively unusable, in that even common-case scenarios become custom. What ends up happening is that the major feed producers end up striking undocumented agreements with the major feed consumers as to magic values that end up in those fields. E.g., "by 'house' I mean 'the lower house'" and this ends up leaving states with "assemblies" to gladly do their own thing but all of a sudden it's hit-and-miss. We settled on the Type/OtherType semantics as a way to encourage feed producers and consumers to adhere to some best practices while forcing them to explicitly acknowledge when they're going "off-book" to create something custom to them.

> full_text is text in Motion, for example; summary_text is summary in Popolo's Event (and in iCal)

Just as a heads-up this is often a nightmare to try and standardize, even within the United States. We've seen some summary_text fields that are two paragraphs long (because when the actual text is three pages, two paragraphs are a summary). I don't have any good answers here, I just wanted to flag this. If you end up with any good rules of thumb and guidelines to encourage best practices, please let us know. :)

> Some countries have multiple types of abstentions with different effects.

Interesting. Could you provide an example?

gordonje commented 7 years ago

Per @fgregg's feedback and ongoing discussions in PR #79, I'm editing out the BallotSelection class hierarchy and other properties related to representing exact details of varying election ballots.

I'm also attempting to address something else that's been nagging at me, having to do with how we define a distinct candidacy.

Up to this point, I had been sticking to VIP's rule about candidate objects not being shared between contests. Certainly we want separate records for a person who runs for two different offices in the same election. And we want separate records for a person's initial and subsequent re-election attempts to the same office.

I think it's more ideal, though, to allow a single candidacy to be related to multiple candidate contests to model:

A person who runs for a specific office in a primary election and, later, a general election.
The U.S. presidential election (or something else like it), which is actually not one but 50 contests, one in each state, where the same people are competing for the same offices.

In the case of the latter, having a single object for Donald Trump's presidential candidacy is superior to having 50 copies of Donald Trump 😬, especially when you are trying to connect his candidacy to the campaign finance committees actively engaged in the presidential election.

So to define a distinct candidacy, I propose something like: person -- office term, where office term is a post with an expected start date and end date. Kinda similar to VIP's Office Term, except OCD's offices (aka, posts) are independent of any election and have multiple terms. @jdmgoogle can maybe check me on this (also...hi! and welcome 🎉 )

gordonje commented 7 years ago

Just to be clear on a couple of points:

Not trying to force the modeling of multiple contests for the U.S. presidential race. This would allow a user to implement either one or multiple, depending on their desired level of detail.
Office term is meant to represent the interval in which the winning candidate is expected to hold the office, not how long he/she is actually in that office. We've got some properties on Membership for that.
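
For illustration, the person -- office term idea could be sketched in Python like this. `OfficeTerm`, `Candidacy` and all of their field names are hypothetical, and the IDs and dates are made-up sample values.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass(frozen=True)
class OfficeTerm:
    """A post plus the interval the winner is expected to hold it."""
    post_id: str
    start_date: date
    end_date: date


@dataclass
class Candidacy:
    """One person seeking one office term, across any number of contests."""
    person_id: str
    office_term: OfficeTerm
    contest_ids: List[str] = field(default_factory=list)

    def key(self):
        # What makes a candidacy distinct under this proposal.
        return (self.person_id, self.office_term)


term = OfficeTerm("post-1", date(2015, 1, 5), date(2019, 1, 7))
candidacy = Candidacy("person-1", term)
candidacy.contest_ids += ["primary-contest", "general-contest"]

# A later re-election attempt is a different office term, so a new candidacy.
rerun = Candidacy("person-1", OfficeTerm("post-1", date(2019, 1, 7), date(2023, 1, 2)))
```

The same `Candidacy` object is attached to both the primary and the general contest, while a run for the next term gets its own record.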

jdmgoogle commented 7 years ago

One of the motivating factors behind the separation of Person, Candidate, and CandidateSelection is the distinction between (a) the party with which the person identifies, (b) the party that is supporting their candidacy, and (c) the party that shows up next to their name on the ballot. For example, if Bernie Sanders had won the Democratic nomination, he would be (a) an Independent per self-identification, (b) a Democrat for the purposes of the candidacy, and (c) likely shown on the ballot in some states next to other parties (e.g., here's the Otsego County, NY sample ballot indicating he probably would have shown up endorsed by the Working Families and Women's Equality parties).

If that's not an issue for you to track (the person versus the candidacy versus ballot endorsements) then you may be able to collapse per-state candidacies into a national candidacy. But if you need to track per-party spending limits or associate contributions on a per-state-campaign basis, this is one scenario of which you should be aware.
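
A small Python sketch of the three levels, loosely modeled on the separation described above. The snake_case field names are simplifications rather than VIP's element names, and the people and parties are placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Person:
    name: str
    party: str  # (a) the party the person identifies with


@dataclass
class Candidate:
    person: Person
    party: str  # (b) the party supporting this candidacy


@dataclass
class CandidateSelection:
    candidate: Candidate
    # (c) parties shown next to the name on a particular ballot
    endorsement_parties: List[str] = field(default_factory=list)


person = Person("Example Person", party="Independent")
candidate = Candidate(person, party="Party X")
selection = CandidateSelection(candidate, ["Party Y", "Party Z"])

# All of the party labels attached to one person at different levels.
distinct = {person.party, candidate.party, *selection.endorsement_parties}
```

Collapsing per-state candidacies into one national candidacy keeps (a) and (b) but leaves no natural home for (c).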

fgregg commented 7 years ago

@jdmgoogle I think I understand the important distinction between Candidate and CandidateSelection, but I think it makes sense to omit it for now. At the moment, we don't have anyone trying to actually model ballots. If and when those users come, I'd like to revisit the Selection classes

@gordonje I am having a hard time understanding your proposal that a candidacy be a relation between a person and an office-term.

If we go this route, is there even a Contest object? If so, how are people related to contests?

How would I find all the people running in a particular contest? In pseudo sql would it be something like

 select person
 from candidacy
 inner join contest
    using (post_id)
 inner join election
    using (election_id)
 where candidacy.term_start_date > election.date

This seems like too loose a connection.

Additionally, in campaign finance, there are often important differences in the rules in a primary election versus a general election. If we keep the candidacies as defined by (person, contest) then we know that person is, for example, running for a general and not winding down their failed primary.

I agree that the president is a complicated case, but... it's a complicated case.

jdmgoogle commented 7 years ago

@fgregg Are you trying to link people to offices? Or contests to people? Because those are two separate things (i.e., everyone who's not an incumbent). The people in contests is easy. It's

CandidateContest ~> CandidateSelection ~> Candidate ~> Person

The Office term is

CandidateContest ~> Office ~> Term

The CandidateContest object has PrimaryPartyIds to let you know if it is associated with a given party, and if so which one(s).

In your pseudo sql example I don't quite understand why you're starting from Candidate since that's really the middle of the query chain and you need to go out in either direction. Also, where did candidacy.term_start_date come from? That's not in the spec anywhere and really conflates two separate things: a candidacy and an office term.

gordonje commented 7 years ago

@fgregg There's still a CandidateContest, which collects together candidates who are competing against each other in an election. In the Django implementation, I'm imagining a ManyToMany relationship between CandidateContest and Candidacy.

Furthering your example: Let's say you wanted to find every person who's declared candidacy for the 2014 California Governor's race.

select person
from candidacy
inner join officeterm
    using (office_term_id)
inner join post
    using (post_id)
where post.label = 'Governor'
and date_part('year', officeterm.start_date) = 2015;

The above query would return you everyone that competed in the primaries as well. But if you wanted just the candidates who advanced to the general election, you would need the contest that happened on the general election date to help you filter.

select person
from candidacy
inner join officeterm
    using (office_term_id)
inner join post
    using (post_id)
inner join candidatecontestcandidacy
    using (candidacy_id)
inner join candidatecontest
    using (contest_id)
inner join election
    using (election_id)
where post.label = 'Governor'
and cast(election.start_time as date) = date '2014-11-04';
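
To check that the joins behave as described, here is a self-contained Python sketch that runs both queries against an in-memory SQLite database. The table layout, candidate names, primary date and term start date are all made up for the demo, and SQLite's `strftime` and `date` functions stand in for the date handling in the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table post (post_id integer primary key, label text);
create table officeterm (office_term_id integer primary key,
                         post_id integer, start_date text);
create table candidacy (candidacy_id integer primary key,
                        person text, office_term_id integer);
create table election (election_id integer primary key, start_time text);
create table candidatecontest (contest_id integer primary key,
                               election_id integer);
create table candidatecontestcandidacy (contest_id integer,
                                        candidacy_id integer);

insert into post values (1, 'Governor');
insert into officeterm values (1, 1, '2015-01-05');
insert into candidacy values
    (1, 'Candidate A', 1), (2, 'Candidate B', 1), (3, 'Candidate C', 1);
-- election 1 is the primary, election 2 is the general
insert into election values
    (1, '2014-06-03 07:00:00'), (2, '2014-11-04 07:00:00');
insert into candidatecontest values (1, 1), (2, 2);
-- all three ran in the primary contest; only A and B advanced
insert into candidatecontestcandidacy values
    (1, 1), (1, 2), (1, 3), (2, 1), (2, 2);
""")

# Everyone who sought the office term starting in 2015, primaries included.
everyone = [row[0] for row in conn.execute("""
    select person
    from candidacy
    inner join officeterm using (office_term_id)
    inner join post using (post_id)
    where post.label = 'Governor'
      and strftime('%Y', officeterm.start_date) = '2015'
    order by person
""")]

# Only the candidacies attached to the contest held on the general election date.
general = [row[0] for row in conn.execute("""
    select person
    from candidacy
    inner join officeterm using (office_term_id)
    inner join post using (post_id)
    inner join candidatecontestcandidacy using (candidacy_id)
    inner join candidatecontest using (contest_id)
    inner join election using (election_id)
    where post.label = 'Governor'
      and date(election.start_time) = '2014-11-04'
    order by person
""")]
```

In this sample data the first query returns all three candidates and the second returns only the two linked to the general election contest.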

> Additionally, in campaign finance, there are often important differences in the rules in a primary election versus a general election. If we keep the candidacies as defined by (person, contest) then we know that person is, for example, running for a general and not winding down their failed primary.

This does give me some pause. But our campaign finance spec does include an election property. Maybe this should be required instead of optional (@aepton maybe you have some thoughts).

@jdmgoogle Forest's comments were in reference to some changes I just pushed yesterday, which include editing out the Selection data types. Apologies for the churn.

Don't currently have a place for the endorsement parties you mentioned. If I were to add it, I think it would be on the CandidateContest-to-Candidacy relationship.

jdmgoogle commented 7 years ago

@gordonje Obviously this is your GitHub, so feel free to do whatever works for you, but the *Selection objects were the linkages between CandidateContest and Candidate objects, and included metadata about EndorsementPartyIds. I'm not sure why you'd remove that linkage and then try to add it back in another way, but I also don't have full visibility into the rest of the campaign finance schema.

gordonje commented 7 years ago

@jdmgoogle TBH, I'm as much an interloper on this repo as you 😁. It's a beautiful thing when people who care about the same problem can find each other on the internet and collaborate.

You can read the current draft of the campaign finance spec here.

I do find VIP's general three-level design of an election to be quite intuitive.

The date when the voters are asked to decide a bunch of things (Election), which contains...
The specific things voters will decide (Contests), which each contain...
The options presented to voters (Selections).

My understanding is that the third level for VIP includes every variation voters may encounter on their ballot, e.g., different spellings of candidate names, party endorsements, different orders for the contests and the options. That's obviously all very important when your use cases include preparing voters for what to expect when they go to the polls.

For us, though, that third level currently wants to be no more detailed than the potential outcomes for a given contest, which we need in order to adequately model campaign finance activity (i.e., not just in which contests these entities are involved but what outcomes they support or oppose). And later we'll also care about the results of each contest (e.g., how many votes each option got).

I think editing out *Selection does prevent confusion for users who might be familiar with VIP or traversing data between the two schemas. I've replaced these with properties on the Contest subclasses where the full range of options, across all ballots that include a contest, can be listed (e.g., BallotMeasureContest.options, CandidateContest.candidacies).
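
As a minimal sketch of that shape (not the actual models), the three levels could look like this with plain dataclasses. Only the options and candidacies properties come from the proposal; the other field names, the possible_outcomes helper and the sample values are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Contest:
    # A specific thing voters will decide.
    name: str


@dataclass
class BallotMeasureContest(Contest):
    # Full range of options across every ballot that includes the contest.
    options: List[str] = field(default_factory=list)


@dataclass
class CandidateContest(Contest):
    # Simplified here to a list of names; the spec links Candidacy objects.
    candidacies: List[str] = field(default_factory=list)


@dataclass
class Election:
    # The date when voters are asked to decide a bunch of things.
    name: str
    date: str
    contests: List[Contest] = field(default_factory=list)


def possible_outcomes(contest: Contest) -> List[str]:
    """Hypothetical helper: list the potential outcomes of a contest."""
    if isinstance(contest, BallotMeasureContest):
        return list(contest.options)
    if isinstance(contest, CandidateContest):
        return list(contest.candidacies)
    return []


measure = BallotMeasureContest(name="Proposition 1", options=["yes", "no"])
race = CandidateContest(name="Governor", candidacies=["Person A", "Person B"])
election = Election(name="General", date="2014-11-04", contests=[measure, race])

for contest in election.contests:
    print(contest.name, possible_outcomes(contest))
```

The point is just that the outcomes a committee can support or oppose hang directly off the contest, with no per-ballot Selection layer in between.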

@fgregg So just to further the case against my own proposal, if we cut out OfficeTerm the sql in my previous comment would have two fewer joins:

select person
from candidacy
inner join contest
    (using contest_id)
inner join post
    (using post_id)
inner join election
    (using election_id)
where post.label = 'Governor'
and cast(election.start_time as date) = '2014-11-04';

So this would involve adding a contest_id and post_id to Candidacy, and we would be back to having one version of Donald Trump's candidacy for every CandidateContest, which feels like trouble when you're at the point of linking campaign finance committees to his candidacy. Even if you were going to say "there's really only one general presidential election and thus only one Donald Trump candidacy for that election", I don't see how you get around having multiple elections and contests, and thus multiple Donald Trump candidacies, in the presidential primaries, because these don't all happen on the same day.

Or maybe those duplicate candidacies are desirable? Obviously, they can still be rolled up by the person_id...
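
To make the roll-up concrete, here's a small sketch in plain Python. The Candidacy tuple, its field names and the sample rows are made up for illustration; it only assumes each candidacy carries a person_id, contest_id and post_id as described above.

```python
from collections import defaultdict
from typing import NamedTuple


class Candidacy(NamedTuple):
    # Hypothetical flattened record: one row per (person, contest).
    candidacy_id: int
    person_id: str
    contest_id: str
    post_id: str


candidacies = [
    Candidacy(1, "person-1", "state-a-primary", "president"),
    Candidacy(2, "person-1", "state-b-primary", "president"),
    Candidacy(3, "person-1", "general", "president"),
    Candidacy(4, "person-2", "state-a-primary", "president"),
]


def roll_up_by_person(rows):
    """Group contest ids under each (person_id, post_id) pair."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[(row.person_id, row.post_id)].append(row.contest_id)
    return dict(grouped)


rolled_up = roll_up_by_person(candidacies)
for (person_id, post_id), contests in rolled_up.items():
    print(person_id, post_id, contests)
```

So a committee could still be tied to "person-1 running for president" by grouping, even though there are three candidacy rows underneath.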

gordonje commented 7 years ago

Following up on conversations @fgregg and I had at NICAR, we're seeing two big disadvantages to modeling office terms as part of candidacies:

The exact start and end dates of office terms aren't typically present in the source materials (e.g., election results APIs, CSV files or state government websites to be scraped), and there's likely to be a lot of variation in the start and end dates of office terms at the local level, even within the same state.
We would be adding another kink in moving between this spec and VIP, which doesn't allow candidate records to be reused between contests.

So with the latest changes, I have removed OfficeTerm and redefined Candidacy as a combo of CandidateContest/Post/Person.

In U.S. elections, then, most persons running for office will have at most two Candidacy objects: One for the primary and one for the general election. The big exception being the presidential race when the implementation includes multiple states.

This is once again ready for review.

gordonje commented 7 years ago

@jdmgoogle VIP allows a single CandidateContest to be linked to multiple Party elements, via the PrimaryPartyIds tag. Right now I have this as a one-to-one relationship, instead of a one-to-many. Can you provide examples of where a contest needs to be linked to multiple parties instead of, for example, having a CandidateContest for each party's partisan primary?

gordonje commented 7 years ago

I've taken a closer look at python-opencivicdata-django, which is where we want to implement this specification. The Event Django model has a required "jurisdiction" field. However, this field is not mentioned in the OCD Event format or in the Event OCDEP.

Currently this proposal requires an Election.division_id field, but because it inherits from Event, an Election will also be linked to a Jurisdiction, and each Jurisdiction is also linked to a Division.

Does each Event require a Jurisdiction? If so, should we remove Election.division_id and rely on the Division linked to the Jurisdiction that's linked to the Event?
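
To illustrate the overlap, here is a rough sketch with plain dataclasses rather than the real Django models. The class and field layout is an assumption that mirrors only what's described above, and the identifiers are illustrative values in the usual OCD id style.

```python
from dataclasses import dataclass


@dataclass
class Division:
    id: str


@dataclass
class Jurisdiction:
    id: str
    division: Division


@dataclass
class Event:
    name: str
    jurisdiction: Jurisdiction  # required on the Django model


@dataclass
class Election(Event):
    division_id: str  # required by this proposal


def division_paths(election: Election) -> dict:
    """Return the division reachable directly and via the jurisdiction."""
    return {
        "direct": election.division_id,
        "via_jurisdiction": election.jurisdiction.division.id,
    }


state = Division(id="ocd-division/country:us/state:ca")
government = Jurisdiction(
    id="ocd-jurisdiction/country:us/state:ca/government",
    division=state,
)
election = Election(
    name="Hypothetical county election",
    jurisdiction=government,
    division_id="ocd-division/country:us/state:ca/county:los_angeles",
)

paths = division_paths(election)
print(paths)
```

In this made-up case the two paths disagree, which is the situation where keeping Election.division_id would carry information the jurisdiction link can't.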

fgregg commented 7 years ago

I can't think of a really deep reason why events, generically, should require either a jurisdiction or division field.

Practically, pupa people have needed a way of grouping different types of meetings associated with the same legislature together and this is how we've done it. I think that other ways are possible, though I don't have a concrete proposal right now.

I think it makes more sense for an Election to be associated with a division than a jurisdiction.

gordonje commented 7 years ago

@fgregg In that case I'll leave this proposal as it is, and raise the question in python-opencivicdata-django repo.

Any other feedback on this? I'm now working on adapting our current implementation to what is described here.

fgregg commented 7 years ago

No, I think this is ready to be accepted as a draft proposal. @jpmckinney, thoughts?

jpmckinney commented 7 years ago

I haven't read the full thread - any salient points to keep in mind, or can I just read the patch?

fgregg commented 7 years ago

One thing that is good to know is that this proposal is purposefully partial. It includes the classes that seem necessary for modeling campaign finance, but not quite everything you would want for modeling what is on voting ballots or election results.

It seemed prudent to wait to make those models until we had an interested implementer.

jpmckinney commented 7 years ago

For posterity, noting some election modeling elsewhere that I was just made aware of (I had asked folks at mySociety to review this thread):

UK Parliament: https://github.com/ukparliament/ontologies/blob/master/election/election.png
UK BBC: http://www.bbc.co.uk/ontologies/politics
UK Democracy Club: https://democracyclub.org.uk/blog/2017/01/20/making-every-election/ with code at https://github.com/DemocracyClub/EveryElection and data at https://elections.democracyclub.org.uk/

palewire commented 7 years ago

@jpmckinney I think you can largely just read the patch. You'll find a little less here than last time due to the change @fgregg described. Our Django implementation is also nearing completion in another repo.

gordonje commented 7 years ago

Just wanted to share that we now have a working implementation of this spec in our fork of python-opencivicdata-django. The attached zip contains csv files exported from our campaign finance app that loads these models.

We'll wait to submit a PR on the django package until this spec is at least accepted, in case any other changes are necessary based on feedback.

fgregg commented 7 years ago

This is so cool @gordonje !

gordonje commented 7 years ago

Question came up in our implementation that I think deserves broader discussion.

Should a candidate contest include all the candidates who declared to run for the office in the election, or should we limit the list to just candidates who ultimately appeared on the ballot?

The campaign finance use cases surely want to allow for any candidate who raised/spent money in that contest irrespective of being included on the ballot. But I expect the election results use cases will want to limit the list to only candidates who could have received any votes.

So then Candidacy probably needs some property indicating if the person is currently an active candidate in the contest. This would likely be a boolean field like is_active or dropped_out. It would be most precise to store the date when a given candidate dropped out of each contest, but I doubt that is reliably available.

palewire commented 7 years ago

Could it be managed by the BallotSelection classes that were ultimately dropped by this proposal? If you have one of those, you're on the ballot. If you don't, you didn't.

aepton commented 7 years ago

I think we should definitely include candidates who dropped out, and is_active seems like the cleanest way to do that. If you care about when it happened, there should be a way to look up that filing (on phone, can't find the exact place in the spec right now but I'm sure it's in there).

jdmgoogle commented 7 years ago

I haven't looked at your derivation of the schema, but did you remove the CandidatePreElectionStatus and CandidatePostElectionStatus enumerations from the Candidate element?

gordonje commented 7 years ago

@jdmgoogle Thanks for pointing out CandidatePreElectionStatus. Maybe an enum field with exactly these options will do the trick:

filed: The candidate has filed for office but not yet been qualified.
qualified: The candidate has qualified for the contest.
withdrawn: The candidate has withdrawn from the contest (but may still be on the ballot).
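
A minimal sketch of how such an enum might be used, assuming Python's standard Enum. The class name, the with_status helper and the sample candidacies are hypothetical; only the three values and their meanings come from the list above.

```python
from enum import Enum


class CandidacyStatus(Enum):
    # Values mirror the three options listed above.
    FILED = "filed"          # filed for office but not yet qualified
    QUALIFIED = "qualified"  # qualified for the contest
    WITHDRAWN = "withdrawn"  # withdrawn (but may still be on the ballot)


# Hypothetical candidacies as (person name, status) pairs.
candidacies = [
    ("Person A", CandidacyStatus.QUALIFIED),
    ("Person B", CandidacyStatus.WITHDRAWN),
    ("Person C", CandidacyStatus.FILED),
]


def with_status(rows, *statuses):
    """Return the names of candidacies whose status is one of statuses."""
    wanted = set(statuses)
    return [name for name, status in rows if status in wanted]


# Campaign finance use cases would likely keep everyone.
everyone = with_status(candidacies, *CandidacyStatus)
# A narrower view keeps only candidates who qualified.
qualified = with_status(candidacies, CandidacyStatus.QUALIFIED)
print(everyone, qualified)
```

Since a withdrawn candidate may still be on the ballot, this status alone wouldn't tell an election results consumer who could receive votes.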

gordonje commented 7 years ago

VIP also has "write-in" among the CandidatePreElectionStatus options. That's another piece we're currently missing.

gordonje commented 7 years ago

Anyone have any problem calling this new field on Candidacy registration_status?

I'm also adding/populating it in our fork of opencivicdata-django.

What other steps are necessary for this spec to be accepted (even provisionally) so that we can pivot to discussing the draft implementation?

fgregg commented 7 years ago

@jpmckinney thoughts on this?

jpmckinney commented 7 years ago

I'll actually have time to review this coming week, but my basic feeling is that it's not really going to be something that's robust for international use, but that we're okay with that. So, it'll be a specification, but not one that targets standardization outside the US, except by accident.

palewire commented 7 years ago

That's great news, @jpmckinney! We're eager to complete our implementation and begin releasing a first generation of standardized files on our data portal. Please let us know what we can do to help you.

jpmckinney commented 7 years ago

Looks good! Thanks for your patience!

opencivicdata / docs.opencivicdata.org

Filling out Elections proposal #64

General notes

Small edits

Election

BallotMeasureContest

PartySelection

Big edits

Party

Contest

Should `Election` subclass `Event`?

Should `Contest` subclass `VoteEvent`?

Should `Party` subclass `Organization`?

Party

Election

Contest

Party

Election

Contest
