I think this is a great start. As we await feedback from the OCD crew, we're going to push ahead with a rough implementation in our django-calaccess-processed-data repository. That's where we are currently working to transform and simplify data from CAL-ACCESS, the state of California's jumbled, dirty and difficult campaign-finance database.
This looks really good to me.
Realized I had a couple of redundant post_id fields, which I have just cleared up. I think we all agree that an OCD Post is an analog to a VIP <Office>, but I'm also thinking now that a RetentionContest should reference a particular OCD Membership, which I take to represent a particular person's tenure in a public office (@fgregg, @jpmckinney, and whoever, lemme know if I have that right).
Seems better if Candidacy contains one record for each time a person ran for a public office, not including the times they were in a recall election. That's how I believe I have things modeled currently.
@gordonje Yes, you've got the correct interpretations of Popolo's (OCD's) Post and Membership :)
I'm assuming the PR isn't ready for review, but let me know if it is.
@jpmckinney It's ready to review. I've added some sample json pulled out of data CCDC has scraped from CAL-ACCESS. We've come pretty far along with a draft of django models implementing what's proposed here and, as a result, I've made a few more minor tweaks to this spec. Looking forward to some feedback.
OCD is written for international use, but VIP is written for the US only. It will require a fair bit more work to transform this proposal into something for international use.
If you just want to have US models, then I propose implementing this as a new Python package that looks like and interoperates with python-opencivicdata-django, in which classes are prefixed with VIP, to avoid clashing with any eventual internationally-usable elections models.
Prior to realizing this, I noted below several changes I would make to achieve a closer alignment between this proposal and existing relevant international standards. Classes I still have to review are CandidateContest and Candidacy.
Anyway, let me know if all you want is a straight Python/JSON version of VIP, in which case most of my comments may be irrelevant.
Separately, I'll also note here that OCD's Event class probably has too many properties that not all subclasses should share (e.g. contests don't have agenda items).
ContestBase to Contest – same for BallotSelectionBase. In Python it might become an abstract ContestBase class, but a public schema wouldn't normally have a Base suffix.
administrative_org_id: Avoid abbreviations. Schema.org has an organizer property with the same semantics.
division_id instead of state? We may want to add division_id to Event itself – or relate Events to full objects like Areas in Popolo, which can hold the division ID.
is_statewide would need to be renamed.
pro and con instead of the more readable yes and no?
ballot_measure_type: We typically use the term classification, and we never prefix property names with class names, because it's redundant. We don't have Event.event_name, for example.
other_type: I suggest removing it and just allowing ballot_measure_type to be an open codelist. I don't see an advantage to pushing the non-standard codes into a new field.
PartySelection?
extras. If we later decide to promote one of those fields to the main class, it will be a nightmare to transition. Either don't mention the field at all, or put it on the main class.
id prefix)? They are events just like elections. In which case, Event should gain super_event from Popolo (superEvent in Schema.org), which would replace election_id.
passage_threshold is requirement and full_text is text in Motion, for example; summary_text is summary in Popolo's Event (and in iCal).
effect_of_abstain may need to be modeled out differently. Some countries have multiple types of abstentions with different effects. We can model this as a list of objects, one for each vote option, with another property to describe the effect. We can also store 'statements' here, though we will usually only have statements for yes and no.
@jpmckinney Thank you for this thoughtful and thorough comment. We will review your requests and be back soon with an update.
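jpmckinney's effect_of_abstain point suggests a list of objects, one per vote option, each with a property describing its effect. A minimal Python sketch of that shape follows; VoteEffect, its codes, and the option labels are hypothetical illustrations, not part of OCD, Popolo or VIP.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class VoteEffect(Enum):
    """Hypothetical codes for how choosing an option affects the outcome."""
    SUPPORTS = "supports"
    OPPOSES = "opposes"
    COUNTS_TOWARD_QUORUM = "counts_toward_quorum"  # e.g. a formal abstention
    NOT_COUNTED = "not_counted"                    # e.g. a blank ballot


@dataclass
class VoteOption:
    """One vote option, its effect, and an optional statement for or against."""
    label: str
    effect: VoteEffect
    statement: Optional[str] = None  # usually only present for yes and no


@dataclass
class Contest:
    name: str
    options: List[VoteOption] = field(default_factory=list)

    def options_with_effect(self, effect: VoteEffect) -> List[VoteOption]:
        """Return every option with the given effect, e.g. all abstention types."""
        return [option for option in self.options if option.effect is effect]


measure = Contest(
    name="Hypothetical Measure A",
    options=[
        VoteOption("yes", VoteEffect.SUPPORTS, statement="Statement in support"),
        VoteOption("no", VoteEffect.OPPOSES, statement="Statement in opposition"),
        VoteOption("abstain", VoteEffect.COUNTS_TOWARD_QUORUM),
        VoteOption("blank", VoteEffect.NOT_COUNTED),
    ],
)
```

Adding another kind of abstention then means adding one option object rather than a new property on the contest.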
@jpmckinney Thanks again for your thorough notes. You've picked up on several of the key challenges we ran into during the drafting process.
We want to continue pushing toward merging this spec into the core OCD API. To that end, we'll work to remove any U.S.-specific features or cast them in more general terms. However, the CCDC and, I believe, several other collaborators here mostly have U.S. use cases in mind. So we might require ongoing guidance and/or other collaborators to keep us in check on that.
I've just committed changes that address the majority of your notes, plus a few other small related changes, specifically:
[x] Added a Definitions section for key terms and more detailed definitions for each data type.
[x] Merged the VIP differences discussions into the Implementation section, edited out unnecessary content and suggestions about throwing specific stuff into extras.
[x] Removed Base suffix from class names.
[x] Removed abbreviations from property names (e.g., administrative_organization_id).
[x] On Election:
Event and added simple name and date properties (discussed below).
state and is_statewide on Election with division_id.
[x] Renamed pro_statement and con_statement on BallotMeasureContest to support_statement and oppose_statement. I think this is an improvement even over yes and no since the text of the ballot measure selections might also be something like recall and don't recall.
[x] Replaced the enum field ballot_measure_type with the simple string classification. Removed other_type.
[x] On BallotMeasureSelection:
passage_threshold to requirement
summary_text to summary
full_text to text
[x] Removed is_top_ticket from Candidacy.
Here are the points which require further discussion, which I have also added to the "Questions" section:
Election subclass Event?
This was one of the original premises for this proposal, having been previously floated in this thread. What's intuitive about this is that we're used to thinking of elections in terms of when they will or did occur, and that mental picture looks like one or more specific dates on a calendar.
The further I've gone into this, however, the less elections seem to fit the mold of events as described in OCD, Popolo or Schema.org. These existing specifications strike me as being more focused on an event as a meeting or some other appointment on one's calendar, with a specific start and end time, location and attendee list.
To me, though, an election is more like an observed date, holiday or other calendar item one would set as "all day" because there is no specific start or end time ("all day" is one of the properties proposed for OCD Events, which is why we preferred to subclass that data type). Elections also don't happen in one specific location, but rather lots of places at once.
If we were inclined to model all the fine details of when and where people can vote in a specific election, perhaps modeling an election as a collection of related events would work. But I, for one, am daunted by the potential requirement of mapping out those relationships/hierarchies of events while juggling references to contests and divisions and accommodating edge cases in places like Washington State with vote-by-mail only.
That level of detail about where and when someone can vote might be more important if our goal were merely to represent the next election (similar to VIP), as opposed to historical elections for which that data might be difficult to come by.
Rather, a general date field coupled with a division seems quite sufficient for our use cases while avoiding ambiguity about what belongs where. The use cases for campaign finance and (eventually) election results are more focused on the potential and actual outcomes of each election than specific times and places people might have voted.
If/when this specification does need to allow more finely detailed modeling, then perhaps the date field on election could be swapped out in favor of a reference to an Event or Calendar object that could contain subevents for each place and time that voters might have voted, if needed.
Contest subclass VoteEvent?
All the reasons Election might not want to subclass Event also apply to having the contest-related classes subclass VoteEvent. It would also introduce additional irrelevant properties like legislative session and vote (because we can't know exactly when or how voters voted in any contest). It also would allow varying dates for each contest within an election, which shouldn't happen.
Party subclass Organization?
This makes a lot of sense, but our concern (born mostly of ignorance) is how internal structures of political parties tend to be modeled using OCD Organization. Forgive the topical humor, but I feel strongly that Republican legislators in Missouri should be lumped in with Donald Trump.
OCD allows precision in modeling, for example, the RNC organization distinct from the state parties and county-level organizations and associating persons as members to any of those organizations. But our concern is that allowing Candidacy and CandidateContest objects to reference any level of the party structure will frustrate the most common forms of analysis, which expect all the Republicans to be grouped with all the Republicans and all the Democrats to be grouped with all the Democrats, etc.
Would it make sense for OCD to have, as it does with divisions, a central repository of political parties that users are encouraged to use instead of rolling their own? That would certainly alleviate this sort of grouping problem that might arise, for example, in a cross-state analysis of election results. And if we had that centralized tightly controlled repo, we would be less concerned about modeling parties as orgs or some subclass of orgs.
Thanks!
I'm not familiar enough with party systems in the US to know whether a state Republican party is the same as federal, county, city, etc. - but let's say they are distinct (like they are in Canada, except for the NDP which is fully integrated). In that case, what you describe isn't tracking party membership as much as political alignment. We could model political alignment as a new property/class, which is the cleanest solution I can see. The alternative is to have a world that gets modeled differently depending on what sort of analysis you want to perform - which is strange. But, you could just make the decision in your data that there is only one Republican party and all Republicans are members of it - but your data won't be interoperable with another system whose model is closer to reality.
But if the Republican and Democratic parties are the same at all levels, then they should be modeled as being the same and not jurisdiction-specific.
All that said, having a Party class distinct from Organization won't save you - there's no central authority preventing people from creating lots of Democratic parties :)
I don't think making an Election an Event commits us to creating objects for every voting place. Independence Day is an event that appears in many people's calendars, yet it is all day (which is not the same start/end across time zones) and it happens nowhere or everywhere.
There are relatively few primitives in the world. I don't think elections are so fundamental that they ought to be a root class. In Schema.org (and Popolo), by virtue of reusing RDF vocabularies, every class falls into a hierarchy, the root of which is Thing. OCD doesn't define all those superclasses, because there are no use cases for them. But considering an Election is an Event (in the sense of Schema.org and the standards that Popolo reuses), we should put it into that hierarchy. Popolo is considering properties like start_event and end_event on Membership, which can point to an Election, Resignation, Nomination, etc. It wouldn't make sense for the range of those properties to be the union of various event-like classes; semantically, it makes more sense for them all to be descendants of Event.
Now, in OCD, we have an Event class that, as mentioned, is more like a Meeting because it has an agenda. Schema.org Event (which Popolo reuses) is more generic. So, in OCD, to preserve backwards-compatibility, we could have an abstract base class with descendants Event and Election.
My main concern is if we instead add more primitives, we risk having schemas drift instead of reusing the same properties. As the modeling progressively covers more objects in the real world, it will also become harder to learn the models if they aren't building on each other through class inheritance.
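To illustrate the inheritance described above, here is a minimal Python sketch with an abstract parent and Event and Election as siblings, so that a Membership.start_event property can accept any descendant. The class and field names (EventBase, kind, start_date) are hypothetical and do not come from python-opencivicdata-django.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class EventBase(ABC):
    """Abstract parent for every event-like class (cannot be instantiated)."""
    name: str
    start_date: date

    @abstractmethod
    def kind(self) -> str:
        """Name the concrete kind of event."""


@dataclass
class Event(EventBase):
    """Meeting-like event: keeps meeting-only properties such as an agenda."""
    agenda: List[str] = field(default_factory=list)

    def kind(self) -> str:
        return "event"


@dataclass
class Election(EventBase):
    """Sibling of Event: no agenda, but linked to a division."""
    division_id: str

    def kind(self) -> str:
        return "election"


@dataclass
class Membership:
    person_id: str
    post_id: str
    # The range is the shared parent, so an Election (or a future Resignation
    # or Nomination class) fits without declaring a union of unrelated classes.
    start_event: Optional[EventBase] = None
    end_event: Optional[EventBase] = None


general = Election("2014 General", date(2014, 11, 4), "ocd-division/country:us/state:ca")
membership = Membership("person-1", "post-governor", start_event=general)
```

The design choice is that a new event-like class only needs to subclass EventBase to become a valid value for start_event and end_event.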
Yeah, on second thought, Contest is not even an Event. It's closer to a Motion in Popolo, but in any case it'd be a sibling of Motion, not a parent/child.
Great dialogue, @jpmckinney
I feel like the modeling of political alignment you mentioned adding is exactly what Candidacy.party_id is meant to represent. That is, which party was the candidate affiliated with/endorsed by while campaigning for office, which may differ from the party with which that person caucuses as an office holder or which party they've joined as registered member (presumably these are cases for making use of Membership).
So we'll end up having several different ways that a person can be related to an organization that is a party, and that the party might actually be multiple objects in the database with varying levels of specificity. Which is probably the right level of flexibility, and I shouldn't be too worried that smart people won't know what they're doing.
In case it's illustrative, I think our project would end up with, for example, many different versions/levels of Democratic Party (as national, state and county-level organizations) which would be distinct campaign finance organizations and an additional "Democratic" political party organization to which candidates would be connected. Not sure whether or not we would need to map out the relationship/hierarchies between these organizations.
All of that to say, I think we'll make Party a subclass of Organization with the following additional properties:
abbreviation
color
is_write_in
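A minimal sketch of Party as an Organization subclass with the three properties listed above. Organization here is a trimmed, hypothetical stand-in rather than the OCD model, the color format is an assumption, and the classification check in __post_init__ is just one possible rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Organization:
    """Trimmed stand-in for OCD's Organization; real models have more fields."""
    id: str
    name: str
    classification: str


@dataclass
class Party(Organization):
    """Party as an Organization subclass adding the three proposed properties."""
    abbreviation: Optional[str] = None
    color: Optional[str] = None  # assumption: a hex RGB string such as "0015BC"
    is_write_in: bool = False

    def __post_init__(self) -> None:
        # Keep every Party classified as a party organization.
        if self.classification != "party":
            raise ValueError("a Party must have classification 'party'")


democratic = Party(
    id="ocd-organization/hypothetical-democratic",
    name="Democratic",
    classification="party",
    abbreviation="DEM",
)
```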
If adding an abstract base Event class to OCD is on the table, then maybe we can make this work. Here are the properties I think each class should have:
EventBase
required: id, name, classification, start_time, all_day, timezone, created_at, updated_at
optional: end_time, sources, extras
Event
optional: description, jurisdiction, location, participants, documents, media, links
Election
required: division_id (or should jurisdiction be on the base class?)
optional: administrative_organization_id (or should we add organizer_id to the base class and allow person or org IDs?), identifiers
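One way to sanity-check the proposed split is to write the outline down as data and ask which properties a record is missing or should not carry. The property lists are copied from the outline; SCHEMA, inherited, missing_required and unexpected are hypothetical helper names.

```python
from typing import Dict, List, Mapping, Optional

# The outline as data: each class names its parent and the properties it
# adds. Event and Election are siblings under EventBase.
SCHEMA: Dict[str, dict] = {
    "EventBase": {
        "parent": None,
        "required": ["id", "name", "classification", "start_time",
                     "all_day", "timezone", "created_at", "updated_at"],
        "optional": ["end_time", "sources", "extras"],
    },
    "Event": {
        "parent": "EventBase",
        "required": [],
        "optional": ["description", "jurisdiction", "location",
                     "participants", "documents", "media", "links"],
    },
    "Election": {
        "parent": "EventBase",
        "required": ["division_id"],
        "optional": ["administrative_organization_id", "identifiers"],
    },
}


def inherited(cls: str, key: str) -> List[str]:
    """Collect the `key` ("required" or "optional") list up the class chain."""
    props: List[str] = []
    current: Optional[str] = cls
    while current is not None:
        props.extend(SCHEMA[current][key])
        current = SCHEMA[current]["parent"]
    return props


def missing_required(cls: str, record: Mapping[str, object]) -> List[str]:
    """Required properties (own or inherited) absent from `record`."""
    return [p for p in inherited(cls, "required") if p not in record]


def unexpected(cls: str, record: Mapping[str, object]) -> List[str]:
    """Properties in `record` that the class neither requires nor allows."""
    allowed = set(inherited(cls, "required")) | set(inherited(cls, "optional"))
    return sorted(key for key in record if key not in allowed)


draft = {"id": "e1", "name": "2016 General", "location": "Statewide"}
```

Here location is flagged for an Election but accepted for an Event, which is the behavior the split is meant to produce.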
The only overlap I'm seeing between Motion and what's currently proposed for Contest is they need a requirement property to describe the threshold of votes needed. But I also see potential overlap in terms of representing the results and linking to counts of votes as we get into handling election results as well.
I also note that OCD's Votes proposal hasn't adopted Motion yet and has a Bill data type that isn't part of Popolo.
Maybe we could follow the same tack as described above for Election and add a new base abstract class from which both election contests and legislative vote occurrences (aka, VoteEvent instances) should inherit.
VoteEvent is described in Popolo as being a subclass of Event, but doesn't appear to be implemented that way in OCD. Seems the closest analog to VoteEvent I can find in schema.org is VoteAction, which is part of the Action class hierarchy, rather than the Event class hierarchy.
Here's a stab in the dark:
VoteContest as a base abstract class with properties like:
name
event_id instead of start_date and end_date, and this could reference an Election, legislative committee meeting or whatever
counts
result (or possibly each subclass will have their own variant of this)
Then MotionContest and the other proposed subclasses for BallotMeasureContest, CandidateContest, etc. would all inherit from this.
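A rough Python sketch of the VoteContest idea, assuming contests reference their event by event_id rather than carrying their own dates. All names are hypothetical, and the result() rules (simple majority, plurality) are placeholders for whatever thresholds the real classes would encode.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from datetime import date
from typing import Dict


@dataclass
class VoteContest(ABC):
    """Abstract base: a contest points at an event instead of holding dates."""
    name: str
    event_id: str  # an Election, a committee meeting, or any other event
    counts: Dict[str, int] = field(default_factory=dict)  # option -> votes

    @abstractmethod
    def result(self) -> str:
        """Each subclass decides how counts become an outcome."""


@dataclass
class BallotMeasureContest(VoteContest):
    def result(self) -> str:
        # Assumes a simple majority of yes over no; real thresholds vary.
        yes, no = self.counts.get("yes", 0), self.counts.get("no", 0)
        return "pass" if yes > no else "fail"


@dataclass
class CandidateContest(VoteContest):
    def result(self) -> str:
        # Plurality winner; assumes at least one count and ignores ties
        # and multi-seat contests.
        return max(self.counts, key=lambda option: self.counts[option])


def contest_date(contest: VoteContest, event_dates: Dict[str, date]) -> date:
    """Contests take their date from the referenced event, so every
    contest in one election necessarily shares the same date."""
    return event_dates[contest.event_id]


EVENT_DATES = {"election-2016-general": date(2016, 11, 8)}
measure = BallotMeasureContest("Measure X", "election-2016-general", {"yes": 60, "no": 40})
race = CandidateContest("Mayor", "election-2016-general", {"A": 10, "B": 20})
```

Because the date lives on the event, two contests in the same election cannot disagree about when they happened.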
Feel like I'm coming in here tossing around drastic changes to things. Didn't originally intend that. I'm sure there's plenty of important background I'm lacking. For example, maybe you could say more about the rationale, in Popolo, for VoteEvent to subclass Event.
Also with respect to contest: Maybe we should also stick with the term "option" which Popolo and OCD already use instead of adopting VIP's term "selection". I think these are one-and-the-same, especially if at some level we're trying to conflate election contests and legislative votes.
I've reverted Election to be a subclass of OCD Event. I think the only thing that's not ideal about this approach is that Election inherits the following properties that aren't likely to be used:
description
jurisdiction
location
participants
documents
media
links
status
However, this seems like a problem to be handled in another OCDEP focused more narrowly on the Event data type and customizations for different subclasses. What I think we all agree on right now is that Election should inherit from whatever is the current base Event class. Let me know if that needs to be spelled out more explicitly somewhere in this proposal.
That leaves one last major sticking point: Whether what's currently named Contest and its subclasses can be based on something that already exists in Popolo or OCD. I suspect the overlap between Contest and VoteEvent (or something like it) will become more clear as we start to deal with the results of the election contests (i.e., the counts for each selection/option and the outcome). That's something we're committed to continue speccing out, but we previously decided to hold it over for a future supplemental proposal.
This is tremendous work @gordonje!
Things are looking very good to me.
The only things that are a little puzzling to me are the BallotSelection objects. It seems that we will need these when we want to model concrete ballots that voters see. We do not need them for campaign finance models.
Let's wait until a user emerges who wants to model actual ballots.
Hi, I'm Justin Moore, the engineer from Google who oversaw the creation of the VIP 5.x release and worked with NIST on the 1500-100 election spec on which some of the VIP 5.x elements are based. I just became aware of this thread and was hoping to add a bit of background information and context to some of the discussions.
OCD is written for international use, but VIP is written for the US only.
It's true that VIP is only used within the US, but the elements describing elections, contests, candidates, ballot measures, etc, come from the NIST spec, which should be compatible with international elections. If there are certain situations where the schema does not work for a particular election, please let us know.
Why pro and con instead of the more readable yes and no?
Many jurisdictions don't use "yes" or "no" and have more generic or alternative ways of indicating support or opposition. These include yes/no, support/oppose, pro/con, approve/reject, and a few others. We settled on "pro" and "con" since sometimes voting "yes" on something is actually voting against a certain proposal (Florida, California, and a few others are fun that way).
other_type: I suggest removing it and just allowing ballot_measure_type to be an open codelist. I don't see an advantage to pushing the non-standard codes into a new field.
Replaced the enum field ballot_measure_type with the simple string classification. Removed other_type
Our experience indicates that a free-form string renders the field effectively unusable, in that even common-case scenarios become custom. What ends up happening is that the major feed producers end up striking undocumented agreements with the major feed consumers as to magic values that end up in those fields. E.g., "by 'house' I mean 'the lower house'" and this ends up leaving states with "assemblies" to gladly do their own thing but all of a sudden it's hit-and-miss. We settled on the Type/OtherType semantics as a way to encourage feed producers and consumers to adhere to some best practices while forcing them to explicitly acknowledge when they're going "off-book" to create something custom to them.
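To make the trade-off concrete, here is a minimal Python sketch of the Type/OtherType pattern described above. The enum members are illustrative only and are not guaranteed to match VIP's actual enumeration; MeasureClassification is a hypothetical name.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class BallotMeasureType(Enum):
    """Illustrative closed codelist; not a copy of VIP's enumeration."""
    BALLOT_MEASURE = "ballot-measure"
    INITIATIVE = "initiative"
    REFERENDUM = "referendum"
    OTHER = "other"


@dataclass
class MeasureClassification:
    type: BallotMeasureType
    other_type: Optional[str] = None

    def __post_init__(self) -> None:
        # Going "off-book" must be explicit: other_type is required with OTHER
        # and rejected otherwise, so consumers can tell standard from custom.
        is_other = self.type is BallotMeasureType.OTHER
        if is_other and not self.other_type:
            raise ValueError("other_type is required when type is OTHER")
        if not is_other and self.other_type:
            raise ValueError("other_type is only allowed when type is OTHER")

    @property
    def is_standard(self) -> bool:
        return self.type is not BallotMeasureType.OTHER

    @property
    def label(self) -> str:
        return self.other_type if self.other_type else self.type.value


custom = MeasureClassification(BallotMeasureType.OTHER, "advisory question")
```

With an open codelist, the same record would just store the string "advisory question" in classification, and nothing would mark it as non-standard.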
full_text is text in Motion, for example; summary_text is summary in Popolo's Event (and in iCal)
Just as a heads-up this is often a nightmare to try and standardize, even within the United States. We've seen some summary_text fields that are two paragraphs long (because when the actual text is three pages, two paragraphs are a summary). I don't have any good answers here, I just wanted to flag this. If you end up with any good rules of thumb and guidelines to encourage best practices, please let us know. :)
Some countries have multiple types of abstentions with different effects.
Interesting. Could you provide an example?
Per @fgregg's feedback and ongoing discussions in PR #79, I'm editing out the BallotSelection class hierarchy and other properties related to representing exact details of varying election ballots.
I'm also attempting to address something else that's been nagging at me, having to do with how we define a distinct candidacy.
Up to this point, I had been sticking to VIP's rule about candidate objects not being shared between contests. Certainly we want separate records for a person who runs for two different offices in the same election. And we want separate records for a person's initial and subsequent re-election attempts to the same office.
I think it's more ideal, though, to allow a single candidacy to be related to multiple candidate contests to model:
In the case of the latter, having a single object for Donald Trump's presidential candidacy is superior to having 50 copies of Donald Trump 😬, especially when you are trying to connect his candidacy to the campaign finance committees actively engaged in the presidential election.
So to define a distinct candidacy, I propose something like: person -- office term, where office term is a post with an expected start date and end date. Kinda similar to VIP's Office Term, except OCD's offices (aka, posts) are independent of any election and have multiple terms. @jdmgoogle can maybe check me on this (also...hi! and welcome 🎉 )
Just to be clear on a couple of points:
Membership for that.
One of the motivating factors behind the separation of Person, Candidate, and CandidateSelection is the distinction between (a) the party with which the person identifies, (b) the party that is supporting their candidacy, and (c) the party that shows up next to their name on the ballot. For example, if Bernie Sanders had won the Democratic nomination, he would be (a) an Independent per self-identification, (b) a Democrat for the purposes of the candidacy, and (c) likely shown on the ballot in some states next to other parties (e.g., here's the Otsego County, NY sample ballot indicating he probably would have shown up endorsed by the Working Families and Women's Equality parties).
If that's not an issue for you to track (the person versus the candidacy versus ballot endorsements) then you may be able to collapse per-state candidacies into a national candidacy. But if you need to track per-party spending limits or associate contributions on a per-state-campaign basis, this is one scenario of which you should be aware.
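A minimal sketch of the three distinct party links described above, using placeholder IDs for the hypothetical scenario in the comment. Field names are assumptions, except that endorsement_party_ids mirrors the EndorsementPartyIds metadata on VIP's CandidateSelection mentioned in this thread.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Person:
    name: str
    party_id: Optional[str] = None  # (a) party the person identifies with


@dataclass
class Candidacy:
    person: Person
    contest_id: str
    party_id: Optional[str] = None  # (b) party supporting this candidacy


@dataclass
class CandidateSelection:
    candidacy: Candidacy
    # (c) parties printed next to the name on a particular ballot
    endorsement_party_ids: List[str] = field(default_factory=list)


def all_party_ids(selection: CandidateSelection) -> List[str]:
    """Every party tied to this ballot line, in (a), (b), (c) order, deduplicated."""
    ids = [selection.candidacy.person.party_id, selection.candidacy.party_id]
    ids += selection.endorsement_party_ids
    return list(dict.fromkeys(i for i in ids if i is not None))


person = Person("Hypothetical Candidate", party_id="independent")
candidacy = Candidacy(person, contest_id="president-general-ny", party_id="democratic")
selection = CandidateSelection(
    candidacy, endorsement_party_ids=["democratic", "working-families", "womens-equality"]
)
```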
@jdmgoogle I think I understand the important distinction between Candidate and CandidateSelection, but I think it makes sense to omit it for now. At the moment, we don't have anyone trying to actually model ballots. If and when those users come, I'd like to revisit the Selection classes.
@gordonje I am having a hard time understanding your proposal that a candidacy be a relation between a person and an office term.
If we go this route, is there even a Contest object? If so, how are people related to contests?
How would I find all the people running in a particular contest? In pseudo-SQL, would it be something like:
select person
from candidacy
inner join contest
using (post_id)
inner join election
using (election_id)
where candidacy.term_start_date > election.date
This seems like too loose a connection.
Additionally, in campaign finance, there are often important differences in the rules for primary elections versus general elections. If we keep the candidacies as defined by (person, contest), then we know that person is, for example, running for a general and not winding down their failed primary.
I agree that the president is a complicated case, but... it's a complicated case.
@fgregg Are you trying to link people to offices? Or contests to people? Because those are two separate things (i.e., everyone who's not an incumbent). The people in contests is easy. It's
CandidateContest ~> CandidateSelection ~> Candidate ~> Person
The Office term is
CandidateContest ~> Office ~> Term
The CandidateContest object has PrimaryPartyIds to let you know if it is associated with a given party, and if so which one(s).
In your pseudo sql example I don't quite understand why you're starting from Candidate since that's really the middle of the query chain and you need to go out in either direction. Also, where did candidacy.term_start_date come from? That's not in the spec anywhere and really conflates two separate things: a candidacy and an office term.
@fgregg There's still a CandidateContest, which collects together candidates who are competing against each other in an election. In the Django implementation, I'm imagining a ManyToMany relationship between CandidateContest and Candidacy.
Furthering your example: Let's say you wanted to find every person who's declared candidacy for the 2014 California Governor's race.
select person
from candidacy
inner join officeterm
using (office_term_id)
inner join post
using (post_id)
where post.label = 'Governor'
and extract(year from officeterm.start_date) = 2015; -- the term won in the 2014 race begins in 2015
The above query would return you everyone that competed in the primaries as well. But if you wanted just the candidates who advanced to the general election, you would need the contest that happened on the general election date to help you filter.
select person
from candidacy
inner join officeterm
using (office_term_id)
inner join post
using (post_id)
inner join candidatecontestcandidacy
using (candidacy_id)
inner join candidatecontest
using (contest_id)
inner join election
using (election_id)
where post.label = 'Governor'
and cast(election.start_time as date) = '2014-11-04'; -- general election day
Additionally, in campaign finance, there are often important differences in the rules for primary elections versus general elections. If we keep the candidacies as defined by (person, contest), then we know that person is, for example, running for a general and not winding down their failed primary.
This does give me some pause. But our campaign finance spec does include an election property. Maybe this should be required instead of optional (@aepton maybe you have some thoughts).
@jdmgoogle Forest's comments were in reference to some changes I just pushed yesterday, which include editing out the Selection data types. Apologies for the churn.
Don't currently have a place for the endorsement parties you mentioned. If I were to add it, I think it would be on the CandidateContest-to-Candidacy relationship.
@gordonje Obviously this is your GitHub, so feel free to do whatever works for you, but the *Selection objects were the linkages between CandidateContest and Candidate objects, and included metadata about EndorsementPartyIds. I'm not sure why you'd remove that linkage and then try to add it back in another way, but I also don't have full visibility into the rest of the campaign finance schema.
@jdmgoogle TBH, I'm as much an interloper on this repo as you 😁. It's a beautiful thing when people who care about the same problem can find each other on the internet and collaborate.
You can read the current draft of the campaign finance spec here.
I do find VIP's general three-level design of an election to be quite intuitive.
My understanding is that third level for VIP includes every variation voters may encounter on their ballot, e.g., different spelling of candidate names, party endorsements, different orders for the contests and the options. That's obviously all very important when your use cases include preparing voters for what to expect when they go to the polls.
For us, though, that third level currently wants to be no more detailed than what are the potential outcomes for a given contest, which we need in order to adequately model campaign finance activity (i.e., not just in which contests these entities are involved but what outcomes they support or oppose). And later we'll also care about the results of each contest (e.g., how many votes did each option get).
I think editing out *Selection does prevent confusion for users who might be familiar with VIP or traversing data between the two schemas. I've replaced these with properties on the Contest subclasses where the full range of options, across all ballots that include a contest, can be listed (e.g., BallotMeasureContest.options, CandidateContest.candidacies).
@fgregg So just to further the case against my own proposal, if we cut out OfficeTerm the SQL in my previous comment would have two fewer joins:
select person
from candidacy
inner join contest
using (contest_id)
inner join post
using (post_id)
inner join election
using (election_id)
where post.label = 'Governor'
and cast(election.start_time as date) = '2014-11-04';
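The contest-based query can be checked end to end with Python's built-in sqlite3 module. The tables and rows are hypothetical, and because SQLite has no separate date type, date() and strftime() stand in for the cast and year extraction. The second query drops the election-day filter, which brings the primary candidates back in.

```python
import sqlite3

# Hypothetical tables holding just the columns the query joins on.
DDL = """
create table post (post_id integer primary key, label text);
create table election (election_id integer primary key, start_time text);
create table contest (contest_id integer primary key, post_id integer, election_id integer);
create table candidacy (candidacy_id integer primary key, person text, contest_id integer);
"""

GENERAL_ONLY = """
select person
from candidacy
inner join contest using (contest_id)
inner join post using (post_id)
inner join election using (election_id)
where post.label = 'Governor'
  and date(election.start_time) = '2014-11-04'
order by person
"""

# Filtering on the year instead of election day returns primary candidates too.
WHOLE_CYCLE = """
select distinct person
from candidacy
inner join contest using (contest_id)
inner join post using (post_id)
inner join election using (election_id)
where post.label = 'Governor'
  and strftime('%Y', election.start_time) = '2014'
order by person
"""


def run(query: str) -> list:
    """Load the hypothetical rows into an in-memory database and run `query`."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    conn.execute("insert into post values (1, 'Governor')")
    conn.executemany("insert into election values (?, ?)",
                     [(1, "2014-06-03 07:00:00"), (2, "2014-11-04 07:00:00")])
    conn.executemany("insert into contest values (?, ?, ?)",
                     [(10, 1, 1), (20, 1, 2)])  # primary contest, then general
    conn.executemany("insert into candidacy (person, contest_id) values (?, ?)",
                     [("Candidate A", 10), ("Candidate B", 10), ("Candidate C", 10),
                      ("Candidate A", 20), ("Candidate B", 20)])
    rows = [row[0] for row in conn.execute(query)]
    conn.close()
    return rows
```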
So this would involve adding a contest_id and post_id to Candidacy, and we would be back to having one version of Donald Trump's candidacy for every CandidateContest, which feels like trouble when you're at the point of linking campaign finance committees to his candidacy. Even if you were going to say "there's really only one general presidential election and thus only one Donald Trump candidacy for that election", I don't see how you get around having multiple elections and contests, and thus multiple Donald Trump candidacies, in the presidential primaries, because these don't all happen on the same day.
Or maybe those duplicate candidacies are desirable? Obviously, they can still be rolled up by the person_id...
Following up on conversations @fgregg and I had at NICAR, we're seeing two big disadvantages to modeling office terms as part of candidacies:
So with the latest changes, I have removed OfficeTerm and redefined Candidacy as a combo of CandidateContest/Post/Person.
In U.S. elections, then, most persons running for office will have at most two Candidacy objects: One for the primary and one for the general election. The big exception is the presidential race, when the implementation includes multiple states.
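A minimal sketch of the redefined Candidacy, assuming it is identified by the (contest, post, person) combination and rolled up by person_id when needed. In Django this would more likely be a uniqueness constraint across the three foreign keys; the set below only mimics that, and all identifiers are made up.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class Candidacy:
    """Identified by the (contest, post, person) combination; frozen so that
    equal combinations hash equally and can live in a set."""
    contest_id: str
    post_id: str
    person_id: str


def add_candidacy(store: Set[Candidacy], candidacy: Candidacy) -> bool:
    """Insert unless the same combination already exists; report whether added."""
    if candidacy in store:
        return False
    store.add(candidacy)
    return True


def candidacies_per_person(store: Iterable[Candidacy]) -> Counter:
    """Roll candidacies up by person_id."""
    return Counter(c.person_id for c in store)


store: Set[Candidacy] = set()
for contest in ["gov-primary-2014", "gov-general-2014"]:
    add_candidacy(store, Candidacy(contest, "governor", "person-a"))
for contest in ["pres-primary-ia", "pres-primary-nh", "pres-primary-ca",
                "pres-general-ca", "pres-general-ny"]:
    add_candidacy(store, Candidacy(contest, "president", "person-b"))
```

The governor candidate ends up with two candidacies (primary and general), while the presidential candidate gets one per state contest.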
This is once again ready for review.
@jdmgoogle VIP allows a single CandidateContest to be linked to multiple Party elements, via the PrimaryPartyIds tag. Right now I have this as a one-to-one relationship, instead of a one-to-many. Can you provide examples of where a contest needs to be linked to multiple parties instead of, for example, having a CandidateContest for each party's partisan primary?
I've taken a closer look at python-opencivicdata-django, which is where we want to implement this specification. The Event Django model has a required "jurisdiction" field. However, this field is not mentioned in the OCD Event format or in the Event OCDEP.
Currently this proposal requires an Election.division_id field, but because it inherits from Event, an Election will also be linked to a Jurisdiction, and each Jurisdiction is also linked to a Division.
Does each Event require a Jurisdiction? If so, should we remove Election.division_id and rely on the Division linked to the Jurisdiction that's linked to the Event?
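The two options amount to two lookup paths for an election's division. The following is a minimal sketch that uses hypothetical in-memory classes instead of the Django models. It prefers the election's own division_id and falls back to the division of the linked jurisdiction. The ID strings follow the usual OCD identifier patterns but are only examples.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Jurisdiction:
    id: str
    division_id: str


@dataclass
class Election:
    # Hypothetical: an Election carries Event's jurisdiction link and,
    # as the current proposal requires, its own division_id as well.
    name: str
    jurisdiction: Optional[Jurisdiction] = None
    division_id: Optional[str] = None


def election_division(election: Election) -> Optional[str]:
    """Prefer the election's own division_id; otherwise use its jurisdiction's division."""
    if election.division_id:
        return election.division_id
    if election.jurisdiction is not None:
        return election.jurisdiction.division_id
    return None


ca = Jurisdiction("ocd-jurisdiction/country:us/state:ca/government",
                  "ocd-division/country:us/state:ca")
print(election_division(Election("2016 General", jurisdiction=ca)))
```

If Election.division_id were removed, only the fallback branch would remain. That is fine as long as every Event is required to have a Jurisdiction.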
I can't think of a really deep reason why events, generically, should require either a jurisdiction or a division field.
Practically, pupa people have needed a way of grouping different types of meetings associated with the same legislature together, and this is how we've done it. I think that other ways are possible, though I don't have a concrete proposal right now.
I think it makes more sense for an Election to be associated with a division than a jurisdiction.
@fgregg In that case I'll leave this proposal as it is, and raise the question in python-opencivicdata-django repo.
Any other feedback on this? I'm now adapting our current implementation to what is described here.
No, I think this is ready to be accepted as draft proposal. @jpmckinney, thoughts?
I haven't read the full thread - any salient points to keep in mind, or can I just read the patch?
One thing that is good to know is that this proposal is purposefully partial. It includes the classes that seem necessary for modeling campaign finance, but not quite everything you would want for modeling what is on voting ballots or election results.
It seemed prudent to wait to make those models until we had an interested implementer.
On Tue, Mar 21, 2017 at 11:20 PM, James McKinney notifications@github.com wrote:
I haven't read the full thread - any salient points to keep in mind, or can I just read the patch?
For posterity, noting some election modeling elsewhere that I was just made aware of (I had asked folks at mySociety to review this thread):
@jpmckinney I think you can largely just read the patch. You'll find a little less here than last time due to the change @fgregg described. Our Django implementation is also nearing completion in another repo.
Just wanted to share that we now have a working implementation of this spec in our fork of python-opencivicdata-django. The attached zip contains csv files exported from our campaign finance app that loads these models.
We'll wait to submit a PR on the Django package until this spec is at least accepted, in case any other changes are necessary based on feedback.
This is so cool @gordonje !
Question came up in our implementation that I think deserves broader discussion.
Should a candidate contest include all the candidates who declared to run for the office in the election, or should we limit the list to just candidates who ultimately appeared on the ballot?
The campaign finance use cases surely want to allow for any candidate who raised/spent money in that contest irrespective of being included on the ballot. But I expect the election results use cases will want to limit the list to only candidates who could have received any votes.
So then Candidacy probably needs some property indicating if the person is currently an active candidate in the contest. This would likely be a boolean field like is_active or dropped_out. It would be most precise to store the date when a given candidate dropped out of each contest, but I doubt that is reliably available.
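Here is a rough sketch of how an is_active flag would serve both use cases from one table. The Candidacy class and the helper names are hypothetical. Note that "active" only approximates "on the ballot", which is the distinction raised next.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidacy:
    # Hypothetical, simplified Candidacy with the proposed boolean flag.
    person_id: str
    contest_id: str
    is_active: bool = True  # False once the candidate drops out of the contest


def finance_candidates(candidacies: List[Candidacy]) -> List[str]:
    """Campaign finance view: everyone who declared, whether or not they dropped out."""
    return [c.person_id for c in candidacies]


def results_candidates(candidacies: List[Candidacy]) -> List[str]:
    """Election results view: only candidates still active in the contest."""
    return [c.person_id for c in candidacies if c.is_active]


contest = [
    Candidacy("person-1", "governor-2018"),
    Candidacy("person-2", "governor-2018", is_active=False),
]
print(finance_candidates(contest), results_candidates(contest))
```

A dropped_out flag would work the same way with the condition inverted. Storing a drop-out date instead would let both views be computed as of any given day, if that data were available.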
Could it be managed by the BallotSelection classes that were ultimately dropped from this proposal? If you have one of those, you're on the ballot. If you don't, you weren't.
I think we should definitely include candidates who dropped out, and is_active seems like the cleanest way to do that. If you care about when it happened, there should be a way to look up that filing. (I'm on my phone and can't find the exact place in the spec right now, but I'm sure it's in there.)
On Fri, Apr 14, 2017 at 9:56 AM James Gordon notifications@github.com wrote:
Question came up in our implementation that I think deserves broader discussion.
Should a candidate contest include all the candidates who declared to run for the office in the election, or should we limit the list to just candidates who ultimately appeared on the ballot?
The campaign finance use cases surely want to allow for any candidate who raised/spent money in that contest irrespective of being included on the ballot. But I expect the election results use cases will want to limit the list to only candidates who could have received any votes.
So then Candidacy probably needs some property indicating if the person is currently an active candidate in the contest. This would likely be a boolean field like is_active or dropped_out. It would be most precise to store the date when a given candidate dropped out of each contest, but I doubt that is reliably available.
I haven't looked at your derivation of the schema, but did you remove the CandidatePreElectionStatus and CandidatePostElectionStatus enumerations from the Candidate element?
@jdmgoogle Thanks for pointing out CandidatePreElectionStatus. Maybe an enum field with exactly these options will do the trick:
VIP also has "write-in" among the CandidatePreElectionStatus options. That's another piece we're currently missing.
Does anyone have a problem with calling this new field on Candidacy registration_status?
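The proposed registration_status field could be an enum along these lines. This is a sketch only. The values below are assumed, modeled on VIP's CandidatePreElectionStatus (including "write-in"), and should be checked against the VIP spec. Deriving is_active from the status is one possible design, not something the thread settled on.

```python
from dataclasses import dataclass
from enum import Enum


class RegistrationStatus(Enum):
    # Assumed values, modeled on VIP's CandidatePreElectionStatus enumeration;
    # verify against the VIP specification before relying on them.
    FILED = "filed"
    QUALIFIED = "qualified"
    WITHDRAWN = "withdrawn"
    WRITE_IN = "write-in"


@dataclass
class Candidacy:
    # Hypothetical, simplified Candidacy carrying the proposed field.
    person_id: str
    contest_id: str
    registration_status: RegistrationStatus

    @property
    def is_active(self) -> bool:
        # One possible design: derive the active flag from the status
        # instead of storing a separate boolean.
        return self.registration_status is not RegistrationStatus.WITHDRAWN


# Parsing a raw status string; unknown values raise ValueError.
c = Candidacy("person-1", "governor-2018", RegistrationStatus("write-in"))
print(c.registration_status, c.is_active)
```

One enum field avoids having a boolean and a status that can disagree. The tradeoff is that every source has to map its raw values onto this fixed list.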
I'm also adding/populating it in our fork of opencivicdata-django.
What other steps are necessary for this spec to be accepted (even provisionally) so that we can pivot to discussing the draft implementation?
@jpmckinney thoughts on this?
I'll actually have time to review this coming week. My basic feeling is that it's not really going to be robust for international use, but that we're okay with that. So, it'll be a specification, but not one that targets standardization outside the US, except by accident.
That's great news, @jpmckinney! We're eager to complete our implementation and begin releasing a first generation of standardized files on our data portal. Please let us know what we can do to help you.
Looks good! Thanks for your patience!
Third time's the charm!
My first PR was on datamade's fork of this repo. This second one was on this repo, but from datamade's fork, which included some commits to files I never intended to change (probably as a result of my shoddy attempt to squash the commits).
We've essentially borrowed everything from the VIP XML spec that feels useful for the campaign finance use cases. Differences between proposed OCD data types and VIP elements are noted below.
This proposal doesn't include modeling election results (which also aren't in VIP). Will cover that in a future proposal.
@aepton: Saw your comments on the original PR. I've made a few of the changes you suggested and will run through each of your comments. After that, if you feel like I haven't adequately addressed your questions/concerns, do you mind moving those comments over to this PR? Sorry for the redundant work!
@fgregg, @palewire, @dwillis and whoever else: Please give it a look!