openelections / openelections-core

Core repo for election results data acquisition, transformation and output.
MIT License

Normalized Candidates #30

Closed evz closed 10 years ago

evz commented 10 years ago

@fgregg and I have been exploring implementing the openelections data structure for our local elections in Chicago, and we ran across an issue today that makes me wonder whether you might consider a slightly different implementation.

Since a Candidate is stored as an EmbeddedDocument within each Result (which is itself an EmbeddedDocument within a Contest), updating an individual Candidate can be somewhat of a bear, especially for a candidate who has been running in elections for as long as we have data, and all the more so because our data is at the precinct level.
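To make the pain point concrete, here is a minimal sketch of the nesting described above. It uses plain Python dicts as a stand-in for the MongoDB documents, and every field name, slug, and id in it is a hypothetical placeholder rather than the project's actual schema.

```python
# Hypothetical stand-in for the embedded layout: a Contest holds Results,
# and each Result holds its own embedded copy of the Candidate.
# All field names and values are assumptions made for illustration.

def make_result(precinct, votes, name):
    """Build one precinct-level result with its own embedded candidate copy."""
    return {
        "precinct": precinct,
        "votes": votes,
        "candidate": {"name": name, "ocd_person_id": None},
    }

contests = [
    {
        "slug": "alderman-ward-1-2011",
        "results": [
            make_result("1", 120, "Jane Doe"),
            make_result("2", 95, "Jane Doe"),
            make_result("1", 80, "John Roe"),
        ],
    },
    {
        "slug": "alderman-ward-1-2015",
        "results": [make_result("1", 130, "Jane Doe")],
    },
]

def set_ocd_person_id(contests, name, ocd_id):
    """Walk every contest and every result, patching each embedded copy."""
    touched = 0
    for contest in contests:
        for result in contest["results"]:
            candidate = result["candidate"]
            if candidate["name"] == name:
                candidate["ocd_person_id"] = ocd_id
                touched += 1
    return touched

# One logical change to one person fans out to every precinct row
# in every election that person ran in.
touched = set_ocd_person_id(contests, "Jane Doe", "ocd-person/placeholder-id")
```

With precinct-level data, the number of embedded copies to patch grows with precincts times elections, which is why a single candidate update is so tedious in this layout.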

The main reason this comes up is that we're storing information about local aldermen in a pupa instance, which gives us ocd_person ids for them. We'd like to cross-reference that info with the info about the elections they've run in that we're storing in this app, and the only way we have to do that is to add the ocd_person id into this app by hand. We expected the manual part and can handle it, but I'm wondering if you might consider storing candidates as a separate Document, the way you're storing the Office for a given result. This would certainly make getting at the information about candidates a whole heck of a lot easier.

zstumgoren commented 10 years ago

Hey there, we're actually in the midst of updating the data model along the lines of what you're asking for. I just pushed some in-flight code (possibly buggy, subject to additional change) to our tasks branch:

https://github.com/openelections/core/blob/tasks/openelex/models.py

The revised models create separate collections for Contest, Candidate and Result. The latter two models (Candidate and Result) are now DynamicDocument subclasses; they contain both formal references to related documents and denormalized slugs/keys, along with a number of required fields. I've tried to explain the uses of non-obvious fields with help_text args, but let us know if you have questions about any of the fields.
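As a rough illustration of that shape (not the actual models.py), the sketch below uses plain Python dataclasses in place of MongoDB documents. An object reference plays the role a reference field would play in the real models, and the class fields, slugs, and id value are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative stand-ins only: plain dataclasses rather than database
# documents. Every field name here is an assumption, not the real schema.

@dataclass
class Contest:
    slug: str

@dataclass
class Candidate:
    slug: str
    name: str
    ocd_person_id: Optional[str] = None  # hypothetical cross-reference field

@dataclass
class Result:
    contest: Contest      # formal reference to the related contest
    candidate: Candidate  # formal reference to the related candidate
    contest_slug: str     # denormalized key, usable without following the reference
    candidate_slug: str   # denormalized key, usable without following the reference
    precinct: str
    votes: int

contest = Contest(slug="alderman-ward-1-2011")
jane = Candidate(slug="jane-doe", name="Jane Doe")

# Each precinct-level result points at the same Candidate record.
results = [
    Result(contest, jane, contest.slug, jane.slug, precinct, votes)
    for precinct, votes in [("1", 120), ("2", 95)]
]

# The candidate is now updated in exactly one place...
jane.ocd_person_id = "ocd-person/placeholder-id"

# ...and every result that references the candidate sees the change.
linked_ids = [r.candidate.ocd_person_id for r in results]
```

The denormalized slugs trade some duplication for cheap filtering and grouping, while the references keep mutable details such as an external person id in a single record.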

This work is still in progress but I'm pushing so you can get a sense of where we're headed. Also, please note that these models are part of our backend data processing pipeline, and they may not necessarily be a good fit for your project or others looking for a data model tuned for a particular web application or some other "end-user" type system. For instance, you may find that you don't want or need some of the fields we require.

That said, we're open to suggestions that could help you while playing nice with a data pipeline intended for all 50 states.

evz commented 10 years ago

That looks great and nicely solves the issue of getting at the individual Candidates. At first glance, the changes you've implemented there suit our use case pretty well, so I'll go ahead and set up the models in our fork that way and then keep an eye on your master branch to see how that develops.

Thanks again!